cmdllx opened this issue 4 years ago
Longformer should take less time than BERT for long documents
When I use Longformer in huggingface/transformers, I find that the inference time of longformer-base-4096 is longer than that of bert-base-cased. However, Longformer should take less time than BERT on long documents.

It looks like your sequence length is short enough to fit within BERT's 512-token limit. In that case, yes, standard O(n^2) self-attention is faster. With longer sequence lengths, Longformer's windowed self-attention becomes much faster than O(n^2) self-attention.
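For anyone who wants to check this themselves, here is a rough timing sketch (not part of the original exchange; the 512/4096 token counts and the choice of the public bert-base-cased and allenai/longformer-base-4096 checkpoints are assumptions for illustration):

```python
import time
import torch
from transformers import AutoModel, AutoTokenizer

def forward_time(model_name, n_tokens):
    # Load the model and tokenizer, build an input padded to n_tokens,
    # and time a single forward pass without gradients.
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name).eval()
    text = "word " * n_tokens
    inputs = tokenizer(text, truncation=True, max_length=n_tokens,
                       padding="max_length", return_tensors="pt")
    with torch.no_grad():
        start = time.time()
        model(**inputs)
    return time.time() - start

# Short input: both models handle 512 tokens, and BERT's dense O(n^2)
# attention is cheap at this length, so BERT tends to be faster here.
print("bert-base-cased @ 512       :", forward_time("bert-base-cased", 512))
print("longformer-base-4096 @ 512  :", forward_time("allenai/longformer-base-4096", 512))

# Long input: BERT cannot go past 512 tokens at all, while Longformer's
# windowed attention scales roughly linearly up to 4096 tokens.
print("longformer-base-4096 @ 4096 :", forward_time("allenai/longformer-base-4096", 4096))
```

The short-input comparison reproduces the behavior reported above (Longformer slower than BERT), while the 4096-token run shows the regime the Longformer paper targets, where dense O(n^2) attention is no longer an option.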