allenai / longformer

Longformer: The Long-Document Transformer
https://arxiv.org/abs/2004.05150
Apache License 2.0
2.05k stars 276 forks

problem about longformer inference of huggingface/transformers #106

Open cmdllx opened 4 years ago

cmdllx commented 4 years ago

When I use Longformer in huggingface/transformers, I find that the inference time of longformer-base-4096 is longer than bert-base-cased's. However, Longformer should take less time than BERT on long documents.

ibeltagy commented 4 years ago

> Longformer should take less time than BERT on long documents

Looks like your sequence length is short enough that it fits within BERT's 512-token limit. In that case, yes, standard n^2 self-attention is faster. With longer sequences, Longformer's windowed self-attention becomes much faster than n^2.
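A rough cost model makes the crossover concrete (an illustrative sketch, not a benchmark; the window size of 512 matches longformer-base-4096's default attention window, and constant factors and global attention are ignored):

```python
# Per-layer attention-score cost, up to a constant factor:
# full self-attention computes n x n scores, while a sliding-window
# attention computes roughly n x w scores (w = window size).

def full_attention_cost(n: int) -> int:
    """Score count for standard n^2 self-attention (BERT-style)."""
    return n * n

def window_attention_cost(n: int, w: int = 512) -> int:
    """Score count for sliding-window attention (Longformer-style)."""
    return n * min(n, w)  # a window can't exceed the sequence itself

for n in (128, 512, 4096):
    full = full_attention_cost(n)
    window = window_attention_cost(n)
    print(f"n={n:5d}  full={full:10d}  window={window:10d}  "
          f"ratio={full / window:.1f}x")
```

For n <= 512 the two counts coincide, so the windowed pattern buys nothing (and its extra bookkeeping can make it slower in practice, matching the timing above); at n = 4096 full attention does 8x more score computations.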