allenai / longformer

Longformer: The Long-Document Transformer
https://arxiv.org/abs/2004.05150
Apache License 2.0

Inference speed: Longformer vs BERT #183

Open SuMeng123 opened 3 years ago

SuMeng123 commented 3 years ago

I used bert-base-cased as the base model and retrained my own long-bert-512 from it.

long-bert settings: attention window (the same in every layer): 16, max_pos: 512
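
For context, the conversion followed roughly this pattern, adapted from the RoBERTa conversion in this repo's `convert_model_to_long.ipynb`; the `BertLong*` names, paths, and the exact import location are only illustrative and depend on the transformers version:

```python
# Sketch of how long-bert-512 was built (illustrative only; adapted from the
# RoBERTa conversion in scripts/convert_model_to_long.ipynb in this repo).
import copy
from transformers import BertForMaskedLM
# import path for older transformers releases; newer ones expose it under
# transformers.models.longformer.modeling_longformer
from transformers.modeling_longformer import LongformerSelfAttention


class BertLongSelfAttention(LongformerSelfAttention):
    """Adapter so BertLayer can call the sliding-window self-attention."""
    def forward(self, hidden_states, attention_mask=None, head_mask=None,
                encoder_hidden_states=None, encoder_attention_mask=None,
                output_attentions=False):
        return super().forward(hidden_states, attention_mask=attention_mask,
                               output_attentions=output_attentions)


model = BertForMaskedLM.from_pretrained('bert-base-cased')
config = model.config
config.attention_window = [16] * config.num_hidden_layers  # same window in every layer
# max_pos stays at 512, i.e. the position embeddings are not extended

# replace each layer's full O(n^2) self-attention with sliding-window attention,
# reusing the pretrained query/key/value weights
for i, layer in enumerate(model.bert.encoder.layer):
    long_attn = BertLongSelfAttention(config, layer_id=i)
    long_attn.query = layer.attention.self.query
    long_attn.key = layer.attention.self.key
    long_attn.value = layer.attention.self.value
    long_attn.query_global = copy.deepcopy(layer.attention.self.query)
    long_attn.key_global = copy.deepcopy(layer.attention.self.key)
    long_attn.value_global = copy.deepcopy(layer.attention.self.value)
    layer.attention.self = long_attn

model.save_pretrained('long-bert-512')  # retraining / fine-tuning happens from here
```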

My sequences are long enough. I expected the O(16*512) sliding-window attention to be faster than O(512*512) full self-attention, but I find that the inference time of long-bert-512 is higher than that of bert-base-cased (measured roughly as in the sketch below).
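
This is a minimal sketch of the timing comparison, assuming random inputs of length 512; batch size, run counts, and the way long-bert-512 is loaded are just placeholders for my setup:

```python
# Rough timing sketch for one forward pass (batch size / run counts are arbitrary).
import time
import torch
from transformers import AutoModel

device = 'cuda' if torch.cuda.is_available() else 'cpu'


@torch.no_grad()
def avg_forward_time(model, seq_len=512, batch_size=8, n_runs=20):
    """Average wall-clock time of one forward pass over random token ids."""
    model = model.to(device).eval()
    input_ids = torch.randint(1000, 2000, (batch_size, seq_len), device=device)
    mask = torch.ones_like(input_ids)
    for _ in range(3):  # warm-up runs, excluded from the timing
        model(input_ids, attention_mask=mask)
    if device == 'cuda':
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(n_runs):
        model(input_ids, attention_mask=mask)
    if device == 'cuda':
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / n_runs


print('bert-base-cased:', avg_forward_time(AutoModel.from_pretrained('bert-base-cased')))
# long_bert = ...  # the converted long-bert-512, loaded the same way it was built above
# print('long-bert-512  :', avg_forward_time(long_bert))
```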

Did I miss something?