I used bert-base as the base model and retrained my long-bert with a maximum sequence length of 1536. Then I compared its inference speed against the original bert-base extended to 1536 positions (bert-base-1536). After extensive testing, I found that long-bert-1536 and bert-base-1536 have essentially the same inference speed. I saw a similar issue in #106, but all of my test sequences are longer than 1000 tokens. I would expect window attention to be faster than full self-attention because it requires less computation (see the rough cost estimate after the settings), so why does this happen? Here are my settings:
Attention window (same for every layer): 512
Global attention: only on the [CLS] token
Inference device: CPU
Task: text classification
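For reference, this is the back-of-the-envelope attention-cost comparison behind my expectation (my own rough estimate with n = 1536 and w = 512, not measured numbers):

$$
\text{full self-attention} \propto n^2 = 1536^2 \approx 2.36\text{M score pairs}, \qquad
\text{sliding window} \propto n \cdot w = 1536 \times 512 \approx 0.79\text{M score pairs}
$$

So the attention-score computation alone should be roughly 3x cheaper per head per layer. I realize the QKV projections and feed-forward layers are identical in both models, so I wouldn't expect a full 3x end-to-end speedup, but I did expect at least some measurable difference.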
By the way, does the size of the attention window affect inference speed? I tested different window sizes, but the speed was basically the same.
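For completeness, this is roughly how I am timing the two models (a minimal sketch of my benchmark; the checkpoint paths and the use of `LongformerForSequenceClassification` for the converted model are placeholders for my actual loading code):

```python
import time

import torch
from transformers import BertForSequenceClassification, LongformerForSequenceClassification

SEQ_LEN = 1536
N_RUNS = 20

# Placeholder paths: substitute the actual bert-base-1536 and long-bert-1536 checkpoints.
bert = BertForSequenceClassification.from_pretrained("path/to/bert-base-1536").eval()
longbert = LongformerForSequenceClassification.from_pretrained("path/to/long-bert-1536").eval()

# Dummy batch with one full-length sequence (my real test data is all > 1000 tokens).
input_ids = torch.randint(1000, 20000, (1, SEQ_LEN))
attention_mask = torch.ones(1, SEQ_LEN, dtype=torch.long)

# Global attention only on the [CLS] token, matching the setting above.
global_attention_mask = torch.zeros(1, SEQ_LEN, dtype=torch.long)
global_attention_mask[:, 0] = 1

def bench(fn):
    fn()  # warm-up run
    start = time.perf_counter()
    for _ in range(N_RUNS):
        fn()
    return (time.perf_counter() - start) / N_RUNS

with torch.no_grad():
    t_bert = bench(lambda: bert(input_ids=input_ids, attention_mask=attention_mask))
    t_long = bench(lambda: longbert(
        input_ids=input_ids,
        attention_mask=attention_mask,
        global_attention_mask=global_attention_mask,
    ))

print(f"bert-base-1536: {t_bert * 1000:.1f} ms/forward")
print(f"long-bert-1536: {t_long * 1000:.1f} ms/forward")
```

With this loop on CPU, the two averages come out almost identical for me.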