lookmyeye opened 3 years ago
Can you try changing the model config to
attention_mode = 'sliding_chunks_no_overlap'
attention_window = 170
The sliding_chunks_no_overlap implementation (here) is generally faster, even for relatively short sequences, and it is simpler, so it might be easier to get it working with ONNX.
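The suggested window of 170 lines up with the window arithmetic. Assuming the one-sided window convention used in the allenai/longformer code (my reading of the code, not confirmed here), sliding_chunks lets each token attend to roughly w tokens on each side (~2*w total), while sliding_chunks_no_overlap lets a token attend to its own chunk of w tokens plus the two neighboring chunks (~3*w total). A small sketch of that arithmetic:

```python
def effective_span(w, mode):
    # Approximate number of tokens each token can attend to,
    # assuming attention_window w is the one-sided window size.
    if mode == 'sliding_chunks':
        # w tokens on each side of the current token
        return 2 * w
    elif mode == 'sliding_chunks_no_overlap':
        # own chunk plus the chunk on each side
        return 3 * w
    raise ValueError(mode)

print(effective_span(256, 'sliding_chunks'))             # 512
print(effective_span(170, 'sliding_chunks_no_overlap'))  # 510
```

So under these assumptions, attention_window = 170 with no-overlap chunking gives roughly the same receptive field (~510 tokens) as the default overlapping mode with window 256.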
Thanks for your advice.
I haven't found any reported inference speed.
If anyone can run inference faster than 10 qps, please let me know. (attention_mode = 'sliding_chunks')
@lookmyeye Hi, I have exactly the same issue when I tried to run the model with onnxruntime. Did you solve the problem?
With model.half() I can only get 10 qps on a 2080 Ti with sequence length 1024 (12 layers).
Did I miss something?
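Throughput numbers like this are easier to compare if they are measured the same way. A minimal sketch of a qps (queries-per-second) measurement loop, where `run_model` is a hypothetical stand-in for the actual forward pass:

```python
import time

def run_model(batch):
    # Hypothetical stand-in for a model forward pass on one batch.
    return sum(batch)

def measure_qps(fn, batch, n_iters=100):
    # Warm up once so one-time setup cost is not counted.
    fn(batch)
    start = time.perf_counter()
    for _ in range(n_iters):
        fn(batch)
    elapsed = time.perf_counter() - start
    return n_iters / elapsed

print(round(measure_qps(run_model, list(range(1024))), 1))
```

For a real GPU model you would also need to synchronize the device before reading the timer, otherwise the loop only measures kernel launch time.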
I tried running inference with onnxruntime, but got a ScatterND error during session run.
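For context, ScatterND is the ONNX op that writes update values into a tensor at given indices; in-place index assignments in PyTorch model code (e.g. `x[:, idx] = y`) commonly export to it, which is one plausible source of the op in this graph (a guess, not confirmed for this model). A plain-Python sketch of the op's 1-D behavior:

```python
def scatter_nd_1d(data, indices, updates):
    # Mimics ONNX ScatterND on a 1-D tensor:
    # out starts as a copy of data, then out[indices[i]] = updates[i].
    out = list(data)
    for idx, upd in zip(indices, updates):
        out[idx] = upd
    return out

print(scatter_nd_1d([0, 0, 0, 0], [1, 3], [5, 7]))  # [0, 5, 0, 7]
```

If the runtime rejects the op, checking which node produces it (and which line of model code exported it) is usually the first step.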