lookmyeye opened 3 years ago
Can you try changing the model config to
attention_mode = 'sliding_chunks_no_overlap'
attention_window = 170
The sliding_chunks_no_overlap implementation (here) is generally faster, even for relatively short sequences, and it is simpler, so it might be easier to get it working with ONNX.
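The suggested window of 170 lines up with the window arithmetic. Assuming the one-sided window convention used in the allenai/longformer code (my reading of the code, not confirmed here), sliding_chunks lets each token attend to roughly w tokens on each side (~2*w total), while sliding_chunks_no_overlap lets a token attend to its own chunk of w tokens plus the two neighboring chunks (~3*w total). A small sketch of that arithmetic:

```python
def effective_span(w, mode):
    # Approximate number of tokens each token can attend to,
    # assuming attention_window w is the one-sided window size.
    if mode == 'sliding_chunks':
        # w tokens on each side of the current token
        return 2 * w
    elif mode == 'sliding_chunks_no_overlap':
        # own chunk plus the chunk on each side
        return 3 * w
    raise ValueError(mode)

print(effective_span(256, 'sliding_chunks'))             # 512
print(effective_span(170, 'sliding_chunks_no_overlap'))  # 510
```

So under these assumptions, attention_window = 170 with no-overlap chunking gives roughly the same receptive field (~510 tokens) as the default overlapping mode with window 256.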
Thanks for your advice.
I haven't found any reported inference speed.
If anyone can run inference faster than 10 qps, please let me know. (attention_mode = 'sliding_chunks')
@lookmyeye Hi, I have exactly the same issue when I tried to run the model with onnxruntime. Did you solve the problem?
With model.half() I can only get 10 qps on a 2080 Ti with sequence length 1024 (12 layers).
Did I miss something?
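Throughput numbers like this are easier to compare if they are measured the same way. A minimal sketch of a qps (queries-per-second) measurement loop, where `run_model` is a hypothetical stand-in for the actual forward pass:

```python
import time

def run_model(batch):
    # Hypothetical stand-in for a model forward pass on one batch.
    return sum(batch)

def measure_qps(fn, batch, n_iters=100):
    # Warm up once so one-time setup cost is not counted.
    fn(batch)
    start = time.perf_counter()
    for _ in range(n_iters):
        fn(batch)
    elapsed = time.perf_counter() - start
    return n_iters / elapsed

print(round(measure_qps(run_model, list(range(1024))), 1))
```

For a real GPU model you would also need to synchronize the device before reading the timer, otherwise the loop only measures kernel launch time.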
I tried running inference with onnxruntime, but got a ScatterND error during session run.
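For context, ScatterND is the ONNX op that writes update values into a tensor at given indices; in-place index assignments in PyTorch model code (e.g. `x[:, idx] = y`) commonly export to it, which is one plausible source of the op in this graph (a guess, not confirmed for this model). A plain-Python sketch of the op's 1-D behavior:

```python
def scatter_nd_1d(data, indices, updates):
    # Mimics ONNX ScatterND on a 1-D tensor:
    # out starts as a copy of data, then out[indices[i]] = updates[i].
    out = list(data)
    for idx, upd in zip(indices, updates):
        out[idx] = upd
    return out

print(scatter_nd_1d([0, 0, 0, 0], [1, 3], [5, 7]))  # [0, 5, 0, 7]
```

If the runtime rejects the op, checking which node produces it (and which line of model code exported it) is usually the first step.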