Hey people,
First of all thanks for the nice library 😺
I'm trying to export a `SentenceTransformer` to ONNX. I came across the nice tutorial in an open PR (#668) and the export went smoothly!

However, I am seeing some odd performance results on CPU. When the input is small, say 10 tokens, everything goes well and the ONNX model is faster than its PyTorch counterpart (~3-4 times faster). But with a big input (longer than the max length), the ONNX version gets slower than the PyTorch one (~3 times slower).
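For reference, here is roughly what my export looks like, following the PR's tutorial. This is a minimal sketch: the model name is just a placeholder, and the `opset_version` and dynamic axes are my own choices, not necessarily what the PR uses.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Placeholder model name -- substitute whichever backbone you actually use.
model_name = "sentence-transformers/distilbert-base-nli-stsb-mean-tokens"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# return_dict=False so tracing sees plain tuples instead of ModelOutput objects
model = AutoModel.from_pretrained(model_name, return_dict=False)
model.eval()

dummy = tokenizer("hello world", return_tensors="pt")
torch.onnx.export(
    model,
    args=(dummy["input_ids"], dummy["attention_mask"]),
    f="model.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["last_hidden_state"],
    # dynamic axes so the graph accepts any batch size / sequence length
    dynamic_axes={
        "input_ids": {0: "batch", 1: "sequence"},
        "attention_mask": {0: "batch", 1: "sequence"},
        "last_hidden_state": {0: "batch", 1: "sequence"},
    },
    opset_version=12,
)
```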
I have double-checked that both tokenizers are identical and that the input is properly truncated. I'm kind of running out of ideas now. Does anyone have a suggestion?
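This is the kind of timing harness I'm using to compare the two. Again just a sketch: `max_length=128` is a stand-in for my model's actual `max_seq_length`, and the model name is the same placeholder as above.

```python
import time

import onnxruntime as ort
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "sentence-transformers/distilbert-base-nli-stsb-mean-tokens"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, return_dict=False).eval()

session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

def avg_ms(fn, runs=50):
    fn()  # warm-up
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs * 1000

# Input far longer than max length, so truncation must kick in.
text = "some tokens " * 1000
enc = tokenizer(text, truncation=True, max_length=128, return_tensors="pt")
assert enc["input_ids"].shape[1] == 128  # confirms the input was truncated

onnx_inputs = {k: v.numpy() for k, v in enc.items()}
with torch.no_grad():
    pt = avg_ms(lambda: model(**enc))
rt = avg_ms(lambda: session.run(None, onnx_inputs))
print(f"PyTorch: {pt:.1f} ms | ONNX Runtime: {rt:.1f} ms")
```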
Cheers, Jules