UKPLab / sentence-transformers

State-of-the-Art Text Embeddings
https://www.sbert.net
Apache License 2.0

Unexpected ONNX inference speed #1145

Open · JulesBelveze opened this issue 3 years ago

JulesBelveze commented 3 years ago

Hey people,

First of all thanks for the nice library 😺

I'm trying to export a SentenceTransformer to ONNX. I came across a nice tutorial in an open PR (#668), and the export went smoothly!
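
For context, a minimal export sketch in the spirit of that tutorial might look like the following; the model name, opset version, and dynamic axes are assumptions on my part, not necessarily what #668 uses:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Assumed model; swap in whichever SentenceTransformer backbone you use.
model_name = "sentence-transformers/all-MiniLM-L6-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name).eval()

# Dummy input used only for tracing.
dummy = tokenizer("hello world", return_tensors="pt")

torch.onnx.export(
    model,
    args=(dummy["input_ids"], dummy["attention_mask"]),
    f="model.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["last_hidden_state"],
    # Dynamic axes let the exported graph accept any batch size and
    # sequence length instead of the traced shapes.
    dynamic_axes={
        "input_ids": {0: "batch", 1: "seq"},
        "attention_mask": {0: "batch", 1: "seq"},
        "last_hidden_state": {0: "batch", 1: "seq"},
    },
    opset_version=14,
)
```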

However, I am seeing some weird performance results on CPU. When the input is small, say 10 tokens, everything goes well and the ONNX model is faster than its PyTorch counterpart (~3-4 times faster). But with a big input (longer than the max length), the ONNX version becomes slower than the PyTorch one (~3 times slower).
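
A rough benchmark sketch to reproduce that comparison on CPU (the model name, the max length of 128, and the 20 timing iterations are all arbitrary assumptions; it expects a `model.onnx` exported with the input names from the sketch above):

```python
import time

import onnxruntime as ort
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "sentence-transformers/all-MiniLM-L6-v2"  # assumed model
tokenizer = AutoTokenizer.from_pretrained(model_name)
pt_model = AutoModel.from_pretrained(model_name).eval()
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

for n_words in (10, 1000):  # short input vs. input longer than max length
    text = "hello " * n_words
    enc = tokenizer(text, truncation=True, max_length=128, return_tensors="pt")

    # PyTorch latency, averaged over 20 runs.
    with torch.no_grad():
        start = time.perf_counter()
        for _ in range(20):
            pt_model(input_ids=enc["input_ids"],
                     attention_mask=enc["attention_mask"])
        pt_ms = (time.perf_counter() - start) / 20 * 1000

    # ONNX Runtime latency on the same (truncated) input.
    ort_inputs = {
        "input_ids": enc["input_ids"].numpy(),
        "attention_mask": enc["attention_mask"].numpy(),
    }
    start = time.perf_counter()
    for _ in range(20):
        session.run(None, ort_inputs)
    ort_ms = (time.perf_counter() - start) / 20 * 1000

    print(f"{n_words:>5} words: pytorch {pt_ms:.1f} ms | onnx {ort_ms:.1f} ms")
```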

I have double-checked that both tokenizers are identical and that the input is properly truncated. I'm kinda running out of ideas now. Does anyone have a suggestion?
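
One quick sanity check for that truncation claim, as a sketch (the model name is again an assumption):

```python
from sentence_transformers import SentenceTransformer
from transformers import AutoTokenizer

name = "sentence-transformers/all-MiniLM-L6-v2"  # assumed model
st_model = SentenceTransformer(name)
hf_tokenizer = AutoTokenizer.from_pretrained(name)

long_text = "hello " * 1000  # well beyond the model's max length

# Both paths should produce the same truncated token ids.
st_ids = st_model.tokenizer(long_text, truncation=True,
                            max_length=st_model.max_seq_length)["input_ids"]
hf_ids = hf_tokenizer(long_text, truncation=True,
                      max_length=st_model.max_seq_length)["input_ids"]
assert st_ids == hf_ids, "tokenizations differ"
print(len(st_ids))  # capped at st_model.max_seq_length
```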

Cheers, Jules

ArEnSc commented 2 years ago

I think the tokenizer padding could be wrong. I believe the token sequence length has to be a power of 2 for GPU.
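
For what it's worth, a related idea can be tested with the standard Hugging Face tokenizer argument `pad_to_multiple_of`, which pads the sequence axis up to a fixed multiple; the multiple of 8 below is an arbitrary choice, not a power-of-two rule from any documentation:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")

# Pad the sequence length up to the next multiple of 8 so the runtime
# sees fewer distinct input shapes across calls.
enc = tok("some example sentence", padding=True, pad_to_multiple_of=8,
          truncation=True, return_tensors="np")
print(enc["input_ids"].shape)  # sequence axis is a multiple of 8
```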