Hey people,
First of all thanks for the nice library 😺
I'm trying to export a `SentenceTransformer` to ONNX. I came across the nice tutorial in an open PR (#668) and the export went smoothly!

However, I am seeing some odd performance results on CPU. When the input is small, say 10 tokens, everything goes well and the ONNX model is faster than its PyTorch counterpart (~3-4 times faster). But with a big input (longer than the max length), the ONNX version gets slower than the PyTorch one (~3 times slower).
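For reference, here is roughly what my export looks like, following the PR's tutorial. This is a minimal sketch: the model name is just a placeholder, and the `opset_version` and dynamic axes are my own choices, not necessarily what the PR uses.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Placeholder model name -- substitute whichever backbone you actually use.
model_name = "sentence-transformers/distilbert-base-nli-stsb-mean-tokens"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# return_dict=False so tracing sees plain tuples instead of ModelOutput objects
model = AutoModel.from_pretrained(model_name, return_dict=False)
model.eval()

dummy = tokenizer("hello world", return_tensors="pt")
torch.onnx.export(
    model,
    args=(dummy["input_ids"], dummy["attention_mask"]),
    f="model.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["last_hidden_state"],
    # dynamic axes so the graph accepts any batch size / sequence length
    dynamic_axes={
        "input_ids": {0: "batch", 1: "sequence"},
        "attention_mask": {0: "batch", 1: "sequence"},
        "last_hidden_state": {0: "batch", 1: "sequence"},
    },
    opset_version=12,
)
```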
I have double-checked that both tokenizers are identical and that the input is properly truncated. I'm kind of running out of ideas now. Does anyone have a suggestion?
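This is the kind of timing harness I'm using to compare the two. Again just a sketch: `max_length=128` is a stand-in for my model's actual `max_seq_length`, and the model name is the same placeholder as above.

```python
import time

import onnxruntime as ort
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "sentence-transformers/distilbert-base-nli-stsb-mean-tokens"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, return_dict=False).eval()

session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

def avg_ms(fn, runs=50):
    fn()  # warm-up
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs * 1000

# Input far longer than max length, so truncation must kick in.
text = "some tokens " * 1000
enc = tokenizer(text, truncation=True, max_length=128, return_tensors="pt")
assert enc["input_ids"].shape[1] == 128  # confirms the input was truncated

onnx_inputs = {k: v.numpy() for k, v in enc.items()}
with torch.no_grad():
    pt = avg_ms(lambda: model(**enc))
rt = avg_ms(lambda: session.run(None, onnx_inputs))
print(f"PyTorch: {pt:.1f} ms | ONNX Runtime: {rt:.1f} ms")
```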
Cheers, Jules