amandaortega opened this issue 2 years ago
@amandaortega I found that tokens = tokenizer(sentences_batch, padding=True, truncation=True, return_tensors='pt') doesn't pad at all. I am also experiencing that the hidden-layer outputs of the ONNX model and the sentence-transformers model do not match for the same sentence. Did you experience the same thing?
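For what it's worth, a quick way to check whether padding is actually being applied is to look at the shape of input_ids and at the attention_mask. A minimal sketch, assuming a Hugging Face AutoTokenizer and a placeholder checkpoint name:

```python
from transformers import AutoTokenizer

# Placeholder checkpoint; substitute the one your SBERT model was built from.
tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")

sentences_batch = ["a short sentence",
                   "a noticeably longer sentence that should force padding"]
tokens = tokenizer(sentences_batch, padding=True, truncation=True, return_tensors="pt")

# If padding is applied, every row of input_ids has the same length and the
# attention_mask contains zeros at the padded positions.
print(tokens["input_ids"].shape)
print(tokens["attention_mask"])
```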
Hi.
I am trying to run my SBERT model through ONNX to speed up inference. I have successfully converted the model to the .onnx format. For a single string, or a list of strings of the same length, the sentence embeddings generated by ONNX exactly match the embeddings generated by my original SBERT model. However, for a list of variable-length strings, not all positions of the ONNX embeddings match those generated by the original model.
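The conversion looks roughly like this (a sketch rather than my exact script; the checkpoint name and output path are placeholders, and the relevant detail for variable-length batches is that the batch and sequence axes are declared dynamic):

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "sentence-transformers/all-MiniLM-L6-v2"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()

# Return tuples instead of a ModelOutput so the exporter sees plain tensors.
model.config.return_dict = False

dummy = tokenizer(["example sentence"], return_tensors="pt")

torch.onnx.export(
    model,
    (dummy["input_ids"], dummy["attention_mask"]),
    "sbert.onnx",  # placeholder output path
    input_names=["input_ids", "attention_mask"],
    output_names=["last_hidden_state"],
    # Dynamic batch and sequence axes so padded batches of any length are accepted.
    dynamic_axes={
        "input_ids": {0: "batch", 1: "sequence"},
        "attention_mask": {0: "batch", 1: "sequence"},
        "last_hidden_state": {0: "batch", 1: "sequence"},
    },
    opset_version=14,
)
```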
I have run both the original and the ONNX models in batches. To run the ONNX model with strings of different lengths, I used the padding option of the tokenizer. After the ONNX model returns its output, I compute the sentence embeddings by taking into account the attention mask the tokenizer returned, just as recommended at https://www.sbert.net/examples/applications/computing-embeddings/README.html. As I said, when I run this code with only one string or a list of strings of the same length as input, the embeddings match. However, when I run it with a list of variable-length strings, they don't, which makes me think the problem is with the padding strategy.
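Concretely, the ONNX inference plus pooling step looks roughly like this (a sketch, not my exact code; the checkpoint name and .onnx path are placeholders, and the comparison at the end assumes the original SBERT model uses plain mean pooling with no extra normalization layer):

```python
import numpy as np
import onnxruntime as ort
from sentence_transformers import SentenceTransformer
from transformers import AutoTokenizer

model_name = "sentence-transformers/all-MiniLM-L6-v2"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
session = ort.InferenceSession("sbert.onnx")  # placeholder path

sentences = ["a short sentence",
             "a much longer sentence that forces padding inside the batch"]
tokens = tokenizer(sentences, padding=True, truncation=True, return_tensors="np")

outputs = session.run(
    None,
    {"input_ids": tokens["input_ids"], "attention_mask": tokens["attention_mask"]},
)
token_embeddings = outputs[0]  # (batch, sequence, hidden)

# Mean pooling over real tokens only, as in the SBERT computing-embeddings example:
# the attention mask zeroes out padded positions before averaging.
mask = np.expand_dims(tokens["attention_mask"], -1).astype(np.float32)
onnx_embeddings = (token_embeddings * mask).sum(axis=1) / np.clip(mask.sum(axis=1), 1e-9, None)

# Compare against the original model (assumes mean pooling, no normalization layer).
reference = SentenceTransformer(model_name).encode(sentences)
print(np.abs(onnx_embeddings - reference).max())
```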
Does anyone have any suggestions on what could cause the problem?
Thanks a lot in advance!