UKPLab / sentence-transformers

State-of-the-Art Text Embeddings
https://www.sbert.net
Apache License 2.0

output inconsistency when convert to onnx #1792

Open GodsDusk opened 1 year ago

GodsDusk commented 1 year ago

Package versions:

And here is a simple example:

import os
import onnxruntime
from sentence_transformers import SentenceTransformer, util  # util is needed for pytorch_cos_sim below
from transformers import AutoTokenizer

# convert the model to ONNX
model = SentenceTransformer('sentence-transformers/paraphrase-multilingual-mpnet-base-v2')
model.save('test_model')
os.system('python -m transformers.onnx --model=test_model test_onnx/')
ort_session = onnxruntime.InferenceSession('test_onnx/model.onnx')

# compare the ONNX output with the original model output
tokenizer = AutoTokenizer.from_pretrained('sentence-transformers/paraphrase-multilingual-mpnet-base-v2')
test_sentence1 = 'The test sentence'
onnx_input1 = tokenizer(test_sentence1, padding=True, truncation=True, return_tensors='np')
# '1550' is the output node name found with netron (see the follow-up comment below)
onnx_output1 = ort_session.run(['1550'], dict(onnx_input1))[0][0]
torch_output1 = model.encode(test_sentence1)

# compare the first 10 values of the two embeddings
print(onnx_output1[:10])
print(torch_output1[:10])

# compare the cosine similarities between the two sentences for each backend
test_sentence2 = 'Another test sentence'
onnx_input2 = tokenizer(test_sentence2, padding=True, truncation=True, return_tensors='np')
onnx_output2 = ort_session.run(['1550'], dict(onnx_input2))[0][0]
torch_output2 = model.encode(test_sentence2)
print(util.pytorch_cos_sim(onnx_output1, onnx_output2))
print(util.pytorch_cos_sim(torch_output1, torch_output2))

The console output:

Local PyTorch model found.
Framework not requested. Using torch to export to ONNX.
Using framework PyTorch: 1.13.0
Overriding 1 configuration item(s)
    - use_cache -> False
Validating ONNX model...
    -[✓] ONNX model output names match reference model ({'last_hidden_state'})
    - Validating ONNX Model output "last_hidden_state":
        -[✓] (3, 9, 768) matches (3, 9, 768)
        -[✓] all values close (atol: 1e-05)
All good, model saved at: test_onnx/model.onnx
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
    - Avoid using `tokenizers` before the fork if possible
    - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
[-0.02717968  0.06510455  0.09243378 -0.01974958 -0.07005666  0.04164138
 -0.03060718  0.01758665 -0.00161789 -0.10091655]
[ 0.02616484 -0.21018569 -0.01227845 -0.07859905 -0.18024477  0.02220263
  0.08576109  0.08979137  0.08766035  0.05595617]
tensor([[0.9587]])
tensor([[0.9339]])
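A side note for anyone reading these numbers (my reading, not confirmed in this thread): model.encode() returns the pooled sentence embedding, while the exported graph's last_hidden_state output is per-token, so indexing with [0][0] picks out only the first token's hidden state. If node '1550' is indeed last_hidden_state, a like-for-like comparison would apply the same mean pooling over the attention mask first. A minimal sketch, assuming mean pooling is the model's pooling mode and reusing the variables from the example above:

import numpy as np

def mean_pool(token_embeddings, attention_mask):
    # average the token embeddings, ignoring padded positions
    mask = np.expand_dims(attention_mask, -1).astype(np.float32)
    return (token_embeddings * mask).sum(axis=1) / np.clip(mask.sum(axis=1), 1e-9, None)

# full per-token output, shape (batch, seq_len, 768)
onnx_tokens1 = ort_session.run(['1550'], dict(onnx_input1))[0]
onnx_pooled1 = mean_pool(onnx_tokens1, onnx_input1['attention_mask'])[0]
print(onnx_pooled1[:10])
print(torch_output1[:10])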
GodsDusk commented 1 year ago

And I used netron to get the output layer name 1550

[screenshot: netron view of the exported ONNX graph]
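As an alternative to inspecting the graph in netron, the input and output names can also be read straight from the onnxruntime session (get_inputs()/get_outputs() are standard onnxruntime API):

import onnxruntime

ort_session = onnxruntime.InferenceSession('test_onnx/model.onnx')
# names of the graph inputs and outputs as onnxruntime sees them
print([i.name for i in ort_session.get_inputs()])
print([o.name for o in ort_session.get_outputs()])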