Closed: rachel2011 closed this issue 3 weeks ago
Sadly, I have never worked with ONNX.
In SentenceTransformer, the forward function takes only one argument: features (the other argument in Python is self).
features is a dictionary that contains the different features, for example token IDs, word weights, attention values, and token_type_ids.
For the BERT model, I think your input must look like this:
input_features = {'input_ids': dummy_input0, 'token_type_ids': dummy_input1, 'input_mask': dummy_input2}
And then:
torch.onnx.export(model, input_features, onnx_file_name, verbose=True)
@rachel2011 Did you get any solutions to successfully convert sentence bert model to onnx format?
I'm also wondering how to feed text inputs into a converted ONNX model. Could we do something similar to
sentences = ['This framework generates embeddings for each input sentence',
'Sentences are passed as a list of string.',
'The quick brown fox jumps over the lazy dog.']
sentence_embeddings = model.encode(sentences)
by replacing the model with the converted ONNX model? Any ideas?
Hi @ycgui I started to add the models to HuggingFace Models Hub: https://huggingface.co/sentence-transformers
Huggingface also provides methods / scripts to convert models to ONNX.
I hope this helps.
Thanks @nreimers. This is awesome!
Hey @ycgui I would be really thankful if you could share the code you used to convert the models to ONNX and then how you can encode sentences using that model. Thank you in advance
As per my understanding, shouldn't the pooled output after running the ONNX model match the output of encode from SentenceTransformers?
This doesn't seem to be the case in my testing (using the same model and tokenizer in both cases).
@nreimers, I would appreciate your help greatly.
Sadly, I am not familiar with the ONNX format.
Here you can see an example of how to load the models with native transformers code and how to apply mean pooling correctly (watch out for padding tokens): https://huggingface.co/sentence-transformers/bert-base-nli-mean-tokens
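For reference, the mean pooling on that model card boils down to something like this (a sketch using plain transformers; padding tokens are excluded via the attention mask):
import torch
from transformers import AutoTokenizer, AutoModel

def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]  # first element: token embeddings, (batch, seq_len, hidden)
    mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * mask, 1) / torch.clamp(mask.sum(1), min=1e-9)

tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/bert-base-nli-mean-tokens")
model = AutoModel.from_pretrained("sentence-transformers/bert-base-nli-mean-tokens")

encoded = tokenizer(["This is an example sentence"], padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    model_output = model(**encoded)
sentence_embeddings = mean_pooling(model_output, encoded["attention_mask"])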
Thank you @nreimers, after applying the mean pooling correctly I was able to get the sentence embeddings as expected!
I am thinking about making a small tutorial notebook on how to use sentence-transformers with ONNX and benchmarking it against the traditional PyTorch model. Would that be useful in Examples?
Yes, it would be great to have such a tutorial.
Awesome, I'll make a PR regarding that soon :+1: Thanks for the awesome library!
Created a PR regarding this: https://github.com/UKPLab/sentence-transformers/pull/386
great, I will have a look
@rachel2011: My response might be a bit late. I think the keys in your dictionary are wrong. For sentence-transformers in version 0.3.7.2, I downloaded the models (like bert-base-nli-mean-tokens) from here. Then I used
input_features = {'input_ids': input_ids, 'token_type_ids': input_type_ids, 'attention_mask': input_mask}
and
torch.onnx.export(model, input_features, onnx_file_name, verbose=True)
to export the Sentence-BERT model.
@cantwbr A simpler way is to use from transformers.convert_graph_to_onnx import convert, which can convert a model to ONNX. Refer to this PR.
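For what it's worth, I believe the call looked roughly like this on the transformers versions current at the time (model name and output path below are just placeholders; the helper has since been deprecated in favour of optimum):
from pathlib import Path
from transformers.convert_graph_to_onnx import convert

convert(
    framework="pt",                                          # export the PyTorch weights
    model="sentence-transformers/bert-base-nli-mean-tokens",
    output=Path("onnx/sbert.onnx"),
    opset=11,
    pipeline_name="feature-extraction",
)
Note that this exports only the raw transformer, so the mean pooling still has to be applied on top of the exported token embeddings.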
@cantwbr Hello, how do you define input_ids, input_type_ids, and input_mask? Can you show your demo? My code, shown below, does not pass.
batch_size = 1
max_seq_length = 128
device = torch.device("cuda")
model.to(device)
dummy_input0 = torch.LongTensor(batch_size, max_seq_length).to(device)
dummy_input1 = torch.LongTensor(batch_size, max_seq_length).to(device)
dummy_input2 = torch.LongTensor(batch_size, max_seq_length).to(device)
input_features = {'input_ids': dummy_input0, 'token_type_ids': dummy_input1, 'attention_mask': dummy_input2}
@cantwbr Looking forward to your reply, thanks.
@codingliuyg: I didn't use uninitialized LongTensors. Instead, I generated tensors of ones:
input_ids = torch.ones(batch_size, max_seq_length, dtype=torch.long).to(device)
input_type_ids = torch.ones(batch_size, max_seq_length, dtype=torch.long).to(device)
input_mask = torch.ones(batch_size, max_seq_length, dtype=torch.long).to(device)
input_features = {'input_ids': input_ids, 'token_type_ids': input_type_ids, 'attention_mask': input_mask}
torch.onnx.export(model, input_features, onnx_file_name, verbose=True)
@cantwbr Thank you for your reply. After changing my code as follows, an error occurs. Do you know what happened? Is there anything wrong with my code? Thank you.
model = SentenceTransformer('roberta-base-nli-stsb-mean-tokens', device='cpu')
batch_size = 1
max_seq_length = 128
device = torch.device("cpu")
model.to(device)
input_ids = torch.ones(batch_size, max_seq_length, dtype=torch.long).to(device)
input_type_ids = torch.ones(batch_size, max_seq_length, dtype=torch.long).to(device)
input_mask = torch.ones(batch_size, max_seq_length, dtype=torch.long).to(device)
input_features = {'input_ids': input_ids, 'token_type_ids': input_type_ids, 'attention_mask': input_mask}
onnx_path = "onnx_model_name.onnx"
torch.onnx.export(model, input_features, onnx_path)
error:
2020-12-17 21:24:17 - Load pretrained SentenceTransformer: roberta-base-nli-stsb-mean-tokens
2020-12-17 21:24:17 - Load SentenceTransformer from folder: roberta-base-nli-stsb-mean-tokens
Traceback (most recent call last):
File "embedding_reduce_cp.py", line 45, in
@codingliuyg: I tried exporting roberta-base-nli-stsb-mean-tokens and got a similar error. I resolved it by changing torch.ones to torch.zeros (most likely because RoBERTa has only a single token-type embedding, so a token_type_ids value of 1 is out of range).
I used torch 1.4.0 and sentence-transformers 0.3.7.2 on Python 3.8.
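To sanity-check the exported file, a quick ONNX Runtime pass with the same dummy shapes should work; a sketch, where the input names depend on what the exporter assigned, so inspect them rather than hard-coding:
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("onnx_model_name.onnx")
print([i.name for i in sess.get_inputs()])  # names assigned by the exporter

# all inputs in this export are int64 tensors of shape (batch, seq_len)
dummy = np.zeros((1, 128), dtype=np.int64)
outputs = sess.run(None, {i.name: dummy for i in sess.get_inputs()})
print([o.shape for o in outputs])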
@cantwbr can you share the code that you used for inference?
@shar999: For inference using the ONNX file, I reuse the featurizer of the original Sentence-BERT package:
import numpy as np
import onnxruntime as ort
from torch import nn
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer
from sentence_transformers.datasets import EncodeDataset
The inference:
model = SentenceTransformer(model_name_or_path)
fused_bert_session = ort.InferenceSession(onnx_file_name)
batch_size = 1
is_pretokenized = False
num_workers = 0
pad_to = 128
all_embeddings = []
length_sorted_idx = np.argsort([model._text_length(sen) for sen in sentences])
sentences_sorted = [sentences[idx] for idx in length_sorted_idx]
inp_dataset = EncodeDataset(sentences_sorted, model=model, is_tokenized=is_pretokenized)
inp_dataloader = DataLoader(inp_dataset, batch_size=batch_size, collate_fn=model.smart_batching_collate_text_only,
num_workers=num_workers, shuffle=False)
iterator = inp_dataloader
for features in iterator:
    for feature_name in features:
        pad_amt = features[feature_name].size(-1)
        if pad_amt != 0:
            features[feature_name] = nn.functional.pad(features[feature_name], (0, pad_to - pad_amt), value=0)
    e_input_ids = features["input_ids"].cpu().numpy()
    e_input_type_ids = features["token_type_ids"].cpu().numpy()
    e_input_mask = features["attention_mask"].cpu().numpy()
    results = fused_bert_session.run(None, {"input_ids": e_input_ids, "token_type_ids": e_input_type_ids, "attention_mask": e_input_mask})
    # results[5] contains the sentence_embeddings
    all_embeddings.append(results[5])
The embeddings are located in all_embeddings.
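If your exported graph does not expose a pooled output at results[5] (the output layout depends on how the model was exported), you can compute the masked mean over the token embeddings yourself; a sketch, assuming results[0] holds the token embeddings:
# token embeddings: (batch, seq_len, hidden); mask out padding before averaging
token_embeddings = results[0]
mask = e_input_mask[..., np.newaxis].astype(np.float32)
sentence_embedding = (token_embeddings * mask).sum(axis=1) / np.clip(mask.sum(axis=1), 1e-9, None)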
I had the same problem of wanting to speed up encode in sentence-transformers, so I converted the sentence-transformers model to an ONNX model and a TensorRT model; it is about 4 times faster.
You can use my tutorial: quick_sentence_transformers
The tutorial shows how to convert a SentenceTransformer model to an ONNX file and a TensorRT plan file.
Thanks, it really helped me understand the conversion process.
Hi,
I had some trouble converting the sentence-transformers/all-mpnet-base-v2
model to the ONNX format, so I'll share a class and a function that I made with @yuanzhoulvpi2017's tutorial (it was helpful, thank you).
In my tests I tend to measure a 4x speedup with the ONNX format. I'm not sure my code is fully optimised.
import torch
import transformers
from sentence_transformers import SentenceTransformer, models


class OnnxEncoder:
    """OnnxEncoder dedicated to running SentenceTransformer under OnnxRuntime."""

    def __init__(self, session, tokenizer, pooling, normalization):
        self.session = session
        self.tokenizer = tokenizer
        self.max_length = tokenizer.__dict__["model_max_length"]
        self.pooling = pooling
        self.normalization = normalization

    def encode(self, sentences: list):
        sentences = [sentences] if isinstance(sentences, str) else sentences

        inputs = {
            k: v.numpy()
            for k, v in self.tokenizer(
                sentences,
                padding=True,
                truncation=True,
                max_length=self.max_length,
                return_tensors="pt",
            ).items()
        }

        hidden_state = self.session.run(None, inputs)

        sentence_embedding = self.pooling.forward(
            features={
                "token_embeddings": torch.Tensor(hidden_state[0]),
                "attention_mask": torch.Tensor(inputs.get("attention_mask")),
            },
        )

        if self.normalization is not None:
            sentence_embedding = self.normalization.forward(features=sentence_embedding)

        sentence_embedding = sentence_embedding["sentence_embedding"]

        if sentence_embedding.shape[0] == 1:
            sentence_embedding = sentence_embedding[0]

        return sentence_embedding.numpy()


def sentence_transformers_onnx(
    model,
    path,
    do_lower_case=True,
    input_names=["input_ids", "attention_mask", "segment_ids"],
    providers=["CPUExecutionProvider"],
):
    """OnnxRuntime for sentence transformers.

    Parameters
    ----------
    model
        SentenceTransformer model.
    path
        Model file dedicated to session inference.
    do_lower_case
        Whether or not the model is cased.
    input_names
        Fields needed by the Transformer.
    providers
        Either run the model on CPU or GPU: ["CPUExecutionProvider", "CUDAExecutionProvider"].
    """
    try:
        import onnxruntime
    except ImportError:
        raise ValueError("You need to install onnxruntime.")

    model.save(path)

    configuration = transformers.AutoConfig.from_pretrained(
        path, from_tf=False, local_files_only=True
    )

    tokenizer = transformers.AutoTokenizer.from_pretrained(
        path, do_lower_case=do_lower_case, from_tf=False, local_files_only=True
    )

    encoder = transformers.AutoModel.from_pretrained(
        path, from_tf=False, config=configuration, local_files_only=True
    )

    st = ["cherche"]

    inputs = tokenizer(
        st,
        padding=True,
        truncation=True,
        max_length=tokenizer.__dict__["model_max_length"],
        return_tensors="pt",
    )

    model.eval()

    with torch.no_grad():
        symbolic_names = {0: "batch_size", 1: "max_seq_len"}

        torch.onnx.export(
            encoder,
            args=tuple(inputs.values()),
            f=f"{path}.onnx",
            opset_version=13,  # ONNX opset needs to be >= 13 for sentence transformers.
            do_constant_folding=True,
            input_names=input_names,
            output_names=["start", "end"],
            dynamic_axes={
                "input_ids": symbolic_names,
                "attention_mask": symbolic_names,
                "segment_ids": symbolic_names,
                "start": symbolic_names,
                "end": symbolic_names,
            },
        )

    normalization = None
    for modules in model.modules():
        for idx, module in enumerate(modules):
            if idx == 1:
                pooling = module
            if idx == 2:
                normalization = module
        break

    return OnnxEncoder(
        session=onnxruntime.InferenceSession(f"{path}.onnx", providers=providers),
        tokenizer=tokenizer,
        pooling=pooling,
        normalization=normalization,
    )
The sentence_transformers_onnx function returns a model with an encode method that behaves like the one on SentenceTransformer models.
model = sentence_transformers_onnx(
    model=SentenceTransformer("sentence-transformers/all-mpnet-base-v2"),
    path="onnx_model",
)
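Encoding then works like with the regular model, for example:
embeddings = model.encode([
    "This framework generates embeddings for each input sentence",
    "Sentences are passed as a list of strings.",
])
print(embeddings.shape)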
Raphaël
Thanks for the nice work. On the other hand, it took longer using ONNX than the normal SentenceTransformer under GPU. Any thoughts?
It may be due to the pooling operation that we keep in plain PyTorch?
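One way to test that is to fold the pooling (and normalization) into the exported graph, so ONNX Runtime returns the sentence embedding directly. A rough sketch; the wrapper class and file name below are my own, not part of the library:
import torch
from torch import nn
from transformers import AutoModel, AutoTokenizer

class EncoderWithPooling(nn.Module):
    # Hypothetical wrapper: transformer, masked mean pooling and L2
    # normalization combined into a single exportable graph.
    def __init__(self, model_name):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)

    def forward(self, input_ids, attention_mask):
        token_embeddings = self.encoder(input_ids=input_ids, attention_mask=attention_mask)[0]
        mask = attention_mask.unsqueeze(-1).to(token_embeddings.dtype)
        pooled = (token_embeddings * mask).sum(1) / mask.sum(1).clamp(min=1e-9)
        return torch.nn.functional.normalize(pooled, p=2, dim=1)

model_name = "sentence-transformers/all-mpnet-base-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
wrapper = EncoderWithPooling(model_name).eval()

dummy = tokenizer(["dummy"], return_tensors="pt")
torch.onnx.export(
    wrapper,
    (dummy["input_ids"], dummy["attention_mask"]),
    "sbert_with_pooling.onnx",
    opset_version=13,
    input_names=["input_ids", "attention_mask"],
    output_names=["sentence_embedding"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "seq"},
        "attention_mask": {0: "batch", 1: "seq"},
        "sentence_embedding": {0: "batch"},
    },
)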
Hi, I use SentenceTransformer as an ONNX model this way; it may be useful for someone:
import os
from pathlib import Path
from dataclasses import dataclass
from typing import Optional, Union, Mapping, OrderedDict
import torch
from transformers.onnx import export
from transformers.onnx import OnnxConfig
from transformers.utils import ModelOutput
from sentence_transformers.models import Dense
from transformers import AutoTokenizer, AutoModel, DistilBertModel
# get with SentenceTransformer('sentence-transformers/distiluse-base-multilingual-cased-v2', cache_folder=".")
model_ckpt = "./sentence-transformers_distiluse-base-multilingual-cased-v2"
class SBertOnnxConfig(OnnxConfig):
    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        return OrderedDict([
            ("input_ids", {0: "batch", 1: "sequence"}),
            ("attention_mask", {0: "batch", 1: "sequence"})
        ])

    @property
    def outputs(self) -> Mapping[str, Mapping[int, str]]:
        return OrderedDict([
            ("last_hidden_state", {0: "batch", 1: "sequence"})
        ])


@dataclass
class EmbeddingOutput(ModelOutput):
    last_hidden_state: Optional[torch.FloatTensor] = None


class OwnSBert(DistilBertModel):
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path: Optional[Union[str, os.PathLike]], *model_args, **kwargs):
        _model = super().from_pretrained(pretrained_model_name_or_path, *model_args, **kwargs)
        additional_layer = Dense.load(kwargs.get("path_to_additional_layer"))
        _model.additional_layer_linear = additional_layer.linear
        _model.additional_layer_activation = additional_layer.activation_function
        return _model

    def forward(
        self,
        input_ids: Optional[torch.Tensor] = None,
        attention_mask: Optional[torch.Tensor] = None,
        head_mask: Optional[torch.Tensor] = None,
        inputs_embeds: Optional[torch.Tensor] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
    ):
        embeddings = super().forward(input_ids=input_ids,
                                     attention_mask=attention_mask,
                                     head_mask=head_mask,
                                     inputs_embeds=inputs_embeds,
                                     output_attentions=True,
                                     output_hidden_states=True,
                                     return_dict=True)
        mean_embedding = embeddings.last_hidden_state.mean(dim=1)
        last_hidden_state = self.additional_layer_activation(self.additional_layer_linear(mean_embedding))
        return EmbeddingOutput(last_hidden_state=last_hidden_state)
tokenizer = AutoTokenizer.from_pretrained(model_ckpt)
base_model = OwnSBert.from_pretrained(model_ckpt, path_to_additional_layer="./sentence-transformers_distiluse-base-multilingual-cased-v2/2_Dense")
# print(base_model(**tokenizer([sentences[0], sentences[1]], padding="longest", truncation=True, return_tensors="pt")))
onnx_path = Path("exported_model/model.onnx")
onnx_config = SBertOnnxConfig.from_model_config(base_model.config)
onnx_inputs, onnx_outputs = export(tokenizer, base_model, onnx_config, onnx_config.default_onnx_opset, onnx_path)
base_model.config.save_pretrained("./exported_model/")
And then when I compare the output of the original implementation and the loaded ONNX model, it is the same.
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForFeatureExtraction
sentences = ["This is an example sentence", "Each sentence is converted"]
tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/distiluse-base-multilingual-cased-v2")
onnx_model = ORTModelForFeatureExtraction.from_pretrained("exported_model/")
inputs_2 = tokenizer([sentences[0], sentences[1]], padding="longest", truncation=True, return_tensors="pt")
outputs_2 = onnx_model(**inputs_2)
print(outputs_2)
# BaseModelOutput(last_hidden_state=tensor([[-0.0348, 0.0264, -0.0443, ...,
from sentence_transformers import SentenceTransformer
sentences = ["This is an example sentence", "Each sentence is converted"]
model = SentenceTransformer('sentence-transformers/distiluse-base-multilingual-cased-v2', cache_folder=".")
embeddings = model.encode(sentences)
print(embeddings)
# [[-0.03479306 0.02635195 -0.04427201 ...
There is some rounding behavior that changes the output of the ONNX model to lower precision, but this does not happen when running on the server (NVIDIA Triton), so I assume it is somehow related to the Jupyter notebook where I was doing the experiments.
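To quantify the difference rather than eyeballing the printed tensors, a tolerance check along these lines can help (a sketch reusing the embeddings and outputs_2 variables from above):
import numpy as np

onnx_embeddings = outputs_2.last_hidden_state.detach().cpu().numpy()
print("max abs diff:", np.abs(embeddings - onnx_embeddings).max())
np.testing.assert_allclose(embeddings, onnx_embeddings, rtol=1e-3, atol=1e-5)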
The reason why I override the from_pretrained function is that I was not able to load the proper weights in the __init__ of the DistilBertModel, and the weights were overwritten each time with random values, but this hack uses the right values. Then, to run it on the server (NVIDIA Triton), I am using an ensemble of models and join it with the tokenizer on the server (but that is not fully related to exporting).
I get a TypeError: __init__() got an unexpected keyword argument 'path_to_additional_layer' error in that case.
If I do
self.model = sentence_transformers_onnx(
    model=SentenceTransformer(model_name),
    path="onnx_model",
)

with torch.no_grad():
    model_output = self.model(**encoded_input)
sentence_embeddings = model_output[0][:, 0]
I'm then getting the error
model_output = self.model(**encoded_input)
TypeError: 'OnnxEncoder' object is not callable
Hello!
I've added native ONNX support in Sentence Transformers, so users can now look at the Speeding up Inference documentation, under the ONNX section:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("all-MiniLM-L6-v2", backend="onnx")
sentences = ["This is an example sentence", "Each sentence is converted"]
embeddings = model.encode(sentences)
I would like to convert a Sentence-BERT model from PyTorch to TensorFlow using ONNX, and I tried to follow the standard ONNX procedure for converting a PyTorch model. But I'm having difficulty determining the ONNX input arguments for the Sentence-BERT model; I encounter
TypeError: forward() takes 2 positional arguments but 4 were given.
Suggestions appreciated!
model = SentenceTransformer('output/continue_training_model')
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
dummy_input0 = torch.LongTensor(batch_size, max_seq_length).to(device)
dummy_input1 = torch.LongTensor(batch_size, max_seq_length).to(device)
dummy_input2 = torch.LongTensor(batch_size, max_seq_length).to(device)
torch.onnx.export(model, (dummy_input0, dummy_input1, dummy_input2), onnx_file_name, verbose=True)