Closed sonic182 closed 1 year ago
It seems the format generated by optimum-cli is different from the one produced by torch.onnx. It now works fine using this code (based on your stablelm example):
# !mkdir output
import torch.onnx
from transformers import AutoTokenizer, AutoModel
model_id = "intfloat/e5-small-v2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)
prompt = "this is arbitrary text"
inputs = tokenizer(prompt, return_tensors="pt")
torch.onnx.export(
    model,
    (inputs["input_ids"].cpu(), inputs["attention_mask"].cpu()),
    "output/e5-small-v2.onnx",
    input_names=["input_ids", "attention_mask"],
    dynamic_axes={
        "input_ids": {0: "batch_size", 1: "sequence_length"},
        "attention_mask": {0: "batch_size", 1: "sequence_length"},
    },
)
I left the issue open in case you want to support the optimum-cli-generated models.
A quick question: in my case I'm trying to get embeddings using the model intfloat/e5-small-v2. Is this the correct way of doing it?
This is the code that I think gives me the embeddings. The resulting embeddings are a bit different from using the model with Bumblebee + Axon/XLA directly, but I think it has to do with the ONNX format.
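For reference, e5-style models are typically pooled by taking a masked mean over last_hidden_state and then L2-normalizing; if the pooling step differs between two pipelines, the embeddings will differ by more than float noise. A minimal numpy sketch of that pooling scheme (dummy tensors stand in for real model output):

```python
import numpy as np

def e5_pool(last_hidden_state, attention_mask):
    """Masked mean pooling followed by L2 normalization,
    the pooling scheme commonly used with e5 models."""
    mask = attention_mask[..., None].astype(last_hidden_state.dtype)  # (b, s, 1)
    summed = (last_hidden_state * mask).sum(axis=1)                   # (b, d)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                    # (b, 1)
    emb = summed / counts
    return emb / np.linalg.norm(emb, axis=1, keepdims=True)

# Dummy "model output": batch of 1, 4 tokens (last one is padding), dim 3.
hidden = np.arange(12, dtype=np.float32).reshape(1, 4, 3)
mask = np.array([[1, 1, 1, 0]])
emb = e5_pool(hidden, mask)
print(emb.shape)            # (1, 3)
print(np.linalg.norm(emb))  # ~1.0 (unit-normalized)
```

If both the Ortex and the Bumblebee pipelines apply this same pooling, any remaining differences should be down in the floating-point noise.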
Based on that error, my best guess is that optimum is using a custom opset that isn't part of the default ONNX spec (specifically, ReduceSum with an axes attribute); however, I don't know much about how optimum works at a deeper level. You may be able to reduce the opset in the CLI output to force optimum not to do this, but I'm not sure.
As for the slightly different values between Ortex and XLA output: that's expected. Since they use different Nx backends, floating-point results will differ slightly.
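A quick numpy illustration of this point (not Ortex-specific): the same mathematical sum computed in different evaluation orders, which is exactly the kind of difference two backends' kernels have, typically does not produce bit-identical float32 results:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000).astype(np.float32)

# Three ways to compute the same sum.
forward = np.sum(x)                               # one pass over the array
chunked = np.sum(x.reshape(1000, 100).sum(axis=1))  # per-chunk partial sums
reference = np.sum(x.astype(np.float64))          # higher-precision reference

print(forward, chunked, reference)
# All three agree to several decimal digits, but usually not bit-for-bit;
# the same applies to embeddings computed by two different backends.
```

This is why comparing embeddings across backends should use a tolerance (or cosine similarity) rather than exact equality.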
I'm going to close this for now, since supporting optimum-specific exports isn't planned, but I'll investigate why this failed and may open a related issue if it turns out to be due to Ortex/onnxruntime.
Forget this; it was because I didn't have all the ONNX model files from the folder created when generating the ONNX model. I mean, when generating the model in Colab, I only downloaded the .onnx file, and I needed to download all the other files too (the whole folder).
Ah, yes, that will do it! So the optimum export works as expected once all the ONNX files are transferred?
I didn't really finish that test, because I started using the torch.onnx export instead. In any case, the export splits into multiple files whenever the model is larger than 2 GB, I think, so it's the same situation of the model being split across several files.
Hi, I'm trying to load an ONNX model and I'm getting this error:
(RuntimeError) Failed to create ONNX Runtime session: Load model from /home/sonic182/work/myproyect/models/e5-small-v2-onnx/model.onnx failed:This is an invalid model. In Node, ("MaskReduceSum_0", ReduceSum, "", -1) : ("attention_mask_int32": tensor(int32),) -> ("mask_index_0",) , Error Unrecognized attribute: axes for operator ReduceSum (ortex 0.1.6) lib/ortex/model.ex:28: Ortex.Model.load/3 /home/sonic182/sandbox/livebooks/embeddings.livemd#cell:s35av5wcbejnldtu5ekss352elzjd74y:3: (file)
The model was generated with two lines of code in Python 3.10.12 (using Colab, moving the results to my Drive and downloading them), using the Hugging Face optimum CLI. The model is e5-small-v2.
Are these kinds of model operations not supported?