elixir-nx / ortex

ONNX Runtime bindings for Elixir

Error loading onnx model #16

Closed. sonic182 closed this issue 1 year ago.

sonic182 commented 1 year ago

Hi, I'm trying to load an ONNX model and I'm getting this error:

(RuntimeError) Failed to create ONNX Runtime session: Load model from /home/sonic182/work/myproyect/models/e5-small-v2-onnx/model.onnx failed: This is an invalid model. In Node, ("MaskReduceSum_0", ReduceSum, "", -1) : ("attention_mask_int32": tensor(int32),) -> ("mask_index_0",) , Error Unrecognized attribute: axes for operator ReduceSum
    (ortex 0.1.6) lib/ortex/model.ex:28: Ortex.Model.load/3
    /home/sonic182/sandbox/livebooks/embeddings.livemd#cell:s35av5wcbejnldtu5ekss352elzjd74y:3: (file)

The model was generated with these two commands in Python 3.10.12 (using Colab, moving the results to my Drive, and downloading them):

!pip install transformers "onnxruntime<1.15" optimum[onnxruntime,exporters] torch
!optimum-cli export onnx -m intfloat/e5-small-v2 --optimize O2 e5-small-v2-onnx --opset 18

using the Hugging Face optimum CLI; the model is e5-small-v2.

Is this kind of model operation not supported?

sonic182 commented 1 year ago

It seems the format generated by optimum-cli is different from what torch.onnx produces; it now works fine using this code (based on your stablelm example):

# !mkdir output

import torch.onnx
from transformers import AutoTokenizer, AutoModel

model_id = "intfloat/e5-small-v2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

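# Tokenize a short example so the export has concrete inputs to trace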
prompt = "this is arbitrary text"
inputs = tokenizer(prompt, return_tensors="pt")

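# Export with dynamic batch/sequence axes so other input sizes also work at inference time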
torch.onnx.export(
    model,
    (inputs["input_ids"].cpu(), inputs["attention_mask"].cpu()),
    "output/e5-small-v2.onnx",
    input_names=["input_ids", "attention_mask"],
    dynamic_axes={
        "input_ids": {0: "batch_size", 1: "sequence_length"},
        "attention_mask": {0: "batch_size", 1: "sequence_length"},
    },
)
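
For reference, an (untested) Ortex sketch of loading and running the exported file might look like this; the token ids below are placeholders, and I'm assuming Ortex.run accepts a tuple of tensors for multi-input models, with the number and order of outputs depending on the export:

model = Ortex.load("output/e5-small-v2.onnx")

# Placeholder token ids / mask; a real run would take these from a tokenizer
input_ids = Nx.tensor([[101, 2023, 2003, 1037, 3231, 102]])
attention_mask = Nx.tensor([[1, 1, 1, 1, 1, 1]])

outputs = Ortex.run(model, {input_ids, attention_mask})
last_hidden_state = elem(outputs, 0)   # {batch, seq_len, hidden}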

I left the issue open in case you want to support optimum-cli-generated models.

sonic182 commented 1 year ago

A quick question: in my case I'm trying to get embeddings using the intfloat/e5-small-v2 model. Is this the correct way of doing it?

[screenshot of the Livebook code]

This is the code that I think gives me the embeddings. The resulting embeddings are a bit different from running the model with Bumblebee + Axon/XLA directly, but I think that has to do with the ONNX format.
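
For context, the idea is roughly the following sketch (not the exact code from the screenshot; it assumes the first Ortex output is the last hidden state with shape {batch, seq_len, hidden} and uses attention-mask mean pooling plus L2 normalization, the usual recipe for e5-style embeddings):

defmodule Pooling do
  # Mean-pool the token vectors using the attention mask, then L2-normalize
  # so cosine similarity reduces to a dot product between embeddings.
  def embed(last_hidden_state, attention_mask) do
    mask = attention_mask |> Nx.as_type(:f32) |> Nx.new_axis(2)
    summed = Nx.sum(Nx.multiply(last_hidden_state, mask), axes: [1])
    counts = Nx.sum(mask, axes: [1])
    mean = Nx.divide(summed, counts)
    norm = Nx.sqrt(Nx.sum(Nx.multiply(mean, mean), axes: [1], keep_axes: true))
    Nx.divide(mean, norm)
  end
end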

mortont commented 1 year ago

Based on that error, my best guess is that optimum is using a custom opset that isn't part of the default ONNX spec (specifically, ReduceSum with an axes attribute), but I don't know much about how optimum works at a deeper level. You may be able to reduce the opset in the CLI output to force optimum not to do this, though I'm not sure.

As for the slightly different values between Ortex and XLA output, that's expected. Since they use different Nx backends, float results will differ slightly.

I'm going to close this for now since supporting optimum-specific exports isn't planned, but I will investigate why this failed and may open a related issue if it turns out to be due to Ortex/onnxruntime.

sonic182 commented 1 year ago

Forget this, it was because I didn't have all the ONNX model files from the folder created when generating the ONNX model.

I mean, when generating the model on Colab I only downloaded the .onnx file, and I needed to download all the other files too (the whole folder).

mortont commented 1 year ago

Ah, yes, that will do it! So the optimum export works as expected if all the ONNX files are transferred?

sonic182 commented 1 year ago

I didn't actually finish that test, because I started using the torch.onnx export instead. In any case, I think it splits into multiple files whenever the model size is more than 2GB, so it's the same situation of the model being split across several files.