Open TanvirHundredOne opened 1 year ago
You can use the `--monolith` flag of the `optimum-cli export onnx` command to force it to export a single ONNX file, but this is not recommended for encoder-decoder models like Donut. From the CLI help:
> `--monolith` — Forces to export the model as a single ONNX file. By default, the ONNX exporter may break the model in several ONNX files, for example for encoder-decoder models where the encoder should be run only once while the decoder is looped over.
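For concreteness, a monolithic export might look like the following. This is a sketch: the model id, the `image-to-text` task, and the output directory are example values, not taken from this thread.

```shell
# Force a single-file export (not recommended for Donut, per the note above).
optimum-cli export onnx \
  --model naver-clova-ix/donut-base-finetuned-cord-v2 \
  --task image-to-text \
  --monolith \
  donut_onnx_monolith/
```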
How do I run inference using the `donut_model.onnx`? Hugging Face's solution expects the decoupled `.onnx` files.
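For reference, the decoupled files are what optimum's ONNX Runtime classes consume directly. A minimal sketch, with assumptions: the directory path is a placeholder, and `ORTModelForVision2Seq` is the optimum class for vision-encoder-decoder models such as Donut.

```python
def load_donut_onnx(model_dir: str):
    """Load a Donut export directory containing the decoupled ONNX files.

    Imports are deferred so this sketch can be defined even without
    optimum/transformers installed.
    """
    from optimum.onnxruntime import ORTModelForVision2Seq  # ONNX Runtime wrapper
    from transformers import DonutProcessor

    processor = DonutProcessor.from_pretrained(model_dir)
    # use_cache=True lets it pick up decoder_with_past_model.onnx for
    # faster autoregressive generation.
    model = ORTModelForVision2Seq.from_pretrained(model_dir, use_cache=True)
    return processor, model
```

Usage would be `processor, model = load_donut_onnx("./donut_onnx")`, then `model.generate(...)` as with a regular transformers model.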
I'm working on a production solution where I want to serve my Donut model from an NVIDIA Triton Inference Server. But I'm struggling to convert my Donut model and its associated files (tokenizer, processor, etc.) into a single ONNX file, which is preferred for Triton.
I've had some limited success, but I ended up with an ONNX file plus some other metadata files. Can anyone please help me package it all into a single file?
I have tried to create the ONNX file using the optimum library, but it creates this file structure:

```
|-- added_tokens.json
|-- config.json
|-- decoder_model.onnx
|-- decoder_model.onnx_data
|-- decoder_with_past_model.onnx
|-- decoder_with_past_model.onnx_data
|-- encoder_model.onnx
|-- generation_config.json
|-- preprocessor_config.json
|-- sentencepiece.bpe.model
|-- special_tokens_map.json
|-- tokenizer_config.json
`-- tokenizer.json
```
Ideally, there should be a single `donut_model.onnx` file. Thanks in advance!