ELS-RD / transformer-deploy

Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀
https://els-rd.github.io/transformer-deploy/
Apache License 2.0

[ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Unexpected input data type. Actual: (tensor(int64)) , expected: (tensor(int32)) #110

Matthieu-Tinycoaching opened this issue 2 years ago (status: Open)

Matthieu-Tinycoaching commented 2 years ago

Hi,

I have used the docker container to convert a model to ONNX:

docker run -it --rm --gpus all \
    -v $PWD:/project ghcr.io/els-rd/transformer-deploy:0.4.0 \
    bash -c "cd /project && \
    convert_model -m \"cross-encoder/mmarco-mMiniLMv2-L12-H384-v1\" \
    --backend tensorrt onnx \
    --seq-len 16 128 128"

But when trying to run this ONNX model:

model_input = tokenizer(query_list, paragraph,  padding=True, truncation=True, max_length=max_length_tokens, return_tensors="pt")
model_input = {name : np.atleast_2d(value) for name, value in model_input.items()}
onnx_result = sess.run(None, model_input)[0]

I got the following error:

InvalidArgument                           Traceback (most recent call last)
/home/matthieu/Code/Python/ONNX-Export-mmarco-mMiniLMv2-L12-H384-v1.ipynb Cell 13' in <cell line: 8>()
      [6](vscode-notebook-cell:/home/matthieu/Code/Python/ONNX-Export-mmarco-mMiniLMv2-L12-H384-v1.ipynb#ch0000014?line=5) model_input = tokenizer(query_list, paragraph,  padding=True, truncation=True, max_length=max_length_tokens, return_tensors="pt")
      [7](vscode-notebook-cell:/home/matthieu/Code/Python/ONNX-Export-mmarco-mMiniLMv2-L12-H384-v1.ipynb#ch0000014?line=6) model_input = {name : np.atleast_2d(value) for name, value in model_input.items()}
----> [8](vscode-notebook-cell:/home/matthieu/Code/Python/ONNX-Export-mmarco-mMiniLMv2-L12-H384-v1.ipynb#ch0000014?line=7) onnx_result = sess.run(None, model_input)[0]
     [10](vscode-notebook-cell:/home/matthieu/Code/Python/ONNX-Export-mmarco-mMiniLMv2-L12-H384-v1.ipynb#ch0000014?line=9) onnx_result

File ~/anaconda3/envs/haystack-gpu-fresh/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py:192, in Session.run(self, output_names, input_feed, run_options)
    190     output_names = [output.name for output in self._outputs_meta]
    191 try:
--> 192     return self._sess.run(output_names, input_feed, run_options)
    193 except C.EPFail as err:
    194     if self._enable_fallback:

InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Unexpected input data type. Actual: (tensor(int64)) , expected: (tensor(int32))

It seems the ONNX model has been exported with int32 inputs instead of the usual int64. Is there a way to mitigate this?

Thanks!

pommedeterresautee commented 2 years ago

Hi,

We convert all int64 inputs to int32 because int64 is not supported by TensorRT, and we are not aware of any tokenizer (in NLP at least) whose ID range exceeds int32.

The conversion is done here: https://github.com/ELS-RD/transformer-deploy/blob/v0.4.0/src/transformer_deploy/backends/pytorch_utils.py#L123
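For illustration, a rough sketch of the general idea behind that step (not the exact implementation at the link; the model name is reused from the issue): the dummy inputs used to trace and export the model are cast from int64 to int32, so the exported ONNX graph declares int32 inputs.

import torch
from transformers import AutoTokenizer

# Sketch only: cast the dummy tracing inputs to int32 before they are passed
# to torch.onnx.export, so the exported graph expects int32 input_ids etc.
tokenizer = AutoTokenizer.from_pretrained("cross-encoder/mmarco-mMiniLMv2-L12-H384-v1")
dummy = tokenizer("hello world", return_tensors="pt")
dummy_int32 = {k: v.type(torch.int32) if v.dtype == torch.int64 else v
               for k, v in dummy.items()}
# dummy_int32 would then be fed to torch.onnx.export(...) as the sample inputs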

You would just have to comment out that part. Note that version 0.4 is a bit old; we are waiting for a new ONNX Runtime release before building a new Docker image (the current ORT is buggy and doesn't meet our expectations).

May I ask why you don't convert the input_ids tensor to int32?
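For reference, a minimal sketch of that client-side cast (the model path, texts, and execution provider below are placeholders, not taken from the issue):

import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

# Illustrative only: tokenize to NumPy, then cast every input to int32 so it
# matches the dtype the exported graph expects.
tokenizer = AutoTokenizer.from_pretrained("cross-encoder/mmarco-mMiniLMv2-L12-H384-v1")
sess = ort.InferenceSession("model.onnx", providers=["CUDAExecutionProvider"])

model_input = tokenizer(["a query"], ["a paragraph"], padding=True, truncation=True,
                        max_length=128, return_tensors="np")
model_input = {name: np.atleast_2d(value).astype(np.int32) for name, value in model_input.items()}
onnx_result = sess.run(None, model_input)[0]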

Matthieu-Tinycoaching commented 2 years ago

Hi @pommedeterresautee, thanks for the detailed answer. I don't want to use input_ids.astype(np.int32) because it seems to create a copy of the original tensor, and I would like to minimize latency for real-time inference.

Would you have any suggestions?
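For what it's worth, a purely illustrative micro-benchmark of the cast in question (batch size, sequence length, and ID range below are arbitrary; timings depend on hardware):

import time
import numpy as np

# Time the int64 -> int32 cast on a typical (batch, seq_len) input to see how
# large the copy overhead is compared with model inference time.
batch = np.random.randint(0, 30_000, size=(8, 128), dtype=np.int64)
start = time.perf_counter()
for _ in range(1_000):
    _ = batch.astype(np.int32)
elapsed_us = (time.perf_counter() - start) / 1_000 * 1e6
print(f"average cast time: {elapsed_us:.1f} µs per call")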

Moreover, do you know when the new docker image will be out?

Matthieu-Tinycoaching commented 2 years ago

Hi @pommedeterresautee, could you give feedback on my previous message?