Open Matthieu-Tinycoaching opened 2 years ago
Hi,
We convert all int64 inputs to int32 as it's not supported by TensorRT and we are not aware of a tokenizer (in NLP at least) using a range of IDs outside int32.
The conversion is done here: https://github.com/ELS-RD/transformer-deploy/blob/v0.4.0/src/transformer_deploy/backends/pytorch_utils.py#L123
You would just have to comment out that part. Note that version 0.4 is a bit old; we are waiting for a new Onnx Runtime release before building a new Docker image (the current ORT is buggy and doesn't match our expectations).
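Conceptually, the conversion described above amounts to downcasting every int64 input tensor to int32 before export. A minimal sketch of the idea in numpy (the actual code in pytorch_utils.py operates on torch tensors, so this is illustrative only):

```python
import numpy as np

def downcast_inputs(inputs: dict) -> dict:
    # Cast any int64 array to int32; TensorRT does not support int64 inputs.
    # Token IDs fit comfortably in int32 (NLP vocabularies are far below 2**31).
    return {
        name: arr.astype(np.int32) if arr.dtype == np.int64 else arr
        for name, arr in inputs.items()
    }

inputs = {"input_ids": np.array([[101, 2023, 102]], dtype=np.int64)}
cast = downcast_inputs(inputs)
```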
May I ask you why you don't convert the input_ids tensor to int32 dtype?
Hi @pommedeterresautee, thanks for the detailed answer. I don't want to use `input_ids.astype(np.int32)`,
as it creates a copy of the initial tensor, and I would like to minimize latency for real-time inference.
Would you have any suggestions?
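One option (an illustrative sketch, not a suggestion from the thread) is to build the int32 array directly from the tokenizer's Python lists, so no int64 intermediate is ever allocated and no second copy is needed. Note also that `astype(..., copy=False)` is a no-op when the dtype already matches:

```python
import numpy as np

# Hypothetical tokenizer output: plain Python lists of token IDs.
token_ids = [[101, 2023, 2003, 102]]

# np.asarray with an explicit dtype allocates int32 directly,
# instead of creating an int64 array and then copying it with astype().
input_ids = np.asarray(token_ids, dtype=np.int32)

# When the dtype already matches, astype with copy=False returns
# the same array object, so no extra copy is made.
same = input_ids.astype(np.int32, copy=False)
```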
Moreover, do you know when the new docker image will be out?
Hi @pommedeterresautee, could you give feedback on my previous message?
Hi,
I have used the docker container to convert a model to ONNX:
But when trying to run this ONNX model:
I got the following error:
It seems the ONNX model was exported with int32 inputs instead of the usual int64. Is there a way to mitigate this?
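One way to mitigate this on the client side (a sketch, assuming numpy arrays are fed to an onnxruntime session) is to cast the feed dict to int32 just before calling `run`, so the dtypes match what the exported graph expects:

```python
import numpy as np

def to_int32_feed(feed: dict) -> dict:
    # The exported graph expects int32 inputs, so cast any int64 arrays.
    # The cast does copy here (the dtype differs), but for small token
    # tensors this cost is usually negligible next to model inference.
    return {
        name: arr.astype(np.int32) if arr.dtype == np.int64 else arr
        for name, arr in feed.items()
    }

feed = {
    "input_ids": np.array([[101, 2023, 102]], dtype=np.int64),
    "attention_mask": np.array([[1, 1, 1]], dtype=np.int64),
}
feed32 = to_int32_feed(feed)
# sess.run(None, feed32)  # hypothetical onnxruntime InferenceSession
```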
Thanks!