Closed: spitzblattr closed this issue 9 months ago
I've found the reason: for the PyTorch and ONNX models, the bindings should be ordered as in the previous code:
bindings = [int(d_input_ids), int(d_token_type_ids), int(d_attention_mask), int(d_output)]
but for the TensorRT engine it should be
bindings = [int(d_input_ids), int(d_attention_mask), int(d_token_type_ids), int(d_output)]
I retried in a Windows Docker container and on Linux and modified the ONNX layers, but never thought it was because of this... sorry for bothering.
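For anyone who hits the same thing: the order the engine actually expects can be printed instead of hardcoded. A minimal sketch (assuming the serialized engine is saved as model.engine):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
with open("model.engine", "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

# Print each binding index, its name, and whether it is an input,
# so host/device buffers can be passed in the order the engine expects.
for i in range(engine.num_bindings):
    kind = "input" if engine.binding_is_input(i) else "output"
    print(i, engine.get_binding_name(i), kind)
```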
Description
Hi, I tried to export an ONNX-format bert-base-chinese model to a TensorRT engine with TensorRT 8.6.1. The log shows no error messages during the whole process, but at inference time the TensorRT engine always gives wrong predictions.
Environment
TensorRT Version: 8.6.1
NVIDIA GPU: RTX 3060 Laptop
CUDA Version: 11.8
CUDNN Version: 8700
Operating System: Windows 11
Python Version (if applicable): 3.10.13
PyTorch Version: 2.0.1
Baremetal or Container (if so, version): none
Steps To Reproduce
First, I use the optimum-cli command to export the Hugging Face bert-base-chinese model to ONNX format with dynamic input batches:
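Roughly equivalent, via the optimum Python API (this is a sketch, not my exact CLI invocation; the output directory is the path referenced later in this issue):

```python
# Rough equivalent of the optimum-cli ONNX export for bert-base-chinese.
# optimum exports the model with dynamic batch/sequence axes by default.
from optimum.onnxruntime import ORTModelForMaskedLM
from transformers import AutoTokenizer

onnx_dir = "models/onnx_export_bert_chinese"
model = ORTModelForMaskedLM.from_pretrained("bert-base-chinese", export=True)
tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
model.save_pretrained(onnx_dir)      # writes model.onnx into onnx_dir
tokenizer.save_pretrained(onnx_dir)
```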
Here's the ONNX model, which gives correct answers at inference time (exactly the same predictions and probabilities as the original PyTorch bert-base-chinese model), if anyone needs it: https://drive.google.com/drive/folders/1whFFgmQ5IP_crFlxxbGsbq8QhTpW3zyc?usp=sharing
Then I use the following code to export the ONNX model to a TensorRT engine:
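Roughly (a sketch; the optimization-profile shapes and output path here are illustrative, not necessarily the exact values I used):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

# Parse the ONNX model exported by optimum.
with open("models/onnx_export_bert_chinese/model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("ONNX parsing failed")

config = builder.create_builder_config()
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)
config.set_flag(trt.BuilderFlag.FP16)  # drop this line for a pure FP32 engine

# Dynamic shapes: min / opt / max (batch, sequence_length) for each input.
profile = builder.create_optimization_profile()
for name in ("input_ids", "attention_mask", "token_type_ids"):
    profile.set_shape(name, (1, 8), (4, 128), (16, 512))
config.add_optimization_profile(profile)

engine_bytes = builder.build_serialized_network(network, config)
with open("models/bert_chinese.engine", "wb") as f:
    f.write(engine_bytes)
```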
The whole process takes about 1 minute with no error logs. Here's the exported TensorRT engine file: https://drive.google.com/file/d/16CNgLcNlwJfuEbqvgA4U4yL_voAFgxIl/view?usp=sharing

Finally, I use the following code to run inference with the TensorRT engine:
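Roughly (a pycuda-based sketch; the example sentence, buffer handling, and the "logits" output name from the optimum export are illustrative):

```python
import numpy as np
import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
import pycuda.driver as cuda
import tensorrt as trt
from transformers import AutoTokenizer

logger = trt.Logger(trt.Logger.WARNING)
with open("models/bert_chinese.engine", "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
enc = tokenizer("今天天气真[MASK]。", return_tensors="np")
inputs = {
    "input_ids": enc["input_ids"].astype(np.int32),
    "attention_mask": enc["attention_mask"].astype(np.int32),
    "token_type_ids": enc["token_type_ids"].astype(np.int32),
}

stream = cuda.Stream()
bindings = [None] * engine.num_bindings
device_bufs = {}

# Place each input at the binding index the engine actually expects,
# instead of assuming the ONNX/PyTorch input order.
for name, arr in inputs.items():
    idx = engine.get_binding_index(name)
    context.set_binding_shape(idx, arr.shape)
    device_bufs[name] = cuda.mem_alloc(arr.nbytes)
    cuda.memcpy_htod_async(device_bufs[name], arr, stream)
    bindings[idx] = int(device_bufs[name])

out_idx = engine.get_binding_index("logits")  # output name from the optimum export
out_host = np.empty(tuple(context.get_binding_shape(out_idx)), dtype=np.float32)
d_output = cuda.mem_alloc(out_host.nbytes)
bindings[out_idx] = int(d_output)

context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
cuda.memcpy_dtoh_async(out_host, d_output, stream)
stream.synchronize()

print(out_host.shape)  # (batch, seq_len, vocab_size) logits
```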
No matter what the input tensors are, the model always gives wrong prediction tokens. I also tried exporting both the ONNX model and the TensorRT engine in FP32 precision, and the outputs are wrong in the same way.
Have you tried the latest release?: Yes. I also tried a previous version (8.5.1.7) and it gives the same outputs.
Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (
polygraphy run <model.onnx> --onnxrt
): I ran inference with the ONNX model (the model in the code above, "models/onnx_export_bert_chinese/model.onnx") several times, and it always gives correct answers (exactly the same predictions and probabilities as the original PyTorch bert-base-chinese model), so there doesn't seem to be anything wrong with the ONNX model. I also tried using the trtexec command (with the same parameters as the code above) to convert the ONNX model to a TensorRT engine; it gives the same wrong predictions. Any help is appreciated. >-<