@Nioolek I met the same problem. Have you solved it?
Same problem here. I also tried to convert a 3D conv model from PyTorch to Caffe to TensorRT; the TRT inference result is wrong.
Hello @Nioolek, thanks for reporting. We released a new tool in 7.2 to compare a TRT run against other frameworks. Please check https://github.com/NVIDIA/TensorRT/tree/release/7.2/tools/Polygraphy
And for your case, you can run it from the command line:
polygraphy run 3dcnn.onnx --trt --onnxrt
I get this result:
[I] Runner: onnxrt-runner-N0-10/23/20-00:55:07 | Completed 1 iterations.
[I] Accuracy Comparison | trt-runner-N0-10/23/20-00:55:07 vs. onnxrt-runner-N0-10/23/20-00:55:07
[I] Comparing Output: '177' (dtype=float32, shape=(1, 2)) with '177' (dtype=float32, shape=(1, 2))
[S] PASSED | Difference is within tolerance (rtol=1e-05, atol=1e-05)
[S] PASSED | Command: /home/vincenth/.local/bin/polygraphy run 3dcnn.onnx --trt --onnxrt
Any small mismatch could be introduced by the ordering of floating-point arithmetic. DNNs are by nature robust against such perturbations most of the time; this is why FP16/INT8 works. Have you compared the end-to-end accuracy instead of the bit-level mismatch?
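For illustration, an end-to-end check could look something like this (a minimal sketch; the .npy dumps and the tolerances are hypothetical placeholders, not files from this issue):

import numpy as np

# Hypothetical dumps of the logits produced by each runtime on the
# same real input batch; shape (N, 2) matches the output '177' above.
trt_logits = np.load('trt_outputs.npy')
torch_logits = np.load('torch_outputs.npy')

# Element-wise closeness vs. agreement of the final predicted class,
# which is what end-to-end accuracy actually measures.
print('allclose:', np.allclose(trt_logits, torch_logits, rtol=1e-3, atol=1e-3))
print('prediction agreement:',
      (trt_logits.argmax(axis=1) == torch_logits.argmax(axis=1)).mean())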
If I have time, I will run a test. The problem remains unsolved. We used libtorch to run inference on the 3DCNN before, but that doesn't seem to be the best choice for an online inference environment.
Thanks @Nioolek. Do you see an accuracy loss in TRT with real data?
Sorry @Nioolek @XinnWang @111qqz, I was using an internal nightly build in my previous comment; the issue does reproduce with the 7.2 release. We now have an internal tracker for it. Until it is fixed, here is a script that modifies your ONNX model to work around the issue; I have verified it on your model. Could you give it a try? Thanks!
import onnx

model = onnx.load('3dcnn.onnx')
graph = model.graph
nodes = graph.node

# Names of all constant tensors (initializers) in the graph.
initlist = [init.name for init in graph.initializer]

# For every Add node that consumes an initializer directly, route the
# constant through a fresh Identity node instead. New nodes are collected
# first so the node list isn't mutated while we iterate over it.
newNodes = []
for node1 in nodes:
    if node1.op_type == "Add":
        consInput = None
        if node1.input[0] in initlist:
            consInput = node1.input[0]
        if node1.input[1] in initlist:
            consInput = node1.input[1]
        if consInput:
            idOutput0 = "ident_{}".format(consInput)
            nodeIdent = onnx.helper.make_node(
                'Identity',
                [consInput],   # inputs
                [idOutput0],   # outputs
            )
            node1.input.remove(consInput)
            node1.input.extend([idOutput0])
            newNodes.append(nodeIdent)
nodes.extend(newNodes)

# Rebuild the model, keeping the original opset imports so the modified
# graph stays valid under the same opset.
model_def = onnx.helper.make_model(graph, opset_imports=model.opset_import)
onnx.save(model_def, './update_model.onnx')
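After saving update_model.onnx, the workaround can be verified with the same Polygraphy comparison as above:

polygraphy run update_model.onnx --trt --onnxrt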
I will close this; please reopen if you still have questions. Thanks!
@XinnWang @111qqz I have not solved this problem. I used libtorch to run inference on the network instead.
Description
I tried to convert a 3D conv model from PyTorch to ONNX to TensorRT. Everything seemed to work well. I ran inference on the model in PyTorch, ONNX, and TensorRT. The inference results of PyTorch and ONNX are the same, but the inference results of ONNX and TensorRT are different. So I located the problem in the TRT engine.
What I have checked: input shape, ONNX model (checked in Netron)
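For reference, a PyTorch-vs-ONNX parity check of this kind can be done as follows (a minimal self-contained sketch assuming a recent onnxruntime; the stand-in network, input shape, and 'check.onnx' filename are placeholders, not code from this issue):

import numpy as np
import onnxruntime as ort
import torch

# Stand-in 3D conv net with a (1, 2) output; replace with the real 3DCNN.
model = torch.nn.Sequential(
    torch.nn.Conv3d(3, 8, kernel_size=3),
    torch.nn.AdaptiveAvgPool3d(1),
    torch.nn.Flatten(),
    torch.nn.Linear(8, 2),
).eval()
x = torch.randn(1, 3, 16, 112, 112)  # placeholder clip shape

with torch.no_grad():
    torch_out = model(x).numpy()

# Export the same graph and run it through ONNX Runtime on CPU.
torch.onnx.export(model, x, 'check.onnx')
sess = ort.InferenceSession('check.onnx', providers=['CPUExecutionProvider'])
onnx_out = sess.run(None, {sess.get_inputs()[0].name: x.numpy()})[0]

print('max abs diff:', np.abs(torch_out - onnx_out).max())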
Environment
TensorRT Version: 7.0.0.11
GPU Type: 2080Ti * 2
Nvidia Driver Version: 440.33.01
CUDA Version: 10.0
CUDNN Version: 7.6
Operating System + Version: Ubuntu 16.04
Python Version (if applicable): 3.6.9
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.2.0
Baremetal or Container (if container which image + tag): built a TensorRT container myself according to the official instructions
Relevant Files
https://drive.google.com/open?id=1oZ550uIm-IzM0E4CpUc-rtlGdjzVlMSj
Steps To Reproduce
I used the code downloaded from https://github.com/rmccorm4/tensorrt-utils/blob/master/classification/imagenet/onnx_to_tensorrt.py to convert the ONNX model to TensorRT. Command line:
python onnx_to_tensorrt.py --onnx 3dcnn.onnx -o 3dcnn_docker1.trt -b 1 -v --explicit-batch --gpu-fallback --calibration-batch-size 1
Log:
The inference code is adapted from https://github.com/rmccorm4/tensorrt-utils/blob/master/classification/imagenet/infer_tensorrt_imagenet.py
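The core of that inference path boils down to the following (a condensed sketch, not the full script; the input shape is a placeholder, while the engine filename and the (1, 2) output shape come from this thread):

import numpy as np
import pycuda.autoinit  # noqa: F401 -- creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
with open('3dcnn_docker1.trt', 'rb') as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

inp = np.random.randn(1, 3, 16, 112, 112).astype(np.float32)  # placeholder shape
out = np.empty((1, 2), dtype=np.float32)  # output '177' is (1, 2)

# Copy input to device, run the explicit-batch engine, copy output back.
d_in, d_out = cuda.mem_alloc(inp.nbytes), cuda.mem_alloc(out.nbytes)
cuda.memcpy_htod(d_in, inp)
context.execute_v2(bindings=[int(d_in), int(d_out)])
cuda.memcpy_dtoh(out, d_out)
print(out)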
Is this caused by the INT64 params?
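Probably not: TensorRT's ONNX parser casts INT64 weights down to INT32 and only emits a warning, which is usually harmless unless the values overflow INT32. One way to list which initializers are INT64 (a small sketch using the onnx Python API):

import onnx

model = onnx.load('3dcnn.onnx')
int64_inits = [init.name for init in model.graph.initializer
               if init.data_type == onnx.TensorProto.INT64]
print('INT64 initializers:', int64_inits)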