YouSenRong opened 1 year ago
How do you set the layer precision? did you set the layer contrain to obey? See https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/namespacenvinfer1.html#abdc74c40fe7a0c3d05d2caeccfbc29c1
Thanks for your reply! @zerollzeng
How do you set the layer precision?
I set the precision by calling setPrecision on the layer, as:
did you set the layer constraint to obey?
Yes, I had set BuilderFlag::kOBEY_PRECISION_CONSTRAINTS, as: However, it still doesn't work.
For the other layers, setPrecision works. Only setPrecision on the "phase0_tf/predict_node/y:0" layer doesn't take effect.
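For reference, the combination described above (pin individual layers to FP32 and make the constraint binding) can be sketched with the TensorRT Python API roughly as follows. This is a minimal sketch, not the reporter's actual code; the layer names are taken from this issue, and the network is assumed to be populated elsewhere (e.g. via the ONNX parser):

```python
# Hedged sketch: force two layers to FP32 while the rest of the network
# builds in FP16, and tell the builder the constraints are mandatory.
import tensorrt as trt

logger = trt.Logger(trt.Logger.VERBOSE)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
# ... populate `network`, e.g. with trt.OnnxParser ...

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)
# Without this flag, TensorRT may silently drop per-layer precision requests.
config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)

for i in range(network.num_layers):
    layer = network.get_layer(i)
    if layer.name in ("phase0_tf/predict_node/y:0", "phase0_tf/predict_node"):
        layer.precision = trt.float32                # compute precision
        for j in range(layer.num_outputs):
            layer.set_output_type(j, trt.float32)    # output dtype
```

Note that, as discussed later in this thread, fusions such as ConstShuffleFusion can still merge a constrained layer into a fused node.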
Could you please provide a repro for us? Thanks!
I would prefer an onnx model that can reproduce this error.
Sorry for the late response. @zerollzeng
I have split out a subgraph of the model (subgraph.onnx.zip), but I can't reproduce the error on the subgraph.
However, I can reproduce the error on the full model.
I ran the subgraph and the full model with trtexec on TensorRT 8.6 using these commands:
./trtexec --onnx=subgraph.onnx --fp16 --verbose --builderOptimizationLevel=3 --layerPrecisions="phase0_tf/predict_node/y:0:fp32,phase0_tf/predict_node:fp32" --layerOutputTypes="phase0_tf/predict_node/y:0:fp32" --precisionConstraints="obey" > subgraph.log 2>&1
./trtexec --onnx=full_model.onnx --fp16 --verbose --builderOptimizationLevel=3 --layerPrecisions="phase0_tf/predict_node/y:0:fp32,phase0_tf/predict_node:fp32" --layerOutputTypes="phase0_tf/predict_node/y:0:fp32" --precisionConstraints="obey" > full_model.log 2>&1
The logs are shown as follows. It seems that the selected tactics differ between the subgraph and the full model.
Besides, I had set "phase0_tf/predict_node/y:0" and "phase0_tf/predict_node" to FP32, but the warning message still reports that the layer "phase0_tf/predict_node/y:0 + (Unnamed Layer* 522) [Shuffle]" has FP16 subnormal values.
For the full model, I may have to ask for permission to share it. Or can I send the full model to you privately instead of posting it publicly on GitHub?
@nvpohanh On the right part of the image, it's a myelin subgraph, is it possible that myelin already set the precision to FP32 but just didn't print it in the log?
is it possible that myelin already set the precision to FP32 but just didn't print it in the log?
It can't account for the large difference between pure FP32 and mixed FP32&FP16.
Several things I would try: also set the output type (set_output_type()) to FP32.
On the right part of the image, it's a myelin subgraph, is it possible that myelin already set the precision to FP32 but just didn't print it in the log?
If the ForeignNode optimization is triggered, we do not have detailed dtype information. We will need to use Nsys to look at it (or use --dumpLayerInfo --profilingVerbosity=detailed with the latest TRT internal build).
I think the first thing we should do is to repro the accuracy difference between pure-FP32 and FP32+FP16.
@nvpohanh Do you need the full model to reproduce the error?
We probably don't need the full model, but we do need a way to repro the "large difference between pure FP32 and mixed FP32&FP16" you mentioned.
Based on TensorRT 8.6, the diffs are as follows:
absolute difference: min: 9.02219e-10 (0.000139833, 0.000139832), max: 0.00138001 (0.0436334, 0.0450134), mean: 9.89399e-06
relative difference: min: 5.52027e-06 (0.0022354, 0.00223541), max: 0.141119 (8.68643e-05, 9.91225e-05), mean: 0.00445263
The max relative difference is about 0.14. Based on TensorRT 8.4.3, the max relative difference between FP32 and mixed FP32+FP16 is only about 0.01.
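The statistics above can be reproduced with plain NumPy; this is a hedged sketch of how such numbers are typically computed (the `diff_stats` helper and the epsilon guard against near-zero denominators are my additions, not from the issue), fed with two of the (FP32, FP16) value pairs quoted above:

```python
# Sketch: per-element absolute and relative diff statistics between an
# FP32 reference output and a mixed-precision output.
import numpy as np

def diff_stats(ref, test, eps=1e-12):
    ref = np.asarray(ref, dtype=np.float64).ravel()
    test = np.asarray(test, dtype=np.float64).ravel()
    abs_diff = np.abs(ref - test)
    # Relative diff is taken against the FP32 reference; eps avoids
    # division by exactly zero.
    rel_diff = abs_diff / np.maximum(np.abs(ref), eps)
    return {
        "abs": (abs_diff.min(), abs_diff.max(), abs_diff.mean()),
        "rel": (rel_diff.min(), rel_diff.max(), rel_diff.mean()),
    }

# Value pairs quoted in the issue: (FP32 reference, FP16 result).
stats = diff_stats(ref=[0.0436334, 8.68643e-05],
                   test=[0.0450134, 9.91225e-05])
# The second pair yields the reported max relative diff of about 0.1411.
```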
For the repro, I will try to save the input data.
A similar issue: https://github.com/NVIDIA/TensorRT/issues/3257
@zerollzeng is this dup of #3257 ? thanks
Maybe not.
absolute difference: min: 9.02219e-10 (0.000139833, 0.000139832), max: 0.00138001 (0.0436334, 0.0450134), mean: 9.89399e-06
relative difference: min: 5.52027e-06 (0.0022354, 0.00223541), max: 0.141119 (8.68643e-05, 9.91225e-05), mean: 0.00445263
The max relative difference is about 0.14. Based on TensorRT 8.4.3, the max relative difference between FP32 and mixed FP32+FP16 is only about 0.01.
The diff doesn't look very big in either case. What is the output data range?
What is the output data range?
What does the output data range mean? I tried both the enqueueV2 and enqueueV3 APIs, but all the results have big diffs. I am organizing the details.
The output data range. E.g. if the range is [-1, 1], then the diff (max 0.001) looks good to me.
The data range is [0, 1], but the relative difference is too big, and I think it is caused by the FP32 layer precision setting not taking effect. Besides, the max diff is not always 0.001; sometimes it is bigger. In cases where setting the layer precision takes effect, the diff is small; in cases where it doesn't take effect, the diff is big.
But the relative difference is too big
If the TRT output has value 0.000001 and the ONNX output has value 0.000002, then you will see a relative difference of 1. Have you tried Po-Han's suggestion to set the layer precision?
If the TRT output has value 0.000001 and the ONNX output has value 0.000002, then you will see a relative difference of 1.
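The point quoted above is easy to check numerically; this tiny illustration uses the exact values from the comment:

```python
# A tiny absolute error on a near-zero output turns into a huge
# relative error: this is why the data range matters.
trt_out, onnx_out = 0.000001, 0.000002
abs_diff = abs(trt_out - onnx_out)     # about 1e-06, negligible in absolute terms
rel_diff = abs_diff / abs(trt_out)     # about 1.0, i.e. a 100% relative difference
```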
Yes, I understand, but the absolute diff is not so small.
Have you tried Po-Han's suggestion to set the layer precision?
Yes, I set the layer precision following Po-Han's suggestion, but it still doesn't take effect.
Okay, I think we need a repro to debug this issue further.
Taking the result of TensorFlow (in FP32) as the baseline, compared against TRT 8.4, TRT 8.6, and TRT 9.1 (in FP16), 10 samples are as follows:
TF(FP32) vs TRT8.4(FP16) diff_tf_trt8.4.txt
TF(FP32) vs TRT8.6(FP16) diff_tf_trt8.6.txt
TF(FP32) vs TRT9.1(FP16) diff_tf_trt9.1.txt
These data show that the diffs for TRT 8.6 and TRT 9.1 are bigger.
Besides, with set_precision, the diff between TF (FP32) and TRT 8.4 (FP16) can be reduced: diff_tf_trt8.4-set_precision.txt. But set_precision has no effect in TRT 8.6 and TRT 9.1.
Description
As I reported in the earlier issue ("Skipping tactic 0x0000000000000000 due to Myelin error" degrade performance.), setting the layer precision may fail in TensorRT 8.4.3 due to the ConstShuffleFusion.
Recently, I tried TensorRT 8.6.1, but it seems that setting the layer precision may still fail due to the ConstShuffleFusion. For example, as shown in the graph, the Max op takes a const input named "phase0_tf/predict_node/y:0", whose value seems to be FP16 subnormal, so I used the set_precision API to set the layer ("phase0_tf/predict_node/y:0") to FP32 explicitly.
The verbose logs are as follows:
When the FP16-subnormal constant is not set to FP32, the logs are as follows; the layer "phase0_tf/predict_node/y:0 + (Unnamed Layer* 522) [Shuffle]" runs in FP16 precision:
However, when the FP16-subnormal constant is set to FP32, the layer "phase0_tf/predict_node/y:0 + (Unnamed Layer* 522) [Shuffle]" still runs in FP16.
By the way, the ConstShuffleFusion produces two kinds of layers, such as
I am confused by the difference. Is that the reason set_precision fails for the layer "phase0_tf/predict_node/y:0"?
Looking forward to your reply. Thanks a lot!
Environment
TensorRT Version: 8.6.1
NVIDIA GPU: T4
NVIDIA Driver Version: 510
CUDA Version: 12.0
CUDNN Version:
Operating System: Ubuntu20.04
Python Version (if applicable):
Tensorflow Version (if applicable): 1.4
PyTorch Version (if applicable):
Baremetal or Container (if so, version):
Relevant Files
Model link:
Steps To Reproduce
Commands or scripts:
Have you tried the latest release?:
Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (
polygraphy run <model.onnx> --onnxrt
):