NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

onnx -> fp16 tensorrt mismatched outputs: how to set some layers to fp32 when using setFlag(nvinfer1::BuilderFlag::kFP16)? #2360

Closed SolDogLi closed 1 year ago

SolDogLi commented 1 year ago

Description

The outputs of the TensorRT engine converted from ONNX differ from the outputs of the ONNX model. How can I set individual layers to FP32 when setFlag(nvinfer1::BuilderFlag::kFP16) is enabled?

Environment

TensorRT Version: 8.2.3
NVIDIA GPU: RTX 2060
NVIDIA Driver Version: 470.141.03
CUDA Version: 11.4
CUDNN Version: 8.1
Operating System: Ubuntu 18.04
Python Version (if applicable): 3.7
Tensorflow Version (if applicable):
PyTorch Version (if applicable): 1.10.0
Baremetal or Container (if so, version):

Relevant Files

https://drive.google.com/file/d/1-koOd12w1BNrXyWUP_9E17pWL7aUqLVy/view?usp=sharing

Steps To Reproduce

Convert the FP32 ONNX model to an FP16 TensorRT engine and compare the outputs:

polygraphy run /home/gxl/Videos/BEVerse/BEVerse/onnx_model/mtl_singleframe_head_outGS_bevpool.par.onnx --onnxrt --trt --atol 1e-3 --rtol 1e-3 --fp16 --verbose

[I] Accuracy Comparison | onnxrt-runner-N0-09/29/22-10:31:23 vs. trt-runner-N0-09/29/22-10:31:23
[I]     Comparing Output: '1067' (dtype=float32, shape=(1, 64, 128, 128)) with '1067' (dtype=float32, shape=(1, 64, 128, 128))
[I]     Tolerance: [abs=0.001, rel=0.001] | Checking elemwise error
[I]         onnxrt-runner-N0-09/29/22-10:31:23: 1067 | Stats: mean=0.081461, std-dev=0.4418, var=0.19519, median=0.17169, min=-2.1903 at (0, 38, 6, 1), max=1.9211 at (0, 39, 0, 127), avg-magnitude=0.39728
[I]             ---- Histogram ----
                Bin Range        |  Num Elems | Visualization
                (-2.19 , -1.78 ) |         37 | 
                (-1.78 , -1.37 ) |        258 | 
                (-1.37 , -0.957) |       1988 | 
                (-0.957, -0.546) |      57217 | ######
                (-0.546, -0.135) |     305177 | ###################################
                (-0.135, 0.277 ) |     261154 | ##############################
                (0.277 , 0.688 ) |     339844 | ########################################
                (0.688 , 1.1   ) |      82291 | #########
                (1.1   , 1.51  ) |        605 | 
                (1.51  , 1.92  ) |          5 | 
[I]         trt-runner-N0-09/29/22-10:31:23: 1067 | Stats: mean=0.081442, std-dev=0.44147, var=0.19489, median=0.17175, min=-2.1777 at (0, 38, 6, 1), max=1.917 at (0, 39, 0, 127), avg-magnitude=0.39702
[I]             ---- Histogram ----
                Bin Range        |  Num Elems | Visualization
                (-2.19 , -1.78 ) |         36 | 
                (-1.78 , -1.37 ) |        258 | 
                (-1.37 , -0.957) |       1981 | 
                (-0.957, -0.546) |      57093 | ######
                (-0.546, -0.135) |     305272 | ###################################
                (-0.135, 0.277 ) |     261297 | ##############################
                (0.277 , 0.688 ) |     340025 | ########################################
                (0.688 , 1.1   ) |      82005 | #########
                (1.1   , 1.51  ) |        604 | 
                (1.51  , 1.92  ) |          5 | 
[I]         Error Metrics: 1067
[I]             Minimum Required Tolerance: elemwise error | [abs=0.072875] OR [rel=inf] (requirements may be lower if both abs/rel tolerances are set)
[I]             Absolute Difference | Stats: mean=0.0013225, std-dev=0.0017359, var=3.0134e-06, median=0.00085402, min=0 at (0, 3, 11, 122), max=0.072875 at (0, 38, 119, 20), avg-magnitude=0.0013225
[I]                 ---- Histogram ----
                    Bin Range          |  Num Elems | Visualization
                    (0      , 0.00729) |    1036301 | ########################################
                    (0.00729, 0.0146 ) |      10210 | 
                    (0.0146 , 0.0219 ) |       1443 | 
                    (0.0219 , 0.0292 ) |        375 | 
                    (0.0292 , 0.0364 ) |        129 | 
                    (0.0364 , 0.0437 ) |         44 | 
                    (0.0437 , 0.051  ) |         35 | 
                    (0.051  , 0.0583 ) |         28 | 
                    (0.0583 , 0.0656 ) |          7 | 
                    (0.0656 , 0.0729 ) |          4 | 
[I]             Relative Difference | Stats: mean=inf, std-dev=nan, var=nan, median=0.0023031, min=0 at (0, 3, 11, 122), max=inf at (0, 14, 9, 101), avg-magnitude=inf
[V]                 Could not generate histogram. Note: Error was: autodetected range of [0.0, inf] is not finite
[I]                 
[E]         FAILED | Difference exceeds tolerance (rel=0.001, abs=0.001)
[E]     FAILED | Mismatched outputs: ['1067']
[!] FAILED | Command: ./polygraphy run /home/gxl/Videos/BEVerse/BEVerse/onnx_model/mtl_singleframe_head_outGS_bevpool.par.onnx --onnxrt --trt --atol 1e-3 --rtol 1e-3 --fp16 --verbose

Then I wanted to force some layers to FP32. I added the following code to onnx-tensorrt to set all layers to FP32, but it doesn't seem to work:

for (int i = 0; i < trt_network->getNbLayers(); i++)
{
    auto layer = trt_network->getLayer(i);
    std::string layerName = layer->getName();
    std::cout << "process " << layerName << std::endl;
    auto layer_type = layer->getType();
    auto layer_precision = layer->getPrecision();

    // Skip layers that should keep their default precision.
    if (layer_type == nvinfer1::LayerType::kSHAPE || layer_type == nvinfer1::LayerType::kIDENTITY ||
        layer_type == nvinfer1::LayerType::kSHUFFLE || layer_type == nvinfer1::LayerType::kSLICE ||
        layer_type == nvinfer1::LayerType::kCONCATENATION)
    {
        continue;
    }
    if (layer_precision == nvinfer1::DataType::kINT32)
    {
        continue;
    }
    if (layerName == "Tile")
    {
        continue;
    }

    layer->setPrecision(nvinfer1::DataType::kFLOAT);
    std::cout << "Set " << layerName << " to FP32 mode" << std::endl;
}

But the output is basically unchanged. Is there a way to individually set some layers to FP32?

Thanks!

zerollzeng commented 1 year ago

You need the OBEY_PRECISION_CONSTRAINTS builder flag: https://docs.nvidia.com/deeplearning/tensorrt/api/python_api/infer/Core/BuilderConfig.html#tensorrt.BuilderFlag

Or use trtexec with --precisionConstraints=obey --layerPrecisions=spec
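By default, TensorRT treats layer precisions set with setPrecision() as hints and may override them for performance; setting kOBEY_PRECISION_CONSTRAINTS on the builder config makes them binding. A minimal C++ sketch of this (builder/network/parser creation elided; the layer name "Conv_123" is purely illustrative, not from the model above):

```cpp
// Sketch only: assumes `config` and `network` were already created via the
// usual TensorRT C++ workflow. kOBEY_PRECISION_CONSTRAINTS is available in
// TensorRT 8.2+ (it replaces the older kSTRICT_TYPES flag).
config->setFlag(nvinfer1::BuilderFlag::kFP16);
config->setFlag(nvinfer1::BuilderFlag::kOBEY_PRECISION_CONSTRAINTS);

for (int i = 0; i < network->getNbLayers(); ++i)
{
    auto* layer = network->getLayer(i);
    // Pin a problematic layer to FP32; with kOBEY_PRECISION_CONSTRAINTS
    // the builder must honor this or fail the build.
    if (std::string(layer->getName()) == "Conv_123")  // hypothetical name
    {
        layer->setPrecision(nvinfer1::DataType::kFLOAT);
        // Also pin the output type so the result isn't cast back to FP16.
        for (int j = 0; j < layer->getNbOutputs(); ++j)
            layer->setOutputType(j, nvinfer1::DataType::kFLOAT);
    }
}
```

With trtexec the equivalent would be something like --fp16 --precisionConstraints=obey --layerPrecisions=Conv_123:fp32 (again, layer name illustrative).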

azhurkevich commented 1 year ago

@SolDogLi you can also refer to this example of how we set specific layers to fp16 when doing mixed precision between int8 and fp16.
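The same idea can be sketched with the TensorRT Python API (this is not the referenced example itself; it assumes a parsed network, and the layer-selection rule is purely illustrative):

```python
import tensorrt as trt

# Sketch: assumes `network` and `config` already exist from the usual
# parse step (builder.create_network / builder.create_builder_config).
config.set_flag(trt.BuilderFlag.INT8)
config.set_flag(trt.BuilderFlag.FP16)
config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)

for i in range(network.num_layers):
    layer = network.get_layer(i)
    # Run softmax layers in FP16 instead of INT8 (illustrative rule only).
    if layer.type == trt.LayerType.SOFTMAX:
        layer.precision = trt.float16
        for j in range(layer.num_outputs):
            layer.set_output_type(j, trt.float16)
```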