Just did a quick test with polygraphy:
[I] onnxrt-runner-N0-08/11/22-13:12:28
---- Inference Input(s) ----
{x2paddle_images [dtype=float32, shape=(1, 3, 640, 640)]}
[I] onnxrt-runner-N0-08/11/22-13:12:28
---- Inference Output(s) ----
{save_infer_model/scale_0.tmp_0 [dtype=float32, shape=(1, 25200, 85)]}
[I] onnxrt-runner-N0-08/11/22-13:12:28 | Completed 1 iteration(s) in 127.3 ms | Average inference time: 127.3 ms.
[I] Accuracy Comparison | trt-runner-N0-08/11/22-13:12:28 vs. onnxrt-runner-N0-08/11/22-13:12:28
[I] Comparing Output: 'save_infer_model/scale_0.tmp_0' (dtype=float32, shape=(1, 25200, 85)) with 'save_infer_model/scale_0.tmp_0' (dtype=float32, shape=(1, 25200, 85))
[I] Tolerance: [abs=1e-05, rel=1e-05] | Checking elemwise error
[I] trt-runner-N0-08/11/22-13:12:28: save_infer_model/scale_0.tmp_0 | Stats: mean=8.6759, std-dev=56.93, var=3241, median=0.00322, min=1.3113e-06 at (0, 972, 4), max=638.06 at (0, 2239, 0), avg-magnitude=8.6759
[I] ---- Histogram ----
Bin Range | Num Elems | Visualization
(1.16e-06, 65.5) | 2088552 | ########################################
(65.5 , 131 ) | 11194 |
(131 , 196 ) | 6518 |
(196 , 262 ) | 5630 |
(262 , 327 ) | 5201 |
(327 , 393 ) | 5169 |
(393 , 458 ) | 5191 |
(458 , 524 ) | 5386 |
(524 , 589 ) | 5341 |
(589 , 655 ) | 3818 |
[I] onnxrt-runner-N0-08/11/22-13:12:28: save_infer_model/scale_0.tmp_0 | Stats: mean=8.6698, std-dev=56.936, var=3241.7, median=0.0031733, min=1.1623e-06 at (0, 972, 4), max=654.85 at (0, 24869, 2), avg-magnitude=8.6698
[I] ---- Histogram ----
Bin Range | Num Elems | Visualization
(1.16e-06, 65.5) | 2088626 | ########################################
(65.5 , 131 ) | 11133 |
(131 , 196 ) | 6502 |
(196 , 262 ) | 5614 |
(262 , 327 ) | 5207 |
(327 , 393 ) | 5178 |
(393 , 458 ) | 5190 |
(458 , 524 ) | 5388 |
(524 , 589 ) | 5342 |
(589 , 655 ) | 3820 |
[I] Error Metrics: save_infer_model/scale_0.tmp_0
[I] Minimum Required Tolerance: elemwise error | [abs=108.3] OR [rel=2.1755] (requirements may be lower if both abs/rel tolerances are set)
[I] Absolute Difference | Stats: mean=0.061848, std-dev=0.67083, var=0.45002, median=0.00027022, min=0 at (0, 222, 4), max=108.3 at (0, 24771, 2), avg-magnitude=0.061848
[I] ---- Histogram ----
Bin Range | Num Elems | Visualization
(0 , 10.8) | 2140748 | ########################################
(10.8, 21.7) | 905 |
(21.7, 32.5) | 227 |
(32.5, 43.3) | 59 |
(43.3, 54.2) | 39 |
(54.2, 65 ) | 15 |
(65 , 75.8) | 3 |
(75.8, 86.6) | 2 |
(86.6, 97.5) | 1 |
(97.5, 108 ) | 1 |
[I] Relative Difference | Stats: mean=0.11164, std-dev=0.10477, var=0.010977, median=0.084045, min=0 at (0, 222, 4), max=2.1755 at (0, 21445, 23), avg-magnitude=0.11164
[I] ---- Histogram ----
Bin Range | Num Elems | Visualization
(0 , 0.218) | 1858320 | ########################################
(0.218, 0.435) | 252097 | #####
(0.435, 0.653) | 26393 |
(0.653, 0.87 ) | 4058 |
(0.87 , 1.09 ) | 834 |
(1.09 , 1.31 ) | 208 |
(1.31 , 1.52 ) | 64 |
(1.52 , 1.74 ) | 16 |
(1.74 , 1.96 ) | 4 |
(1.96 , 2.18 ) | 6 |
[E] FAILED | Difference exceeds tolerance (rel=1e-05, abs=1e-05)
[E] FAILED | Mismatched outputs: ['save_infer_model/scale_0.tmp_0']
[!] FAILED | Command: /usr/local/bin/polygraphy run yolov5s_quant.onnx --trt --int8 --onnxrt
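For reference, the same comparison can be scripted through polygraphy's Python API. Below is a minimal sketch with the tolerances loosened from the strict 1e-05 defaults that failed above, since INT8 output can never meet FP32-level tolerances; the model path matches the command line, everything else follows polygraphy's documented API.

```python
# Sketch: rerun the TRT-INT8 vs. ONNX Runtime comparison with realistic
# tolerances, using polygraphy's Python API instead of the CLI.
from polygraphy.backend.onnxrt import OnnxrtRunner, SessionFromOnnx
from polygraphy.backend.trt import (CreateConfig, EngineFromNetwork,
                                    NetworkFromOnnxPath, TrtRunner)
from polygraphy.comparator import Comparator, CompareFunc

model = "yolov5s_quant.onnx"
build_engine = EngineFromNetwork(NetworkFromOnnxPath(model),
                                 config=CreateConfig(int8=True))
runners = [TrtRunner(build_engine), OnnxrtRunner(SessionFromOnnx(model))]

results = Comparator.run(runners)  # feeds both runners the same generated input
# Compare against a tolerance an INT8 engine can plausibly meet (placeholder values).
success = Comparator.compare_accuracy(
    results, compare_func=CompareFunc.simple(atol=1e-2, rtol=1e-2))
print("PASSED" if success else "FAILED")
```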
@pranavm-nvidia @ttyio Do you have any suggestions here?
@yunyaoXYY, have you tried a different calibration algorithm or tuning the QAT parameters? There is also sample sensitivity-analysis code worth trying at https://github.com/NVIDIA/NeMo/blob/main/examples/asr/quantization/speech_to_text_quant_infer.py#L71
Thanks!
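To make the sensitivity-analysis idea concrete: one common approach in TensorRT is to rebuild the INT8 engine with suspect layers pinned back to FP32 and re-measure mAP. A hedged sketch using the TensorRT Python API follows; the name-based layer selection is a placeholder, not part of the suggestion above.

```python
import tensorrt as trt

# Sketch: build the INT8 engine with selected layers forced to FP32, then
# re-evaluate mAP to see which layers drive the accuracy drop.
logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
flags = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
network = builder.create_network(flags)
parser = trt.OnnxParser(network, logger)
with open("yolov5s_quant.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)
config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)  # honor per-layer precision

for i in range(network.num_layers):
    layer = network.get_layer(i)
    if "conv" in layer.name.lower():  # placeholder criterion for layers under test
        layer.precision = trt.float32
        layer.set_output_type(0, trt.float32)

serialized_engine = builder.build_serialized_network(network, config)
```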
Hi, this problem was solved a few months ago, thanks.
Description
Hi, I have a quantized YOLOv5s ONNX model. When I run inference with ONNX Runtime, I get an mAP of 36.8, but when I use the C++ TensorRT backend with INT8 inference enabled, the mAP drops to 10.9. I'm not sure what the problem is; could you please give some advice and check the model (attached)? Thanks!
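For context, a minimal sketch of the ONNX Runtime reference run is below; the tensor names and shapes are taken from the polygraphy log above, while real image preprocessing and the mAP evaluation are omitted.

```python
import numpy as np
import onnxruntime as ort

# Minimal ONNX Runtime smoke test; tensor names/shapes come from the
# polygraphy log above. Preprocessing and postprocessing are omitted.
sess = ort.InferenceSession("yolov5s_quant.onnx",
                            providers=["CPUExecutionProvider"])
image = np.random.rand(1, 3, 640, 640).astype(np.float32)  # stand-in input
(preds,) = sess.run(["save_infer_model/scale_0.tmp_0"],
                    {"x2paddle_images": image})
print(preds.shape)  # expected: (1, 25200, 85)
```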
Environment
TensorRT Version: 8.4.1.5
NVIDIA GPU: Tesla P40
NVIDIA Driver Version: 510.47.03
CUDA Version: 11.2
CUDNN Version: 8.1.1
Operating System: Linux
Python Version (if applicable): 3.7
Tensorflow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if so, version):
Relevant Files
Model link: https://bj.bcebos.com/v1/paddle-slim-models/act/yolov5s_quant.onnx
File: yolov5s_quant.onnx.zip
Steps To Reproduce
csrcs/fastdeploy/backends/tensorrt/trt_backend.cc(91)::CheckDynamicShapeConfig The loaded model's input tensor:x2paddle_images has shape [1, 3, 640, 640].
[08/10/2022-09:56:39] [I] [TRT] [MemUsageChange] Init CUDA: CPU +196, GPU +0, now: CPU 239, GPU 1274 (MiB)
[08/10/2022-09:56:40] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +6, GPU +2, now: CPU 262, GPU 1276 (MiB)
[08/10/2022-09:56:40] [W] [TRT] onnx2trt_utils.cpp:369: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[INFO] csrcs/fastdeploy/backends/tensorrt/trt_backend.cc(430)::CreateTrtEngine Start to building TensorRT Engine...
[08/10/2022-09:56:55] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +184, GPU +76, now: CPU 506, GPU 1352 (MiB)
[08/10/2022-09:56:55] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +128, GPU +80, now: CPU 634, GPU 1432 (MiB)
[08/10/2022-09:56:55] [W] [TRT] TensorRT was linked against cuDNN 8.4.1 but loaded cuDNN 8.1.1
[08/10/2022-09:56:55] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[08/10/2022-09:58:20] [I] [TRT] Detected 1 inputs and 4 output network tensors.
[08/10/2022-09:58:20] [I] [TRT] Total Host Persistent Memory: 145120
[08/10/2022-09:58:20] [I] [TRT] Total Device Persistent Memory: 2082816
[08/10/2022-09:58:20] [I] [TRT] Total Scratch Memory: 0
[08/10/2022-09:58:20] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 34 MiB, GPU 213 MiB
[08/10/2022-09:58:20] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 87.029ms to assign 11 blocks to 181 nodes requiring 24294912 bytes.
[08/10/2022-09:58:20] [I] [TRT] Total Activation Memory: 24294912
[08/10/2022-09:58:20] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +7, GPU +9, now: CPU 7, GPU 9 (MiB)
[08/10/2022-09:58:20] [W] [TRT] The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
[08/10/2022-09:58:21] [W] [TRT] The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
[08/10/2022-09:58:21] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 616, GPU 1414 (MiB)
[08/10/2022-09:58:21] [I] [TRT] Loaded engine size: 8 MiB
[08/10/2022-09:58:21] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +9, now: CPU 0, GPU 9 (MiB)
[INFO] csrcs/fastdeploy/backends/tensorrt/trt_backend.cc(496)::CreateTrtEngine TensorRT Engine is built succussfully.
[08/10/2022-09:58:21] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +25, now: CPU 0, GPU 34 (MiB)
loading annotations into memory...
Done (t=0.62s)
creating index...
index created!
2022-08-10 09:58:21 [INFO] Starting to read file list from dataset...
2022-08-10 09:58:22 [INFO] ...
After this comes the log that reports the mAP, which is much lower than the result from the ONNX Runtime backend.
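The "loading annotations into memory..." lines above come from pycocotools. For completeness, here is a hedged sketch of the kind of COCO evaluation that produces such mAP numbers; the annotation and detection file paths are placeholders, not the files used in this issue.

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Sketch of a COCO-style bbox evaluation. Scoring detections from both
# backends through the same evaluator keeps the 36.8 vs. 10.9 comparison
# apples-to-apples. File paths below are placeholders.
coco_gt = COCO("annotations/instances_val2017.json")  # ground-truth annotations
coco_dt = coco_gt.loadRes("backend_detections.json")  # one backend's detections
evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()  # first line, AP @[0.50:0.95], is the headline mAP
```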