NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

The QAT int8 model works fine when batch is 1, but multi-batch inference accuracy drops sharply, to nearly 0 #2382

Closed munhou closed 1 year ago

munhou commented 2 years ago

Description

I used pytorch-quantization to train a YOLOv5 model with QAT, then converted it to a TensorRT engine. The model works fine when batch is 1, but with multi-batch inference the accuracy drops sharply, to nearly 0.

And when I use PTQ quantization instead of QAT, multi-batch works fine!
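For context on what both PTQ and QAT insert into the graph: each Q/DQ pair simulates symmetric int8 quantize-then-dequantize with a scale derived from an observed amax. A minimal numpy sketch of that per-tensor scheme (the function name `fake_quantize` is illustrative, not a pytorch-quantization API):

```python
import numpy as np

def fake_quantize(x, amax, num_bits=8):
    """Simulate symmetric per-tensor int8 quantize -> dequantize."""
    bound = 2.0 ** (num_bits - 1) - 1.0            # 127 for int8
    scale = amax / bound                           # real value of one int step
    q = np.clip(np.round(x / scale), -bound, bound)
    return q * scale                               # dequantized approximation

x = np.array([-1.0, -0.26, 0.0, 0.3, 1.0], dtype=np.float32)
xq = fake_quantize(x, amax=float(np.abs(x).max()))
# in-range values land within half a quantization step of the original
assert np.all(np.abs(xq - x) <= (1.0 / 127) / 2 + 1e-6)
```

The key point is that the scale is fixed per tensor at export time, so a correct engine should apply the same scale regardless of batch size; a batch-dependent accuracy collapse suggests something other than the quantization math itself.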

Environment

NGC container: nvcr.io/nvidia/tensorrt:21.08-py3

TensorRT Version: 8.0.1.6
NVIDIA GPU: GTX 1070
NVIDIA Driver Version: 470.63.01
CUDA Version: 11.4
CUDNN Version: 8.2.2.26
Operating System: Ubuntu 20.04
Python Version (if applicable):
Tensorflow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if so, version): Container (nvcr.io/nvidia/tensorrt:21.08-py3)

Relevant Files

qat_model

Steps To Reproduce

I used the commands below to analyze the output:

polygraphy run qat_model.onnx --int8 --trt --onnxrt --workspace 100000000 --save-engine=tmp.plan --trt-min-shape 'images:[1,3,128,128]' --trt-opt-shape 'images:[4,3,512,512]' --trt-max-shape 'images:[8,3,768,768]' --input-shapes images:[4,3,512,512]
polygraphy run qat_model.onnx --int8 --trt --onnxrt --workspace 100000000 --save-engine=tmp.plan --trt-min-shape 'images:[1,3,128,128]' --trt-opt-shape 'images:[4,3,512,512]' --trt-max-shape 'images:[8,3,768,768]' --input-shapes images:[1,3,512,512]

The outputs with batch size 4 differ greatly between ONNX Runtime and TensorRT.
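The mismatch that Polygraphy reports is, in essence, an elementwise tolerance check. A numpy sketch of the standard absolute-plus-relative criterion |a - b| <= atol + rtol * |b| (illustrative, not copied from Polygraphy's source):

```python
import numpy as np

def outputs_match(trt_out, onnx_out, atol=1e-5, rtol=1e-5):
    """Elementwise |a - b| <= atol + rtol * |b|, reduced to one verdict."""
    return bool(np.all(np.abs(trt_out - onnx_out) <= atol + rtol * np.abs(onnx_out)))

golden = np.array([0.10, 0.50, 0.90])
assert outputs_match(golden + 1e-7, golden)      # tiny drift passes
assert not outputs_match(golden * 0.0, golden)   # collapsed-to-zero output fails
```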

I then tried the following steps and found where the difference begins, but that layer looks so ordinary that I can't see how to fix it:

polygraphy surgeon sanitize qat_model.onnx -o folded.onnx --fold-constants --override-input-shapes images:[4,3,512,512]
polygraphy run folded.onnx --onnxrt --save-inputs inputs.json --onnx-outputs mark all --save-outputs layerwise_golden.json
polygraphy data to-input inputs.json layerwise_golden.json -o layerwise_inputs.json
polygraphy debug reduce folded.onnx -o initial_reduced.onnx --mode=bisect --load-inputs layerwise_inputs.json --check polygraphy run polygraphy_debug.onnx --trt --load-inputs layerwise_inputs.json --load-outputs layerwise_golden.json --int8 --atol 1 --rtol 1
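For readers unfamiliar with `polygraphy debug reduce --mode=bisect`: it repeatedly shrinks the model and reruns the `--check` command, homing in on the first point where the check fails. The control flow is essentially a binary search over the graph, sketched here over a plain list of layer names (purely illustrative — the real tool operates on ONNX subgraphs, not lists):

```python
def bisect_failing_prefix(layers, check):
    """Find the shortest prefix of `layers` for which `check` fails.

    Assumes check(full model) fails and failures are monotone in prefix length.
    """
    lo, hi = 1, len(layers)            # invariant: check(layers[:hi]) fails
    while lo < hi:
        mid = (lo + hi) // 2
        if check(layers[:mid]):        # prefix still passes -> failure is later
            lo = mid + 1
        else:                          # prefix already fails -> shrink the range
            hi = mid
    return layers[:lo]

# Toy check: the reduced model "fails" once it contains the bad layer
layers = ["conv1", "relu1", "conv2", "bad_quant", "conv3"]
check = lambda prefix: "bad_quant" not in prefix     # True = accuracy OK
assert bisect_failing_prefix(layers, check) == ["conv1", "relu1", "conv2", "bad_quant"]
```

Each `check` call in the real workflow is a full TensorRT build-and-compare, which is why bisect mode (O(log n) checks) is much cheaper than reducing one layer at a time.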
zerollzeng commented 2 years ago

@ttyio ^ ^

munhou commented 2 years ago

@ttyio Do you know what could be the problem and what should I do?

ttyio commented 2 years ago

@munhou, have you tried a newer TRT, like the 8.5 EA in https://docs.nvidia.com/deeplearning/tensorrt/container-release-notes/rel-22-10.html#rel-22-10 ? Thanks!

ttyio commented 1 year ago

Closing due to no response for more than 3 weeks. Please reopen if you still have questions, thanks!