NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

Accuracy dropped when converting ONNX with Q/DQ to an int8 engine and testing mAP on the val dataset #2586

Closed: Levi-zhan closed this issue 1 year ago

Levi-zhan commented 1 year ago

Description

First, I trained a model on the training set; its mAP on the validation set is 0.65. Then, following the tutorial in the TensorRT documentation, I replaced the Conv layers in the model with quant_nn.QuantConv2d and ran PTQ calibration. Next, I fine-tuned the PTQ model on the training set for 50 epochs to get a QAT model with an mAP of 0.64. I then exported it to ONNX and converted it to an int8 engine. However, the accuracy on the validation set is only 0.57. Do you have any optimization or debugging suggestions?
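
For reference, a minimal sketch of the flow described above, using NVIDIA's pytorch-quantization toolkit; `build_model`, `calib_loader`, `train`, and the input shape are placeholders, not the reporter's actual code:

```python
import torch
from pytorch_quantization import nn as quant_nn
from pytorch_quantization import quant_modules

# Patch torch.nn so Conv2d etc. are replaced by their quantized
# counterparts (quant_nn.QuantConv2d, ...) at construction time.
quant_modules.initialize()
model = build_model().cuda()          # placeholder: the detector being trained

# --- PTQ: run calibration batches to collect amax statistics ---
for m in model.modules():
    if isinstance(m, quant_nn.TensorQuantizer):
        m.disable_quant()
        m.enable_calib()

with torch.no_grad():
    for i, (images, _) in enumerate(calib_loader):   # placeholder loader
        model(images.cuda())
        if i >= 32:
            break

for m in model.modules():
    if isinstance(m, quant_nn.TensorQuantizer):
        m.load_calib_amax()           # histogram calibrators also accept
                                      # e.g. ("percentile", percentile=99.99)
        m.enable_quant()
        m.disable_calib()

# --- QAT: fine-tune with fake quantization in the graph (50 epochs here) ---
# train(model, train_loader, epochs=50)

# --- Export ONNX with explicit Q/DQ nodes for the TensorRT int8 engine ---
quant_nn.TensorQuantizer.use_fb_fake_quant = True
dummy = torch.randn(1, 3, 640, 640, device="cuda")   # placeholder input shape
torch.onnx.export(model, dummy, "model_qat.onnx", opset_version=13)
```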

In addition, will I get different mAP results if I change the hardware? I also have an RTX 3060 and a Tesla V100.

Environment

The Docker image was built following https://github.com/NVIDIA/TensorRT/tree/8.2.1

TensorRT Version: 8.2.1
NVIDIA GPU: NVIDIA GeForce GTX 1660 SUPER
NVIDIA Driver Version: 470.129.06
CUDA Version: 11.4
CUDNN Version: 8.6
Operating System: Ubuntu 20.04
Python Version (if applicable): 3.7
Tensorflow Version (if applicable):
PyTorch Version (if applicable): 12.1
Baremetal or Container (if so, version):

Relevant Files

Steps To Reproduce

zerollzeng commented 1 year ago

> Do you have any optimization or debugging suggestions?

Can you try the latest TRT release first? @ttyio may know more about it.

> In addition, will I get different mAP results if I change the hardware? I also have an RTX 3060 and a Tesla V100.

The accuracy might differ because different kernels are selected on different hardware, but the gap should be very small.

Levi-zhan commented 1 year ago

> Can you try the latest TRT release first? @ttyio may know more about it.

Due to the limitation of my driver version, I can use CUDA 11.4 at most, but I believe the latest TRT release requires CUDA 11.6. Maybe I am mistaken; I would like to confirm whether the TensorRT OSS build container supports CUDA 11.4.

Levi-zhan commented 1 year ago

I made a mistake: the mAP of the int8 engine is actually only 0.47. By not quantizing ConvTranspose (using nn.ConvTranspose instead of quant_nn.QuantConvTranspose), the accuracy improves to 0.59. Is there any other method that can help me find the remaining precision-sensitive layers?
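
An alternative to rebuilding the model with plain nn.ConvTranspose is to disable the quantizers on those modules before export, so no Q/DQ pair is emitted around them. A sketch, assuming the same pytorch-quantization toolkit and an in-scope `model`:

```python
from pytorch_quantization import nn as quant_nn

# Leave every ConvTranspose in higher precision by turning off its
# input and weight fake-quantizers; TensorRT then runs these layers
# in FP32/FP16 instead of int8.
for name, module in model.named_modules():
    if isinstance(module, quant_nn.QuantConvTranspose2d):
        module._input_quantizer.disable()
        module._weight_quantizer.disable()
        print(f"disabled quantization on {name}")
```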

ttyio commented 1 year ago

@Levi-zhan, here is sample code for sensitivity analysis: https://github.com/NVIDIA/NeMo/blob/main/examples/asr/quantization/speech_to_text_quant_infer.py#L71
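
The idea in that script, condensed into a hedged sketch: quantize one tensor at a time, evaluate, and rank layers by the mAP they cost. `evaluate_map` and `val_loader` are placeholders for the user's own validation routine:

```python
from pytorch_quantization import nn as quant_nn

# Collect every fake-quantizer in the model by name.
quantizers = {name: module for name, module in model.named_modules()
              if isinstance(module, quant_nn.TensorQuantizer)}

results = {}
for target, quantizer in quantizers.items():
    for q in quantizers.values():
        q.disable()                 # start from a fully unquantized model
    quantizer.enable()              # quantize only one tensor at a time
    results[target] = evaluate_map(model, val_loader)   # placeholder eval

# The layers whose solo quantization costs the most mAP are the sensitive
# ones; consider leaving them unquantized, as was done for ConvTranspose.
for name, m_ap in sorted(results.items(), key=lambda kv: kv[1])[:10]:
    print(f"{name}: mAP = {m_ap:.3f}")
```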

ttyio commented 1 year ago

Closing since there has been no activity for more than 3 weeks. Thank you!