NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

pytorch_quantization QAT accuracy drops #4093

Closed steven-spec closed 1 month ago

steven-spec commented 2 months ago

Description

After QAT with pytorch_quantization, the model's accuracy drops relative to the original (pre-QAT) model.

Environment

TensorRT Version: 8.5.3.1

NVIDIA GPU: TITAN Xp

NVIDIA Driver Version: 450.80.02

CUDA Version: 11.0

CUDNN Version: 8.6.0

Operating System: Ubuntu 18.04.5 LTS

Python Version (if applicable): 3.9.16

Tensorflow Version (if applicable):

PyTorch Version (if applicable): 1.12.1+cu102

Baremetal or Container (if so, version):

Relevant Files

Model link: code.zip

Steps To Reproduce

Commands or scripts:

Have you tried the latest release?:

Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):

lix19937 commented 2 months ago

You only did calibration (PTQ). If calibration does not meet your accuracy expectations, you need to fine-tune (QAT).

steven-spec commented 2 months ago

Calibration did not meet my accuracy expectations, so I ran QAT training. My QAT code is in the attached file: qat code.zip

Am I writing this correctly? Or should I first calibrate and save the .pth, then load that .pth and run QAT training, i.e. split the operation into two steps instead of one?
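For context on the two-step split being asked about: calibration (PTQ) first picks a per-tensor dynamic range (amax) from sample data, and QAT then fine-tunes with fake quantization (quantize, then dequantize) in the forward pass so the weights adapt to that grid. Below is a minimal pure-Python sketch of symmetric int8 fake quantization with a max-calibrated amax; it is illustrative only and is not the pytorch_quantization API (the function name and data are made up):

```python
def fake_quantize(x, amax, num_bits=8):
    """Symmetric fake quantization: snap x to the int grid implied by amax,
    then dequantize back to float (this is what QAT sees in the forward pass)."""
    bound = 2 ** (num_bits - 1) - 1          # 127 for int8
    scale = amax / bound                     # step size of the quantization grid
    q = max(-bound, min(bound, round(x / scale)))  # quantize and clip
    return q * scale                         # dequantize

# "Calibration": a max calibrator picks amax as the largest |value| observed
calib_data = [0.1, -0.75, 0.42, -0.2, 0.6]
amax = max(abs(v) for v in calib_data)

# With a well-chosen amax, fake-quantized values stay within scale/2 of the originals
fq = [fake_quantize(v, amax) for v in calib_data]
```

The quantization error per value is bounded by half the grid step (amax / 127 / 2 here); if calibration picks a poor amax, that error grows, which is why QAT fine-tuning on top of a calibrated checkpoint usually recovers accuracy better than either step alone.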

akhilg-nv commented 2 months ago

Is this a bug in TensorRT, or is the accuracy poor after QAT even with PyTorch inference? You can try using the ModelOpt package for calibration if it fits your use case.

moraxu commented 1 month ago

@steven-spec as per our policy, I am going to close this issue as it's older than 21 days. If you'd like to follow up, please open another issue, thank you.