Improving int8 quantization results.

NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

https://developer.nvidia.com/tensorrt

Apache License 2.0


Improving int8 quantization results. #3865

Open severecoder opened 6 months ago

severecoder commented 6 months ago

I used PTQ to export a PyTorch model to int8, and despite several attempts at calibration there is a significant drop in detection accuracy.
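To illustrate why the calibration method matters for PTQ, here is a minimal pure-Python sketch. It uses synthetic values, not the model from this issue, and the helper names (`quantize_dequantize`, `mean_abs_error`) are hypothetical. It compares the error of symmetric int8 quantization when the range is set by max calibration versus a 99.9th-percentile calibration, assuming a distribution with one large outlier.

```python
import random


def quantize_dequantize(x, amax):
    """Symmetric int8 fake-quantization: snap x to the int8 grid, then map back to float."""
    scale = amax / 127.0
    q = max(-127, min(127, round(x / scale)))  # clip to the symmetric int8 range
    return q * scale


def mean_abs_error(values, amax):
    """Average absolute error introduced by quantizing values with range [-amax, amax]."""
    return sum(abs(v - quantize_dequantize(v, amax)) for v in values) / len(values)


random.seed(0)
# Synthetic "activations": mostly small values plus a single large outlier.
values = [random.gauss(0.0, 1.0) for _ in range(10_000)] + [50.0]

# Max calibration: the range is dictated by the outlier.
amax_max = max(abs(v) for v in values)

# Percentile calibration: ignore the top 0.1% of magnitudes.
sorted_abs = sorted(abs(v) for v in values)
amax_pct = sorted_abs[int(0.999 * (len(sorted_abs) - 1))]

err_max = mean_abs_error(values, amax_max)
err_pct = mean_abs_error(values, amax_pct)
print(f"max calibration:        amax={amax_max:.3f}, mean abs error={err_max:.5f}")
print(f"percentile calibration: amax={amax_pct:.3f}, mean abs error={err_pct:.5f}")
```

On this synthetic data the percentile range gives a lower average error because the int8 grid is not stretched to cover the outlier. Whether that holds for a real detection model depends on how much the clipped values matter, which is one reason QAT is often tried when calibration alone does not recover accuracy.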

I am moving to quantization-aware training (QAT) to improve the accuracy of the quantized int8 model. Is pytorch_quantization the best tool for that?

The end goal is a .trt or engine file running inference at int8 precision with the best possible detection metrics.

TIA

zerollzeng commented 6 months ago

> I am moving to quantization-aware training (QAT) to improve the accuracy of the quantized int8 model. Is pytorch_quantization the best tool for that?

pytorch_quantization will be deprecated, please use AMMO now.

severecoder commented 6 months ago

Thanks for the response. Isn't AMMO limited to LLMs?

brb-nv commented 6 months ago

There's also support for diffuser models. [link]

Btw, AMMO has been renamed to TensorRT Model Optimizer. [reference]
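For readers landing here, a rough sketch of what the QAT starting point might look like with TensorRT Model Optimizer. This assumes the `nvidia-modelopt` package and its PyTorch quantization module (`modelopt.torch.quantization`, with `quantize` and `INT8_DEFAULT_CFG`) as I understand them from the library's documentation; please verify against the version you install. The wrapper `quantize_for_qat` and the `calib_batches` argument are hypothetical names for illustration, and nothing here has been validated on a detection model.

```python
def quantize_for_qat(model, calib_batches):
    """Insert int8 quantizers into a PyTorch model and calibrate them.

    The returned model can then be fine-tuned with the usual training loop,
    which is the quantization-aware training step.
    """
    # Imported lazily so the sketch can be loaded without the package installed.
    # Assumption: installed via the `nvidia-modelopt` package.
    import modelopt.torch.quantization as mtq

    def forward_loop(m):
        # Calibration pass: run representative inputs through the model
        # so the quantizers can collect activation ranges.
        for batch in calib_batches:
            m(batch)

    # Assumption: INT8_DEFAULT_CFG is a suitable starting config for this model.
    return mtq.quantize(model, mtq.INT8_DEFAULT_CFG, forward_loop)
```

After fine-tuning, the usual path is to export the model to ONNX and build the engine from that, for example with something like `trtexec --onnx=model.onnx --int8 --saveEngine=model.engine` (file names are placeholders).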