NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

Improving int8 quantization results. #3865

Open severecoder opened 1 month ago

severecoder commented 1 month ago

I have used PTQ for int8 export from a PyTorch model, and despite attempts at calibration there is a significant drop in detection accuracy.

I am moving to quantization-aware training to improve the accuracy of the quantized int8 model. Is pytorch_quantization the best tool for that?

The end goal is a .trt or engine file running inference at int8 precision with the best possible detection metrics.

TIA

zerollzeng commented 1 month ago

I am moving to quantization-aware training to improve the accuracy of the quantized int8 model. Is pytorch_quantization the best tool for that?

pytorch_quantization will be deprecated; please use AMMO now.

severecoder commented 1 month ago

Thanks for the response. Isn't AMMO limited to LLMs?

brb-nv commented 1 month ago

There's also support for diffusion models. [link]

Btw, AMMO has been renamed to TensorRT Model Optimizer. [reference]
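
For anyone following the PTQ accuracy drop described at the top of the thread, here is a minimal pure-Python sketch of why the choice of calibration range matters for symmetric int8 quantization. It is not TensorRT's or Model Optimizer's calibrator. The helper names (`quantize_dequantize`, `percentile_amax`) and the synthetic activation data are assumptions made up for illustration.

```python
import random


def quantize_dequantize(values, amax, num_bits=8):
    """Symmetric fake quantization: snap to the int grid, clamp, map back to float."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for int8
    scale = amax / qmax
    out = []
    for v in values:
        q = round(v / scale)
        q = max(-qmax, min(qmax, q))  # values beyond amax are clipped
        out.append(q * scale)
    return out


def mean_squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def percentile_amax(values, pct):
    """Absolute value below which roughly `pct` percent of the samples fall."""
    ordered = sorted(abs(v) for v in values)
    index = min(len(ordered) - 1, int(len(ordered) * pct / 100.0))
    return ordered[index]


random.seed(0)
# Hypothetical activations: a well-behaved bulk plus a few large outliers.
bulk = [random.gauss(0.0, 1.0) for _ in range(10000)]
activations = bulk + [80.0, -95.0, 120.0]

# Max calibration covers every value; percentile calibration clips the outliers.
amax_max = max(abs(v) for v in activations)
amax_pct = percentile_amax(activations, 99.9)

# Quantization error on the bulk of the distribution under each range.
bulk_mse_max = mean_squared_error(bulk, quantize_dequantize(bulk, amax_max))
bulk_mse_pct = mean_squared_error(bulk, quantize_dequantize(bulk, amax_pct))

print(f"max calibration:        amax={amax_max:.3f}  bulk MSE={bulk_mse_max:.6f}")
print(f"percentile calibration: amax={amax_pct:.3f}  bulk MSE={bulk_mse_pct:.6f}")
```

With max calibration a handful of outliers stretch the range, so the 255 available levels are spread thin and typical values are represented coarsely. Percentile calibration gives the bulk a finer grid at the cost of clipping the outliers. Which trade-off is better depends on the model and on whether the outliers carry information the detector needs. Quantization-aware training goes a step further by running the same quantize-dequantize step in the forward pass during fine-tuning, so the weights can adapt to the rounding error.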