NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

Recommended Torch Quantization Library to Use -- ModelOpt vs. pytorch-quantization #3994

Closed · YixuanSeanZhou closed 1 month ago

YixuanSeanZhou commented 1 month ago

Description

I have seen two quantization libraries built by NVIDIA: TensorRT Model Optimizer (ModelOpt) and pytorch-quantization. What are the differences between the two libraries?

My use case is to do PTQ (and potentially QAT) on a PyTorch model, export it to ONNX, and then convert it with TensorRT. Which library should I use for this?

Thanks,

lix19937 commented 1 month ago

TRT ModelOpt includes pytorch-quantization.

See https://nvidia.github.io/TensorRT-Model-Optimizer/guides/_pytorch_quantization.html

ModelOpt PyTorch quantization is refactored based on pytorch_quantization.

Key advantages offered by ModelOpt’s PyTorch quantization:

1. Support for advanced quantization formats, e.g., block-wise INT4 and FP8.

2. Native support for LLMs in Hugging Face and NeMo.

3. Advanced quantization algorithms, e.g., SmoothQuant and AWQ.

4. Deployment support to ONNX and NVIDIA TensorRT.

You can use either tool; rough sketches of both flows are below.
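For your PTQ-then-ONNX use case with ModelOpt, a minimal sketch might look like the following. The toy model and random calibration tensors are placeholders for your real model and data; `mtq.quantize` and `mtq.INT8_DEFAULT_CFG` are the entry points described in the ModelOpt quantization guide linked above.

```python
import torch
import torch.nn as nn
import modelopt.torch.quantization as mtq

# Toy stand-ins: replace with your real model and calibration data.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10),
).eval()
calib_data = [torch.randn(1, 3, 32, 32) for _ in range(16)]

def forward_loop(m):
    # ModelOpt calls this to run calibration data through the model
    # and collect activation statistics.
    with torch.no_grad():
        for batch in calib_data:
            m(batch)

# PTQ with the default INT8 config; other configs (e.g. mtq.FP8_DEFAULT_CFG,
# mtq.INT4_AWQ_CFG) select the formats/algorithms listed above.
model = mtq.quantize(model, mtq.INT8_DEFAULT_CFG, forward_loop)

# Export with explicit Q/DQ nodes for TensorRT.
torch.onnx.export(model, torch.randn(1, 3, 32, 32), "model_int8.onnx")
```

The exported ONNX carries explicit Q/DQ nodes, so you can then build a TensorRT engine from it, e.g. with `trtexec --onnx=model_int8.onnx` (plus `--int8`/`--fp16` flags as appropriate for your setup).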

The standalone pytorch-quantization toolkit is documented here: https://docs.nvidia.com/deeplearning/tensorrt/pytorch-quantization-toolkit/docs/index.html
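For comparison, here is a sketch of the legacy pytorch-quantization PTQ calibration flow, assuming the `quant_modules`/`TensorQuantizer` API from those docs (same placeholder model and data as above):

```python
import torch
import torch.nn as nn
from pytorch_quantization import nn as quant_nn
from pytorch_quantization import quant_modules

# Patch torch.nn layers with quantized equivalents BEFORE building the model.
quant_modules.initialize()
model = nn.Sequential(
    nn.Conv2d(3, 8, 3), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10),
).eval()
calib_data = [torch.randn(1, 3, 32, 32) for _ in range(16)]

# Pass 1: collect statistics with quantization off and calibrators on.
for module in model.modules():
    if isinstance(module, quant_nn.TensorQuantizer) and module._calibrator is not None:
        module.disable_quant()
        module.enable_calib()
with torch.no_grad():
    for batch in calib_data:
        model(batch)

# Pass 2: compute amax from the collected stats and re-enable quantization.
for module in model.modules():
    if isinstance(module, quant_nn.TensorQuantizer) and module._calibrator is not None:
        module.load_calib_amax()
        module.enable_quant()
        module.disable_calib()
```

ONNX export then follows the usual `torch.onnx.export` path; the toolkit docs describe setting `quant_nn.TensorQuantizer.use_fb_fake_quant = True` first so that Q/DQ nodes are emitted.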

YixuanSeanZhou commented 1 month ago

Gotcha! Thank you for the clarification!