NVIDIA / TensorRT-Model-Optimizer

TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization, pruning, distillation, etc. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM or TensorRT to optimize inference speed on NVIDIA GPUs.
https://nvidia.github.io/TensorRT-Model-Optimizer

Will onnx_ptq support smoothquant in future? #58

Open tp-nan opened 2 months ago

tp-nan commented 2 months ago

Hi guys, I'm wondering whether SmoothQuant will be supported in the future for int8 ONNX quantization? Mainly for ViT-like models and LLMs.

riyadshairi979 commented 2 months ago

Yes, we plan to include it in ModelOpt 0.19 (sometime in October 2024).
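For context on what is being requested: SmoothQuant migrates quantization difficulty from activations to weights by dividing each activation channel by a scale `s` and folding that scale into the weights, which tames activation outliers before int8 quantization. The NumPy sketch below illustrates the core transform only; it is not the modelopt implementation, and the `alpha` hyperparameter and shapes are illustrative assumptions.

```python
import numpy as np

# Minimal SmoothQuant sketch: rescale activation channels so their
# dynamic range is balanced, while keeping the layer output identical.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))   # activations: (tokens, channels)
X[:, 1] *= 50.0               # inject an outlier channel, as seen in LLM/ViT activations
W = rng.normal(size=(4, 3))   # weights: (channels, out_features)

alpha = 0.5                   # migration strength (hyperparameter, illustrative)
# Per-channel scale: s_j = max|X_j|^alpha / max|W_j|^(1-alpha)
s = (np.abs(X).max(axis=0) ** alpha) / (np.abs(W).max(axis=1) ** (1 - alpha))

X_smooth = X / s              # at deploy time, this scale is folded into the previous op
W_smooth = W * s[:, None]     # the inverse scale is folded into the weights offline

# The transform is mathematically lossless: (X / s) @ (diag(s) W) == X @ W.
assert np.allclose(X @ W, X_smooth @ W_smooth)

# After smoothing, the outlier channel's range is pulled toward the others,
# so per-tensor int8 activation quantization loses far less precision.
print(np.abs(X).max(axis=0))
print(np.abs(X_smooth).max(axis=0))
```

The output equivalence is what makes SmoothQuant attractive as a pre-quantization pass: accuracy loss comes only from the subsequent int8 rounding, not from the smoothing itself.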