NVIDIA / TensorRT-Model-Optimizer

TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization, pruning, distillation, etc. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM or TensorRT to optimize inference speed on NVIDIA GPUs.
https://nvidia.github.io/TensorRT-Model-Optimizer

the supporting models in modelopt.torch.quantization #42

Closed XA23i closed 4 months ago

XA23i commented 4 months ago

Hi, I see that we can quantize a model with modelopt.torch.quantization.quantize(model, ...). I am wondering which models are supported. Does any PyTorch model work?

Edwardf0t1 commented 4 months ago

Thanks for your interest and question. Please check our supported model list here.

XA23i commented 4 months ago

Thank you for the quick reply. By the way, what can I do to quantize models that are not on the supported list?

jingyu-ml commented 4 months ago

Yes, you can, as long as it is a standard PyTorch-based model. @XA23i

XA23i commented 4 months ago

That's great. I will give it a try.

hchings commented 4 months ago

@XA23i Can you share what models you'd like to quantize but are not on the support list?