NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

Does TensorRT support QAT & PTQ INT8 quantization of CLIP/ViT models? #3417

Open shhn1 opened 10 months ago

shhn1 commented 10 months ago

Does TensorRT support QAT & PTQ INT8 quantization of CLIP/ViT models? Could you please provide any relevant quantization examples and accuracy/latency benchmarks?

zerollzeng commented 10 months ago

@nvpohanh @ttyio ^ ^

nvpohanh commented 10 months ago

INT8 with Q/DQ ops should work for ViT.
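For context on what "Q/DQ ops" refers to: TensorRT consumes explicit QuantizeLinear/DequantizeLinear node pairs in the ONNX graph and fuses them into INT8 kernels. The sketch below (not from this thread; a minimal NumPy illustration assuming per-tensor symmetric INT8 scaling) shows the arithmetic those two ops represent:

```python
import numpy as np

def quantize(x, scale):
    # QuantizeLinear: divide by scale, round, clamp to the signed INT8 range
    return np.clip(np.round(x / scale), -128, 127).astype(np.int8)

def dequantize(q, scale):
    # DequantizeLinear: map INT8 values back to floating point
    return q.astype(np.float32) * scale

# Per-tensor symmetric scale derived from the observed dynamic range
x = np.array([-1.0, -0.5, 0.0, 0.5, 1.0], dtype=np.float32)
scale = np.abs(x).max() / 127.0

q = quantize(x, scale)
x_hat = dequantize(q, scale)

# The Q/DQ round-trip error stays within one quantization step
assert q.dtype == np.int8
assert float(np.max(np.abs(x - x_hat))) < float(scale)
```

In PTQ the scale comes from calibration data; in QAT, Q/DQ (fake-quantize) nodes like these are kept in the training graph so the weights adapt to the rounding error before export.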

shhn1 commented 10 months ago

> INT8 with Q/DQ ops should work for ViT.

Do INT8 Q/DQ ops also support the CLIP model? Is there any relevant documentation I can refer to? Thanks : )