NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

What's the default quantization mode for TensorRT PTQ? #1421

Closed: un-knight closed this issue 2 years ago

un-knight commented 3 years ago

According to TensorRT's documentation, TensorRT only supports symmetric, uniform quantization, which means the quantization zero-point should always be 0.

But when I manually set the dynamic range (e.g. (0, 5.6845)) for network layers, the verbose logs show that TensorRT calculates a scale and a non-zero zero-point. So does TensorRT support asymmetric uniform quantization, which would conflict with the documentation?
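For context, here is a minimal sketch (not from the original post) of what setting a per-tensor dynamic range looks like through the TensorRT Python API, together with the symmetric scale that would follow from that range. The `set_ranges` helper, the `ranges` dict, and the network it operates on are illustrative assumptions; API details can differ across TensorRT versions.

```python
import tensorrt as trt

# With symmetric, uniform quantization the zero-point is fixed at 0 and only
# a scale is derived from the dynamic range: scale = max(|min|, |max|) / 127.
dyn_min, dyn_max = 0.0, 5.6845            # the example range from the question
amax = max(abs(dyn_min), abs(dyn_max))
scale = amax / 127.0                      # zero-point stays 0
print(f"scale={scale:.6f}, zero_point=0")

def set_ranges(network: trt.INetworkDefinition, ranges: dict) -> None:
    """Set a dynamic range on every layer output whose name appears in
    `ranges` (a dict mapping tensor name -> (min, max))."""
    for i in range(network.num_layers):
        layer = network.get_layer(i)
        for j in range(layer.num_outputs):
            tensor = layer.get_output(j)
            if tensor.name in ranges:
                lo, hi = ranges[tensor.name]
                tensor.set_dynamic_range(lo, hi)

# The builder config also needs INT8 enabled before building the engine, e.g.:
#   config.set_flag(trt.BuilderFlag.INT8)
```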

And are the weights quantized per channel by default in PTQ? Can the user configure it to be per tensor?

ttyio commented 3 years ago

@un-knight, no, we only support symmetric and uniform quantization so far. Could you paste the log line here?

Yes, the weights are quantized per channel, and this cannot be configured by the user for PTQ.
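To illustrate the distinction, here is a small NumPy sketch of per-channel versus per-tensor symmetric scales for a weight tensor. This is only a conceptual illustration, not TensorRT's internal code; the tensor shape and axis choice are assumptions.

```python
import numpy as np

# Toy convolution weights laid out as (out_channels, in_channels, kH, kW).
weights = np.random.randn(4, 3, 3, 3).astype(np.float32)

# Per-tensor: a single symmetric scale for the whole weight tensor.
per_tensor_scale = np.abs(weights).max() / 127.0

# Per-channel: one symmetric scale per output channel (axis 0).
per_channel_scales = np.abs(weights).reshape(weights.shape[0], -1).max(axis=1) / 127.0

print("per-tensor scale  :", per_tensor_scale)
print("per-channel scales:", per_channel_scales)
```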

ttyio commented 2 years ago

Closing since there has been no activity for more than 3 weeks. Please reopen if you still have questions, thanks!