NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

What's the default quantization mode for TensorRT PTQ? #1421

Closed: un-knight closed this issue 2 years ago

un-knight commented 3 years ago

According to TensorRT's documentation, TensorRT only supports symmetric, uniform quantization, which means the quantization zero-point should always be 0.

But when I manually set the dynamic range (e.g. (0, 5.6845)) for network layers, the verbose logs show that TensorRT calculates a scale and a non-zero zero-point. So does TensorRT support asymmetric uniform quantization, which would conflict with the documentation?
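For context, here is a minimal sketch (not from the original post) of what setting a per-tensor dynamic range looks like through the TensorRT Python API, together with the symmetric scale that would follow from that range. The `set_ranges` helper, the `ranges` dict, and the network it operates on are illustrative assumptions; API details can differ across TensorRT versions.

```python
import tensorrt as trt

# With symmetric, uniform quantization the zero-point is fixed at 0 and only
# a scale is derived from the dynamic range: scale = max(|min|, |max|) / 127.
dyn_min, dyn_max = 0.0, 5.6845            # the example range from the question
amax = max(abs(dyn_min), abs(dyn_max))
scale = amax / 127.0                      # zero-point stays 0
print(f"scale={scale:.6f}, zero_point=0")

def set_ranges(network: trt.INetworkDefinition, ranges: dict) -> None:
    """Set a dynamic range on every layer output whose name appears in
    `ranges` (a dict mapping tensor name -> (min, max))."""
    for i in range(network.num_layers):
        layer = network.get_layer(i)
        for j in range(layer.num_outputs):
            tensor = layer.get_output(j)
            if tensor.name in ranges:
                lo, hi = ranges[tensor.name]
                tensor.set_dynamic_range(lo, hi)

# The builder config also needs INT8 enabled before building the engine, e.g.:
#   config.set_flag(trt.BuilderFlag.INT8)
```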

And are the weights quantized per channel by default in PTQ? Can the user configure it to be per tensor?

ttyio commented 3 years ago

@un-knight, no, we only support symmetric and uniform quantization so far. Could you paste the log line here?

Yes, the weights are quantized per channel, and this cannot be configured by the user for PTQ.
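To illustrate the distinction, here is a small NumPy sketch of per-channel versus per-tensor symmetric scales for a weight tensor. This is only a conceptual illustration, not TensorRT's internal code; the tensor shape and axis choice are assumptions.

```python
import numpy as np

# Toy convolution weights laid out as (out_channels, in_channels, kH, kW).
weights = np.random.randn(4, 3, 3, 3).astype(np.float32)

# Per-tensor: a single symmetric scale for the whole weight tensor.
per_tensor_scale = np.abs(weights).max() / 127.0

# Per-channel: one symmetric scale per output channel (axis 0).
per_channel_scales = np.abs(weights).reshape(weights.shape[0], -1).max(axis=1) / 127.0

print("per-tensor scale  :", per_tensor_scale)
print("per-channel scales:", per_channel_scales)
```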

ttyio commented 2 years ago

Closing since there has been no activity for more than 3 weeks. Please reopen if you still have questions, thanks!