NVIDIA-AI-IOT / torch2trt

An easy to use PyTorch to TensorRT converter
MIT License

Question about QAT module #605

Open Cat11438 opened 3 years ago

Cat11438 commented 3 years ago

Hello, in the QAT module I'm a little bit confused about the way a quantized convolution layer is defined with a tensor quantizer. From my observation, only the weight quantizer is initialized and involved in the inference process; however, the converter for the conv layer uses the amax learned by the weight quantizer to set the dynamic range for the output of the conv layer. Would it be better if we added another tensor quantizer dedicated to learning the amax of the layer's output? https://github.com/NVIDIA-AI-IOT/torch2trt/blob/master/torch2trt/contrib/qat/layers/quant_conv.py
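
For illustration, here is a minimal sketch of the pattern described above (not the actual quant_conv.py code; the class name is made up), assuming pytorch_quantization's TensorQuantizer / QuantDescriptor API: only the weights pass through a quantizer, and the conv output stays in floating point.

```python
# Minimal sketch, not the library code verbatim: a conv wrapper where only the
# weights are fake-quantized, mirroring the pattern in quant_conv.py.
import torch.nn as nn
import torch.nn.functional as F
from pytorch_quantization.nn import TensorQuantizer
from pytorch_quantization.tensor_quant import QuantDescriptor

class QuantConv2dSketch(nn.Conv2d):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Per-channel 8-bit weight quantizer; its learned/calibrated amax is
        # what the converter later reads to set the TRT dynamic range.
        self._weight_quantizer = TensorQuantizer(QuantDescriptor(num_bits=8, axis=(0,)))

    def forward(self, x):
        # Fake-quantize only the weights; no quantizer is applied to the output.
        quant_weight = self._weight_quantizer(self.weight)
        return F.conv2d(x, quant_weight, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)
```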

SrivastavaKshitij commented 3 years ago

Hi there

The QDQ (quantize/dequantize) node is hidden in TRT 7. For a conv layer (in TRT 7), we have to give the dynamic range for the weights, because TRT takes that range and quantizes the weights of the conv TRT layer, not its output. TRT is not expected to quantize the output of the conv layer because that interferes with op fusion.

For example:

Conv --> Add

Normally, TRT will fuse both ops.

If we quantize the weights of the Conv and the output of the Add (the ideal case), it looks something like this:

Weights (quantized) --> Conv-Add (fused op) --> o/p (quantized)

However, if we quantize the output of the Conv:

Weights (quantized) --> Conv --> o/p (quantized) --> Add --> o/p (quantized)

Here TRT cannot fuse the Conv and Add ops, which can lead to performance degradation at runtime.

TRT 7 didn't expose the QDQ node properly, which is why it is a little confusing. Ideally, for TRT 7, we always want to give the dynamic range for the weights, because TRT 7 is expected to quantize the weights of the conv TRT layer and not its output.
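
A rough sketch of the mechanism (hedged; this is not the torch2trt converter code verbatim, only the TRT 7 Python API calls it relies on): the amax learned by the weight quantizer is attached as a symmetric dynamic range on the conv layer's output tensor, and TRT 7 consumes that range to quantize the layer's weights rather than its output.

```python
# Hedged sketch, not the actual converter: attach the weight quantizer's amax
# to a conv layer as a symmetric int8 dynamic range. In TRT 7 this range is
# used to quantize the layer's weights, not its output.
import tensorrt as trt

def apply_weight_amax(conv_layer, amax):
    # conv_layer: a trt.IConvolutionLayer already added to the network
    # amax: scalar max-abs value learned/calibrated by the weight TensorQuantizer
    conv_layer.precision = trt.int8
    conv_layer.get_output(0).set_dynamic_range(-amax, amax)
```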

Cat11438 commented 3 years ago

OK. Therefore, in TRT 7, when we set the dynamic range for conv_layer.get_output(0), we are actually setting the dynamic range for the weights rather than for the output of the conv layer, right?

SrivastavaKshitij commented 3 years ago

Yes