KwangHoonAn / PACT

Reproducing Quantization paper PACT
Apache License 2.0

The activation quantization of the first layer #2

Open louvinci opened 3 years ago

louvinci commented 3 years ago

Hello, thank you for sharing this. I have a question from reading the code. Each layer's weights appear to be quantized to the corresponding bit widths, but in terms of activation quantization, it seems the input tensor of the first layer still remains 32-bit float rather than 8-bit. "ActFn" is applied only to the output tensor of the first layer, i.e. the input of the second layer. Is that so?

KwangHoonAn commented 3 years ago

Hi, the image input is inherently 8-bit if I'm not wrong? - 0~255 values; after normalization each channel will still have only 256 unique values, though. But yes, your statement is correct. At the time I had not fully understood the paper, and it was not 100% clear to me how quantizers should be placed between layers. We would need to add extra quantizers to satisfy your statement - running the whole network in fully simulated low bitwidth.
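If one wanted to quantize the network input as well, a minimal sketch could look like the following. `InputQuantizer` and `QuantInputWrapper` are hypothetical names, not code from this repository, and the clipping range of the normalized input is an assumption:

```python
import torch
import torch.nn as nn

class InputQuantizer(nn.Module):
    """Quantize-dequantize the network input to k bits (hypothetical helper)."""
    def __init__(self, k=8, lo=-3.0, hi=3.0):
        super().__init__()
        self.levels = 2 ** k - 1
        self.lo, self.hi = lo, hi  # assumed range of the normalized input

    def forward(self, x):
        x = torch.clamp(x, self.lo, self.hi)
        scale = (self.hi - self.lo) / self.levels
        # snap to one of 2^k grid points, then map back to float (fake quantization)
        return torch.round((x - self.lo) / scale) * scale + self.lo

class QuantInputWrapper(nn.Module):
    """Wrap an existing model so its first layer only ever sees quantized inputs."""
    def __init__(self, model, k=8):
        super().__init__()
        self.input_quant = InputQuantizer(k)
        self.model = model

    def forward(self, x):
        return self.model(self.input_quant(x))
```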

louvinci commented 3 years ago

Thank you for your reply, I get it. The input tensor of the first layer can be naturally expressed with an 8-bit data width, even after going through the "ToTensor" and "Normalize" operations. As for the point you mentioned, I think the "quant" followed by "dequant" operation, which restricts the range of representable values, already simulates low bit-width training and inference. This method is usually called 'fake' quantization: the data type is still floating point.
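As a concrete illustration of that quant-followed-by-dequant step, here is a small sketch in the spirit of PACT's activation quantizer (clip to [0, α], quantize to k bits, dequantize). The function name and the example values of α and k are made up for illustration; this is not the repository's `ActFn` itself:

```python
import torch

def pact_fake_quantize(x, alpha, k):
    """Clip to [0, alpha], quantize to k bits, then dequantize.

    The output is still an FP32 tensor; only the set of representable
    values is restricted, which is why this is called 'fake' quantization.
    """
    y = torch.clamp(x, min=0.0, max=float(alpha))
    scale = (2 ** k - 1) / alpha
    return torch.round(y * scale) / scale  # quantize and immediately dequantize

x = torch.randn(5) * 3
print(pact_fake_quantize(x, alpha=2.0, k=4))  # at most 16 distinct levels in [0, 2]
```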

KwangHoonAn commented 3 years ago

Hi Louvinci, yes, in most quantization papers we are 'simulating' low precision by adding quantization noise through a quantize-dequantize step. I'm not sure there is a paper that actually runs the matmuls end to end in INT8 rather than FP32.

One of my repos, which implements data-free quantization, contains what you mentioned; you can refer to the link below: https://github.com/KwangHoonAn/Quantizations/blob/main/quantops.py#L94-L116
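For contrast with fake quantization, a true INT8 execution would accumulate an integer matmul in INT32 and then requantize the accumulator with the combined scales. A rough NumPy sketch of that idea (assuming symmetric, zero-point-free quantization, and not taken from the linked repo) might look like:

```python
import numpy as np

def int8_matmul(a_q, b_q, a_scale, b_scale, out_scale):
    """Multiply two INT8 matrices the way integer kernels do.

    a_q, b_q: int8 arrays; *_scale: float scales mapping int8 values back
    to real values (real = q * scale). Fake quantization skips all of this
    and simply runs the FP32 matmul on dequantized values.
    """
    acc = a_q.astype(np.int32) @ b_q.astype(np.int32)      # integer accumulation
    out = np.round(acc * (a_scale * b_scale / out_scale))  # requantize to the output scale
    return np.clip(out, -128, 127).astype(np.int8)
```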