louvinci opened this issue 3 years ago
Hi, the image input is inherently 8-bit, if I'm not wrong (0~255 values); after normalization each channel still has at most 256 unique values, just scaled to a different range. But yes, your statement is correct. At the time, I had not fully understood the paper, and it was not 100% clear to me how the quantizers are placed between layers. We should add extra quantizers to satisfy your statement, i.e., to simulate running the whole network fully in low bitwidth.
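As a minimal sketch of that point (PyTorch assumed, not from this repo): `Normalize` is an affine rescaling of the 256 possible uint8 values, so each channel still takes at most 256 unique float values afterwards.

```python
import torch

img_uint8 = torch.randint(0, 256, (3, 224, 224), dtype=torch.uint8)  # stand-in for an 8-bit image
img = img_uint8.float() / 255.0                                       # what ToTensor does
mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)              # ImageNet stats (assumed)
std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)
img_norm = (img - mean) / std                                         # what Normalize does

for c in range(3):
    print(f"channel {c}: {img_norm[c].unique().numel()} unique values (at most 256)")
```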
Thank you for your reply, I get it. The input tensor of the first layer can naturally be expressed with an 8-bit data width, even after going through the "ToTensor" and "Normalize" operations. About the question you mentioned, I think the "quant" followed by "dequant" operation, which restricts the data to a discrete low-precision grid, already simulates low bit-width training and inference. Actually, this method is called 'fake' quantization: the data type is still a floating-point type.
Hi Louvinci, yes, in most quantization papers we are 'simulating' quantization by adding quantization noise through a quantize-dequantize step. I'm not sure there is any paper that actually runs the matmuls end to end in INT8 rather than FP32.
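For concreteness, here is a minimal sketch of that quantize-dequantize ("fake quantization") step, assuming symmetric, per-tensor uniform quantization; the helper name `fake_quantize` is just for illustration:

```python
import torch

def fake_quantize(x: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    qmax = 2 ** (num_bits - 1) - 1                              # e.g. 127 for signed 8-bit
    scale = x.abs().max().clamp(min=1e-8) / qmax                # per-tensor scale
    x_int = torch.clamp(torch.round(x / scale), -qmax, qmax)    # "quant" onto the integer grid
    return x_int * scale                                        # "dequant" back to FP32

x = torch.randn(4, 8)
x_q = fake_quantize(x, num_bits=4)
print(x_q.dtype, (x - x_q).abs().max())  # still torch.float32; the difference is the quantization noise
```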
One of my repos that implements data-free quantization contains what you mentioned; you can refer to the link below: https://github.com/KwangHoonAn/Quantizations/blob/main/quantops.py#L94-L116
Hello, thank you for sharing. I have a question from looking at the code. I think each layer's weights are quantized to the corresponding bit width, but in terms of activation quantization, it seems the input tensor of the first layer still remains a 32-bit float rather than 8-bit. The "ActFn" is applied only to the output tensor of the first layer, i.e., the input of the second layer. Is that so?
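To make the question concrete, here is a hypothetical sketch (not the actual quantops.py code) of the layout described above: weights are fake-quantized per layer and an "ActFn"-like quantizer is applied to each layer's output, so the raw first-layer input stays FP32 unless an explicit input quantizer is added.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fake_quantize(x, num_bits):
    # Same symmetric per-tensor quantize-dequantize helper as in the sketch above.
    qmax = 2 ** (num_bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-8) / qmax
    return torch.clamp(torch.round(x / scale), -qmax, qmax) * scale

class QuantLinear(nn.Module):
    def __init__(self, in_f, out_f, w_bits=8, a_bits=8):
        super().__init__()
        self.fc = nn.Linear(in_f, out_f)
        self.w_bits, self.a_bits = w_bits, a_bits

    def forward(self, x):
        w_q = fake_quantize(self.fc.weight, self.w_bits)  # per-layer weight quantization
        out = F.linear(x, w_q, self.fc.bias)
        return fake_quantize(out, self.a_bits)            # activation quantizer on the OUTPUT only

net = nn.Sequential(QuantLinear(16, 32), QuantLinear(32, 10))
x = torch.randn(2, 16)          # raw first-layer input: never quantized in this layout
# x = fake_quantize(x, 8)       # the extra input quantizer discussed earlier in the thread
y = net(x)
print(y.dtype)                  # torch.float32 throughout ("fake" quantization)
```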