aegroto / nif

Code for "NIF: A Fast Implicit Image Compression with Bottleneck Layers and Modulated Sinusoidal Activations" (ACM Multimedia 2023, MM '23)

Question about the Quantization process #1

Closed JordanChua closed 6 months ago

JordanChua commented 6 months ago

Hi Lorenzo, thanks for the paper, it was a good read! I'm trying to implement a quantization pipeline on a trained model, and I was hoping to refer to the compression pipeline you implemented in this paper, mainly QAT followed by quantization and entropy coding.

I was hoping to get some of your input on how I could achieve this. Thanks a lot!

aegroto commented 6 months ago

Hello, thanks for your interest! Most of the quantization code can be found in https://github.com/aegroto/nif/blob/master/compression/__init__.py. Basically, the original floating point values are normalized and then quantized to 8-bit integers in the range [-128, 127].
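As a rough illustration of that scheme (a minimal sketch, not the repository's actual code: function names and the symmetric max-abs normalization are assumptions here), per-tensor scalar quantization to signed 8-bit integers could look like:

```python
import numpy as np

def quantize_tensor(values: np.ndarray, bits: int = 8):
    """Quantize a float array to signed integers; returns (q, scale)."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits
    max_abs = float(np.abs(values).max())
    scale = max_abs / qmax if max_abs > 0 else 1.0  # avoid division by zero
    # Scale, round, and clip into the signed range [-128, 127].
    q = np.clip(np.round(values / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize_tensor(q: np.ndarray, scale: float) -> np.ndarray:
    """Map quantized integers back to approximate float values."""
    return q.astype(np.float32) * scale
```

The `scale` must be stored alongside the quantized tensor so the decoder can recover the floating-point weights.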

This quantization is used in QAT here https://github.com/aegroto/nif/blob/aac23fd1ecb175b8754c5180fe5897d7c97606ec/phases/qat.py#L34. This is a very naive implementation of QAT, as the weights are directly replaced with their quantized counterparts. A simple yet effective way to obtain better results is to add the quantization residual as noise before the linear pass, which is the approach I have adopted in more recent code, and the one used in the fairseq library: https://github.com/facebookresearch/fairseq/blob/34973a94d09ecc12092a5ecc8afece5e536b7692/fairseq/modules/quantization/scalar/modules/qlinear.py#L88
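A minimal PyTorch sketch of that noise-based idea (my own illustration, not the fairseq or NIF code: the class name and the symmetric int8 scheme are assumptions): instead of overwriting the weights, the quantization residual is added during the forward pass, so gradients still flow through the original floating-point weights.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyQuantLinear(nn.Linear):
    """Linear layer that simulates int8 quantization during training."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        qmax = 127
        scale = self.weight.detach().abs().max() / qmax
        # Quantized weights (detached, so no gradient flows through rounding).
        q = torch.clamp(torch.round(self.weight.detach() / scale), -128, qmax)
        # Residual between the quantized-dequantized weights and the originals.
        residual = q * scale - self.weight.detach()
        # Add the residual as noise; backprop sees only self.weight.
        return F.linear(x, self.weight + residual, self.bias)
```

This behaves like a straight-through estimator: the forward pass sees quantized weights, while the backward pass updates the full-precision copies.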

Entropy coding is done by applying brotli to the quantized tensors cast to numpy int8 arrays: https://github.com/aegroto/nif/blob/aac23fd1ecb175b8754c5180fe5897d7c97606ec/compress.py#L37.
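That step can be sketched as follows (the function names are mine; this assumes the `brotli` Python package and a per-tensor round trip, with the array shape stored separately):

```python
import brotli
import numpy as np

def entropy_encode(q: np.ndarray) -> bytes:
    """Compress a quantized int8 tensor with brotli."""
    assert q.dtype == np.int8
    return brotli.compress(q.tobytes())

def entropy_decode(data: bytes, shape) -> np.ndarray:
    """Decompress the byte stream and restore the int8 array."""
    return np.frombuffer(brotli.decompress(data), dtype=np.int8).reshape(shape)
```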

I hope these tips will help you with your research. Feel free to ask more if you have any doubts.

JordanChua commented 6 months ago

Thanks a lot for the help Lorenzo! As you suggested, I have managed to get the naive implementation of QAT working, and I'm now working on the approach you suggested that takes the quantization noise into account.