IST-DASLab / gptq

Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
https://arxiv.org/abs/2210.17323
Apache License 2.0

Regarding the method for computing the Hessian matrix. #51

Open baiSongL opened 10 months ago

baiSongL commented 10 months ago

I would like to ask about line 61 of your gptq.py file: `inp = math.sqrt(2 / self.nsamples) * inp.float()`. According to the paper, it seems it should instead be `inp = math.sqrt(tmp / self.nsamples) * inp.float()`. After making this modification, I noticed a reduction in the quantization error. Could you verify whether my understanding is correct, or whether I have misunderstood something?
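For context, the line in question is part of a running-average accumulation of the Hessian proxy H = (2/n) * sum_i x_i x_i^T over input batches. A minimal NumPy sketch of that pattern (the class and names here are illustrative, not the repo's exact code):

```python
import numpy as np

class HessianAccumulator:
    # Sketch of accumulating H = (2 / n) * sum_i x_i x_i^T batch by batch,
    # in the style of the add_batch code the issue refers to.
    def __init__(self, dim):
        self.H = np.zeros((dim, dim))
        self.nsamples = 0

    def add_batch(self, inp):
        # inp: (dim, batch) matrix whose columns are input samples
        batch = inp.shape[1]
        # rescale the old average to the new total sample count ...
        self.H *= self.nsamples / (self.nsamples + batch)
        self.nsamples += batch
        # ... then add the new batch, pre-scaled by sqrt(2 / n) so that
        # x @ x.T contributes (2 / nsamples) * inp @ inp.T
        x = np.sqrt(2 / self.nsamples) * inp.astype(float)
        self.H += x @ x.T
```

After any number of `add_batch` calls, `self.H` equals `(2 / nsamples) * X @ X.T` over all samples seen so far, which is why the factor inside the square root interacts with both the 2 from the cost function and the averaging.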

efrantar commented 10 months ago

Hi, this part of the code accumulates the average Hessian iteratively. Whether there is a factor of 2 depends on how the cost function is defined (1/2 times the squared error, or just the squared error), and similarly whether an average is taken or not. Neither of these has any effect on the resulting quantized weights (constant factors cancel out during the algorithm); they only change the displayed per-layer error value.
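To see the cancellation concretely, here is a minimal NumPy sketch of a GPTQ-style per-column update (simplified: nearest-neighbor rounding to a fixed grid, no blocking or column reordering). Scaling H by any constant c scales the Cholesky factor of its inverse by 1/sqrt(c); the quantization error `err` then scales by sqrt(c) while the update row scales by 1/sqrt(c), so the weight updates, and hence the quantized weights, are unchanged; only the accumulated error readout scales by c:

```python
import numpy as np

def quantize_row(w, H, grid=np.arange(-8, 8, dtype=float)):
    # Simplified GPTQ-style loop: quantize each weight to the nearest grid
    # point, then propagate the error to the remaining columns via the
    # upper Cholesky factor of the inverse Hessian.
    Hinv = np.linalg.inv(H)
    U = np.linalg.cholesky(Hinv).T   # upper triangular, Hinv = U.T @ U
    w = w.astype(float).copy()
    q = np.zeros_like(w)
    total_err = 0.0
    for i in range(len(w)):
        d = U[i, i]
        q[i] = grid[np.argmin(np.abs(grid - w[i]))]
        err = (w[i] - q[i]) / d
        w[i:] -= err * U[i, i:]      # error feedback to later columns
        total_err += err ** 2        # this is the reported per-layer error
    return q, total_err
```

Running this with `H` and with `5 * H` produces identical quantized weights, while the returned error differs by exactly the factor 5.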