IST-DASLab / gptq

Code for the ICLR 2023 paper "GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers".
https://arxiv.org/abs/2210.17323
Apache License 2.0

Why no update to Hinv #21

Closed · deciding closed this issue 1 year ago

deciding commented 1 year ago

In the fasterquant function in gptq.py, there seems to be no update to Hinv during the quantization process. Can I know the intuition behind this? I am a bit lost on the part of the paper where introducing the Cholesky decomposition eliminates the updates to Hinv.

efrantar commented 1 year ago

Hi, since the order of updates is static (we do not change it dynamically as we sweep through the columns), we can actually compute everything we need from the sequence of inverse Hessians in advance using a Cholesky decomposition (plus a small transformation of the update formulas). This eliminates all Hessian updates during the algorithm and substantially improves numerical stability and efficiency. In our more recent SparseGPT paper, Figure 4 shows in dark yellow the information we need from all the inverse Hessians, which is actually contained in the Cholesky decomposition (perhaps this helps with following the "Step 3: Cholesky Reformulation" section of the GPTQ paper).
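
To make this concrete, here is a small self-contained NumPy sketch (illustrative only, not the repository code; the matrix size, seed, and damping constant are made up) checking that the rows the algorithm reads from the sequence of dynamically updated inverse Hessians are exactly the rows of a single upper Cholesky factor of the initial inverse Hessian:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
X = rng.standard_normal((64, d))
H = X.T @ X + 1e-2 * np.eye(d)  # damped SPD Hessian proxy

# Dynamic route: after handling column q, remove it from the inverse
# Hessian via the paper's Eq. (3) (one outer-product elimination step).
Hinv = np.linalg.inv(H)
dynamic_rows = []
for q in range(d):
    # The algorithm only ever reads row q of the current inverse Hessian,
    # normalized by the square root of its diagonal entry.
    dynamic_rows.append(Hinv[q, q:] / np.sqrt(Hinv[q, q]))
    Hinv = Hinv - np.outer(Hinv[:, q], Hinv[q, :]) / Hinv[q, q]

# Static route: because the column order is fixed, all of those rows are
# already contained in the upper Cholesky factor U of the *initial*
# inverse Hessian (H^{-1} = U^T U), with no per-column updates.
U = np.linalg.cholesky(np.linalg.inv(H)).T

for q in range(d):
    assert np.allclose(dynamic_rows[q], U[q, q:])
print("Cholesky rows match the dynamically updated inverse-Hessian rows")
```

If I read the repository code correctly, fasterquant obtains the same factor in PyTorch via torch.cholesky_inverse on the Cholesky factor of H, followed by torch.linalg.cholesky(..., upper=True), which is why no Hinv update appears inside the quantization loop.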

deciding commented 1 year ago

Thanks a lot for the quick reply. The idea is much clearer to me now. It is brilliant. 👍

waveajay commented 1 year ago

It is written in the paper that "Indeed, the row removal via (3) for our symmetric H^{-1} essentially corresponds to taking a Cholesky decomposition".

Can anyone explain why that is so?

ROUJINN commented 3 weeks ago

I have the same question @waveajay
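
A possible way to see the correspondence (my own reading of the paper, not an authoritative answer): the removal formula (3),

$$H^{-1}_{-q} = \left( H^{-1} - \frac{1}{[H^{-1}]_{qq}} \, H^{-1}_{:,q} H^{-1}_{q,:} \right)_{-q},$$

is one outer-product step of symmetric Gaussian elimination applied to the positive definite matrix H^{-1}. Performing these steps in a fixed column order is exactly the outer-product form of the Cholesky algorithm: step q emits the normalized row [H^{-1}]_{q,q:} / ([H^{-1}]_{qq})^{1/2} and leaves the Schur complement as the next trailing submatrix. Stacking those rows yields the upper triangular Cholesky factor U with H^{-1} = U^T U, so computing U once up front provides the same quantities as running every removal update, which is what the quoted sentence means by the row removals "essentially" corresponding to a Cholesky decomposition. The NumPy sketch earlier in this thread checks the match numerically.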