Closed 1 year ago
Hi, since the order of updates is static (we do not change it dynamically as we sweep through the columns), we can actually precompute everything we need from the sequence of inverse Hessians in advance using a Cholesky decomposition (plus a small transformation of the update formulas). This eliminates all Hessian updates during the algorithm and substantially improves both numerical stability and efficiency. In our more recent SparseGPT paper, Figure 4 shows in dark yellow the information we need from all the inverse Hessians, which is exactly what the Cholesky decomposition contains (perhaps this helps for following the "Step 3: Cholesky Reformulation" section in the GPTQ paper).
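For anyone who wants to see this concretely: here is a small NumPy sketch (illustrative only, not the actual gptq.py code) checking that the row of the inverse Hessian needed at each step, rescaled by the square root of its diagonal entry, is exactly a row of the upper Cholesky factor of the initial H^-1. The rank-1 removal update below is the Schur-complement step from Eq. (3) of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
A = rng.standard_normal((n, n))
H = A @ A.T + n * np.eye(n)   # random symmetric positive-definite Hessian
C = np.linalg.inv(H)          # H^{-1}

# Upper Cholesky factor U with C = U.T @ U
# (NumPy returns the lower factor L with C = L @ L.T, so U = L.T)
U = np.linalg.cholesky(C).T

rows = []
Ci = C.copy()
for i in range(n):
    d = Ci[0, 0]
    # the only information the algorithm needs at step i:
    rows.append(Ci[0, :] / np.sqrt(d))
    # remove row/col 0 of the inverse via the rank-1 (Schur complement) update
    Ci = (Ci - np.outer(Ci[:, 0], Ci[0, :]) / d)[1:, 1:]

# each rescaled row equals the corresponding row of the Cholesky factor
for i, r in enumerate(rows):
    assert np.allclose(r, U[i, i:])
```

So one upfront Cholesky of H^-1 already contains every row the sequential row-removal updates would otherwise have to compute on the fly.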
Thanks a lot for the quick reply. The idea is much clearer to me now. It is brilliant. 👍
The paper says: "Indeed, the row removal via (3) for our symmetric H^-1 essentially corresponds to taking a Cholesky decomposition."
Can anyone explain why that is so?
I have the same question @waveajay
In the `fasterquant` function in `gptq.py`, there does not seem to be any update to `Hinv` during the quantization process. Can someone explain the intuition behind this? I am a bit lost on how the paper's introduction of the Cholesky decomposition eliminates the updates to `Hinv`.
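For anyone else who lands here: below is a minimal sketch of the sweep under that reformulation (hypothetical names and a toy quantizer, not the actual `fasterquant` implementation). Because the needed rows of the sequence of inverse Hessians are exactly the rows of the upper Cholesky factor of H^-1, that factor can be computed once up front, and the loop then only reads from it; nothing is ever written back to `Hinv`.

```python
import numpy as np

def nearest_grid(x, step=0.1):
    # toy quantizer: round to a uniform grid (stand-in for the real quantizer)
    return np.round(x / step) * step

def quant_sweep(W, H):
    """Quantize the columns of W left to right, GPTQ-style, reading all
    inverse-Hessian information from one upfront Cholesky factorization."""
    # upper Cholesky factor of H^{-1}; computed once, never updated
    Hinv_chol = np.linalg.cholesky(np.linalg.inv(H)).T
    Q = W.copy()
    for i in range(W.shape[1]):
        d = Hinv_chol[i, i]              # plays the role of sqrt([H^{-1}]_ii)
        q = nearest_grid(Q[:, i])
        err = (Q[:, i] - q) / d
        # propagate the quantization error to the not-yet-quantized columns;
        # note: only reads from Hinv_chol, no Hinv update anywhere
        Q[:, i + 1:] -= np.outer(err, Hinv_chol[i, i + 1:])
        Q[:, i] = q
    return Q
```

The design point is exactly what the author explains above: since the column order is fixed, all inverse-Hessian updates can be folded into one Cholesky decomposition before the sweep starts.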