IST-DASLab / gptq

Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
https://arxiv.org/abs/2210.17323
Apache License 2.0

Question about the difference between the pseudocode and the implementation #27

Closed RachelXu7 closed 1 year ago

RachelXu7 commented 1 year ago
[image] [image]

In your pseudocode, the inverse-Hessian information is computed as the Cholesky decomposition of H's inverse. In the code, you call cholesky first, then cholesky_inverse, and then cholesky again. I am not sure of the reason for the difference. Is the cholesky_inverse kernel necessary here? Can I just compute H's inverse directly and then apply cholesky?

Thank you so much.

efrantar commented 1 year ago

Hi! Since the Hessian is symmetric, we can calculate its inverse slightly faster and in a more stable way (while also guaranteeing that the output is symmetric) using cholesky() + cholesky_inverse(); the latter expects a Cholesky decomposition as its input. In other words, the first two lines just compute $H^{-1}$, exploiting the fact that $H$ is symmetric, and the final cholesky() call is the one that corresponds to the pseudocode.
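
To make the comparison concrete, here is a minimal PyTorch sketch of both routes, not the repository's code. The matrix `H` is a small random symmetric positive-definite stand-in for the real Hessian, and the names (`Hinv_chol`, `Hinv_direct`, `U_chol`, `U_direct`) are made up for illustration. It assumes `H` is positive definite, which the Cholesky factorization requires.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-in for a layer Hessian: small, symmetric, positive definite.
n = 8
X = torch.randn(n, 4 * n, dtype=torch.float64)
H = X @ X.T + 1e-2 * torch.eye(n, dtype=torch.float64)

# Route described in the reply: factor H, then build the inverse from the factor.
L = torch.linalg.cholesky(H)           # H = L @ L.T with L lower triangular
Hinv_chol = torch.cholesky_inverse(L)  # takes the factor L, returns H^{-1}

# Route proposed in the question: a general-purpose inverse.
Hinv_direct = torch.linalg.inv(H)
# A general inverse is not guaranteed to be exactly symmetric, so this sketch
# symmetrizes it explicitly before factoring.
Hinv_direct = 0.5 * (Hinv_direct + Hinv_direct.T)

# Final step shared by both routes: upper Cholesky factor of H^{-1}.
U_chol = torch.linalg.cholesky(Hinv_chol, upper=True)
U_direct = torch.linalg.cholesky(Hinv_direct, upper=True)

print("max |Hinv_chol - Hinv_direct|:", (Hinv_chol - Hinv_direct).abs().max().item())
print("max |U_chol - U_direct|:", (U_chol - U_direct).abs().max().item())
```

On a well-conditioned matrix like this one, the two routes should agree up to floating-point error, so the direct inverse is a workable substitute in principle. The speed and stability advantages mentioned in the reply would matter more for large or poorly conditioned Hessians.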