IST-DASLab / gptq

Code for the ICLR 2023 paper "GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers".
https://arxiv.org/abs/2210.17323
Apache License 2.0

Does GPTQ reduce to OBQ if I set block size to 1? #25

Closed · zxxmxd closed this 1 year ago

efrantar commented 1 year ago

The blocksize has no effect on the output of the GPTQ algorithm; it merely affects the efficiency of execution on a GPU by batching updates together (though there may be some small numerical differences in practice). So no, setting the blocksize to 1 does not recover OBQ. In general, GPTQ relates to OBQ as follows: it uses the same update formulas, but applies quantization in the same fixed order across all matrix rows, whereas OBQ quantizes each row separately, in order of (dynamically determined) quantization difficulty, which generally differs between rows. This fixed order is what makes GPTQ dramatically more efficient than OBQ and allows it to scale to extremely large models.
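
In case it helps make this concrete, here is a rough NumPy sketch of the column loop from the paper. The `quantize` helper and all names here are illustrative, not this repo's API (the real code uses a proper per-group quantizer and runs on GPU); the point is that `blocksize` only controls *when* the trailing update is applied, never the quantization order:

```python
import numpy as np

def quantize(x, scale):
    # simple symmetric round-to-nearest quantizer (stand-in for the real one)
    return scale * np.round(x / scale)

def gptq_sketch(W, H, scale=0.1, blocksize=1):
    """Quantize the columns of W in a fixed left-to-right order shared by all
    rows, propagating the quantization error with the inverse Hessian.
    `blocksize` only batches the trailing update for efficiency; it does not
    change the order in which columns are quantized."""
    W = W.copy()
    d = W.shape[1]
    # upper-triangular Cholesky factor of H^-1, as in the paper
    Hinv = np.linalg.cholesky(np.linalg.inv(H)).T
    for i1 in range(0, d, blocksize):
        i2 = min(i1 + blocksize, d)
        Err = np.zeros((W.shape[0], i2 - i1))
        for j in range(i1, i2):
            q = quantize(W[:, j], scale)
            err = (W[:, j] - q) / Hinv[j, j]
            # immediately update the not-yet-quantized columns inside the block
            W[:, j + 1:i2] -= np.outer(err, Hinv[j, j + 1:i2])
            W[:, j] = q
            Err[:, j - i1] = err
        # apply the accumulated block update to all later columns at once
        W[:, i2:] -= Err @ Hinv[i1:i2, i2:]
    return W

# blocksize changes only how the updates are batched, not the result
rng = np.random.default_rng(0)
X = rng.standard_normal((8, 256))       # d x n calibration inputs
H = X @ X.T + 1e-2 * np.eye(8)          # (damped) Hessian proxy
W = rng.standard_normal((4, 8))
assert np.allclose(gptq_sketch(W, H, blocksize=1),
                   gptq_sketch(W, H, blocksize=8))
```

With blocksize 1 the trailing update is applied after every column; with a larger blocksize the same updates are accumulated and applied in one matmul, which is what makes the GPU execution fast.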

zxxmxd commented 1 year ago

I see, thank you for your clarification.