IST-DASLab / gptq

Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
https://arxiv.org/abs/2210.17323
Apache License 2.0

Compatibility of Quant3Linear and 4-bit quantization #48

Open mynotwo opened 7 months ago

mynotwo commented 7 months ago

Hi! I've noticed that the quantization layer packs the quantized weights using the `Quant3Linear` class, as shown below: [image]

However, it seems to me that this only works for 2-bit and 3-bit weights. If the original weights in `intweight` are 4-bit, some bits would be lost.
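To make the concern concrete, here is a minimal, hypothetical sketch of fixed-width bit-packing into 32-bit words (not the repository's actual `Quant3Linear` code): if values are packed with a 3-bit stride, any 4-bit value gets masked down to its low 3 bits, which is exactly the kind of bit loss described above.

```python
def pack(values, bits):
    """Pack each integer into `bits` bits of a stream of 32-bit words.
    Values wider than `bits` are silently truncated by the mask."""
    words = []
    acc, filled = 0, 0
    for v in values:
        acc |= (v & ((1 << bits) - 1)) << filled  # mask truncates to `bits` bits
        filled += bits
        if filled >= 32:
            words.append(acc & 0xFFFFFFFF)  # emit the full low word
            acc >>= 32                      # keep any spill-over bits
            filled -= 32
    if filled > 0:
        words.append(acc & 0xFFFFFFFF)      # flush the partial last word
    return words

def unpack(words, bits, count):
    """Read `count` values of `bits` bits back out of the word stream."""
    out = []
    acc, avail, i = 0, 0, 0
    for _ in range(count):
        while avail < bits:
            acc |= words[i] << avail        # refill from the next word
            i += 1
            avail += 32
        out.append(acc & ((1 << bits) - 1))
        acc >>= bits
        avail -= bits
    return out

# 3-bit values (0..7) round-trip losslessly:
vals = [1, 2, 3, 4, 5, 6, 7, 0, 7, 6, 5, 4]
print(unpack(pack(vals, 3), 3, len(vals)))  # recovers vals

# A 4-bit value packed with a 3-bit stride loses its top bit:
print(unpack(pack([9], 3), 3, 1))  # 9 = 0b1001 comes back as 1 = 0b001
```

Under this model, supporting 4-bit weights would simply mean packing with a 4-bit stride (8 values per 32-bit word) rather than reusing the 3-bit layout.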

Could you explain the logic behind this? Thanks!