ChuanhongLi closed this issue 4 days ago.
By "scaling", we mean perplexity as a function of the total number of bits in the model. The figure on the first page shows that, for a fixed number of bits, 3-bit QuIP# has a lower perplexity than an FP16 model plotted as if it used 4 bits per weight (i.e. a lossless 4-bit model, which afaik doesn't actually exist). That's why we can claim that 3-bit QuIP# scales better than any possible 4-bit method: even a lossless 4-bit quantizer would still scale worse than 3-bit QuIP#. Hope this clears things up.
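The comparison above can be sketched in code. This is a minimal illustration, not how the paper's figure was produced. It assumes each method is summarized by (total bits, perplexity) points, one per model size, and that total bits counts only the quantized weights. The "lossless 4-bit" curve reuses FP16 perplexities at 4 bits per weight. The function names (`total_bits`, `perplexity_at_budget`, `lossless_curve`, `beats_at_budget`) and the log-linear interpolation are hypothetical choices made for this sketch.

```python
import math
from bisect import bisect_left


def total_bits(num_params: float, bits_per_weight: float) -> float:
    """Model size in bits. Assumption: counts weights only (no scales, codebooks, etc.)."""
    return num_params * bits_per_weight


def perplexity_at_budget(curve, budget_bits):
    """Interpolate perplexity at a bit budget, linearly in log2(total bits).

    curve: list of (total_bits, perplexity) pairs for one method, one per model size.
    """
    pts = sorted(curve)
    xs = [math.log2(b) for b, _ in pts]
    x = math.log2(budget_bits)
    if x < xs[0] or x > xs[-1]:
        raise ValueError("budget is outside the measured range")
    i = bisect_left(xs, x)
    if xs[i] == x:
        return pts[i][1]  # budget falls exactly on a measured point
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = pts[i - 1][1], pts[i][1]
    t = (x - x0) / (x1 - x0)
    return y0 + t * (y1 - y0)


def lossless_curve(fp16_points, bits_per_weight=4):
    """Hypothetical ideal quantizer: FP16 perplexity, but charged bits_per_weight bits per weight.

    fp16_points: list of (num_params, fp16_perplexity) pairs.
    """
    return [(total_bits(n, bits_per_weight), ppl) for n, ppl in fp16_points]


def beats_at_budget(curve_a, curve_b, budget_bits):
    """True if method A reaches lower perplexity than method B at the same total bit budget."""
    return perplexity_at_budget(curve_a, budget_bits) < perplexity_at_budget(curve_b, budget_bits)
```

The point of `lossless_curve` is that no real 4-bit method can beat FP16 quality at a given model size. So if a 3-bit curve lies below this idealized curve at every budget, it also lies below every achievable 4-bit curve.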
Thanks for the excellent work!
The paper says "QuIP# is also the first PTQ method where 3-bit models scale better than 4-bit models.", but the experimental results shown in Tables 2, 3, and 4 don't seem consistent with this, which confuses me a little.
Can you help me to understand this?
Thanks!