
Cornell-RelaxML / quip-sharp

GNU General Public License v3.0


Confused about the description - QuIP# is also the first PTQ method where 3-bit models scale better than 4-bit models #57

Closed: ChuanhongLi closed this issue 4 days ago

ChuanhongLi commented 2 weeks ago

Thanks for the excellent work!

The paper says "QuIP# is also the first PTQ method where 3-bit models scale better than 4-bit models," but the experimental results in Tables 2, 3, and 4 appear inconsistent with this claim, which confuses me a little.

Can you help me to understand this?

Thanks!

tsengalb99 commented 2 weeks ago

By "scaling," we mean perplexity versus the total number of bits in the model. The figure on the first page shows that, for a fixed number of bits, 3-bit QuIP# has lower perplexity than an FP16 model plotted at 4 bits per weight (i.e., a lossless 4-bit model, which as far as I know doesn't actually exist). That's why we can claim that 3-bit QuIP# scales better than any possible 4-bit method: even a lossless 4-bit quantizer would still scale worse than 3-bit QuIP#. Hope this clears things up.
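To make the "perplexity vs total number of bits" idea concrete, here is a minimal Python sketch of the x-axis arithmetic behind such a plot. It assumes a model's size is simply parameter count × bits per weight, ignoring any codebook, scale, or other storage overhead; the function names and the round 70B parameter count are hypothetical illustrations, not taken from the QuIP# codebase or paper.

```python
def total_bits(num_params: float, bits_per_weight: float) -> float:
    """Total model size in bits (assumes no per-weight overhead).

    This is the x-axis quantity when comparing quantized models by
    perplexity vs total bits rather than by parameter count.
    """
    return num_params * bits_per_weight


def params_at_budget(bit_budget: float, bits_per_weight: float) -> float:
    """Number of weights that fit in a fixed bit budget at a given precision."""
    return bit_budget / bits_per_weight


# Hypothetical example: a 70B-parameter model stored at 3 bits per weight.
budget = total_bits(70e9, 3)  # 2.1e11 bits

# The same bit budget at 4 bits per weight only holds 52.5B parameters.
# An FP16 model "plotted at 4 bits" is placed at total_bits(params, 4),
# i.e. treated as if a lossless 4-bit quantizer existed.
print(params_at_budget(budget, 4) / 1e9)  # 52.5
```

Under this view, 3-bit "scales better" if, at the same total bits, the 3-bit model (with more parameters) reaches lower perplexity than a 4-bit model with only three quarters as many parameters, even when that 4-bit model is the unquantized FP16 baseline.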