Cornell-RelaxML / quip-sharp

GNU General Public License v3.0

Group-wise Quantization #44

Closed arman-kazemi closed 6 months ago

arman-kazemi commented 7 months ago

Hi,

I understand that you currently quantize the model weights in a per-row fashion. Can QuIP# be extended to per-group granularity? Could you elaborate on why or why not?

Thanks

tsengalb99 commented 7 months ago

The per-row quantization you are referring to is the LDLQ algorithm. LDLQ iteratively quantizes rows using linear feedback from the quantization error of already quantized rows. Within each row, groups of 8 weights are quantized using vector quantization to the E8P codebook.
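
For readers following along, here is a rough NumPy sketch of the structure described above: rows are processed one at a time, each row is shifted by a linear combination of the errors of the rows already quantized, and the shifted row is split into groups of 8 that are each snapped to the nearest codebook vector. This is not the quip-sharp implementation. The function names, the random stand-in codebook (it is not E8P), and the random strictly lower-triangular `feedback` matrix are all assumptions made for illustration.

```python
import numpy as np

GROUP_SIZE = 8  # groups of 8 weights, as described in the comment above


def quantize_row(row, codebook):
    """Snap each group of 8 weights in `row` to its nearest codebook vector."""
    groups = row.reshape(-1, GROUP_SIZE)                      # (n_groups, 8)
    # Squared Euclidean distance from every group to every codeword.
    dists = ((groups[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return codebook[dists.argmin(axis=1)].reshape(-1)


def ldlq_rows(W, feedback, codebook):
    """Quantize rows in order, with linear feedback from earlier rows' errors.

    `feedback` is assumed to be strictly lower triangular, so row k only
    sees the errors of rows 0..k-1 (hypothetical stand-in for the real
    feedback coefficients).
    """
    W_hat = np.zeros_like(W)
    for k in range(W.shape[0]):
        err = W[:k] - W_hat[:k]                # errors of already quantized rows
        target = W[k] + feedback[k, :k] @ err  # linear feedback correction
        W_hat[k] = quantize_row(target, codebook)
    return W_hat


rng = np.random.default_rng(0)
W = rng.standard_normal((4, 16))                          # toy weights: 4 rows, 2 groups per row
codebook = rng.standard_normal((32, GROUP_SIZE))          # toy codebook, NOT E8P
feedback = np.tril(rng.standard_normal((4, 4)), k=-1) * 0.1  # toy feedback coefficients
W_hat = ldlq_rows(W, feedback, codebook)
```

With `feedback` set to all zeros, the loop reduces to quantizing each row independently, which makes the role of the feedback term easy to see.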