arman-kazemi closed this issue 6 months ago
The per-row quantization you are referring to is the LDLQ algorithm. LDLQ iteratively quantizes rows, using linear feedback from the quantization error of the already quantized rows. Within each row, groups of 8 weights are quantized using vector quantization to the E8P codebook.
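To make the description above concrete, here is a minimal sketch of row-wise quantization with linear error feedback and groups of 8. It is not the QuIP# implementation: the names `quantize_with_feedback`, `quantize_row`, and `nearest_codeword` are hypothetical, the `feedback` matrix is simply passed in (how its coefficients are derived is not shown), and the codebook is a toy stand-in of all sign patterns in {-0.5, +0.5}^8 rather than the real E8P codebook.

```python
from itertools import product


def nearest_codeword(vec, codebook):
    """Return the codebook entry closest to vec in squared Euclidean distance."""
    return min(codebook, key=lambda c: sum((v - x) ** 2 for v, x in zip(vec, c)))


def quantize_row(row, codebook, group_size=8):
    """Vector-quantize one row, one group of `group_size` weights at a time."""
    assert len(row) % group_size == 0, "row length must be a multiple of group_size"
    out = []
    for start in range(0, len(row), group_size):
        out.extend(nearest_codeword(row[start:start + group_size], codebook))
    return out


def quantize_with_feedback(W, feedback, codebook, group_size=8):
    """Quantize rows in order, correcting each row with earlier rows' errors.

    W        : list of rows (lists of floats)
    feedback : feedback[k][j] (for j < k) weights the error of row j when
               quantizing row k; assumed to be given (hypothetical input)
    """
    W_hat, errors = [], []
    for k, row in enumerate(W):
        target = list(row)
        # Linear feedback: add the weighted errors of already quantized rows.
        for j in range(k):
            a = feedback[k][j]
            for c in range(len(target)):
                target[c] += a * errors[j][c]
        q = quantize_row(target, codebook, group_size)
        W_hat.append(q)
        # Error is measured against the original row, not the corrected target.
        errors.append([w - x for w, x in zip(row, q)])
    return W_hat


# Toy stand-in codebook (NOT E8P): every sign pattern of magnitude 0.5.
toy_codebook = list(product((-0.5, 0.5), repeat=8))

W = [[0.2] * 8, [0.1] * 8]
no_feedback = quantize_with_feedback(W, [[0.0, 0.0], [0.0, 0.0]], toy_codebook)
with_feedback = quantize_with_feedback(W, [[0.0, 0.0], [1.0, 0.0]], toy_codebook)
```

In this toy example the first row is rounded up, so with feedback enabled the second row is pushed down to compensate; without feedback both rows round the same way.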
Hi,
I understand that you are currently quantizing the model weights in a per-row fashion. Can you extend QuIP# to per-group granularity? Can you elaborate on why or why not?
Thanks