Closed — frankang closed this issue 2 years ago
LLM.int8() does stand for the combination of vector-wise quantization and mixed-precision decomposition. Since row-wise quantization is used here, it is not a variant of LLM.int8().
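To make the "vector-wise quantization + mixed-precision decomposition" combination concrete, here is a minimal NumPy sketch of the idea (not the bitsandbytes implementation): feature dimensions of the input containing outliers above a threshold are multiplied in floating point, while the remaining dimensions go through int8 absmax quantization with per-row scales for the input and per-column scales for the weights. The function name and the threshold value are illustrative assumptions.

```python
import numpy as np

def mixed_precision_matmul(X, W, threshold=6.0):
    """Toy sketch of LLM.int8()-style mixed-precision decomposition.

    Feature dimensions (columns of X) whose magnitude exceeds `threshold`
    are treated as outliers and multiplied in floating point; the rest go
    through symmetric int8 absmax quantization (per-row scales for X,
    per-column scales for W) and an int8 -> int32 matmul.
    """
    outlier_cols = np.any(np.abs(X) > threshold, axis=0)

    # Float path: outlier feature dimensions stay in full precision.
    out_fp = X[:, outlier_cols] @ W[outlier_cols, :]

    # Int8 path: quantize the remaining dimensions.
    Xr, Wr = X[:, ~outlier_cols], W[~outlier_cols, :]
    sx = np.abs(Xr).max(axis=1, keepdims=True) / 127.0  # per-row scale of X
    sw = np.abs(Wr).max(axis=0, keepdims=True) / 127.0  # per-col scale of W
    sx = np.where(sx == 0, 1.0, sx)
    sw = np.where(sw == 0, 1.0, sw)
    Xq = np.round(Xr / sx).astype(np.int8)
    Wq = np.round(Wr / sw).astype(np.int8)
    # Accumulate in int32, then dequantize with the outer product of scales.
    out_int8 = (Xq.astype(np.int32) @ Wq.astype(np.int32)) * sx * sw

    return out_fp + out_int8
```

Because the outlier dimensions bypass quantization entirely, the large quantization error they would otherwise introduce never contaminates the int8 path.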
Row-wise quantization only applies per-vector (per-row) quantization to the input tensor (the mini-batch/hidden states), while full vector-wise quantization additionally applies per-vector (per-column) quantization to the weight matrix.
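The distinction can be sketched in a few lines of NumPy (an illustrative toy, not the bitsandbytes kernels): row-wise quantization here uses per-row scales for the input but a single tensor-wide scale for the weights, whereas vector-wise quantization gives the weights per-column scales as well.

```python
import numpy as np

def absmax_quantize(A, axis=None):
    """Symmetric int8 absmax quantization; returns int8 values and scale(s)."""
    s = np.abs(A).max(axis=axis, keepdims=axis is not None) / 127.0
    s = np.where(s == 0, 1.0, s)
    return np.round(A / s).astype(np.int8), s

def rowwise_matmul(X, W):
    # Row-wise: per-row scales for X, one tensor-wide scale for W.
    Xq, sx = absmax_quantize(X, axis=1)
    Wq, sw = absmax_quantize(W)  # scalar scale
    return (Xq.astype(np.int32) @ Wq.astype(np.int32)) * sx * sw

def vectorwise_matmul(X, W):
    # Vector-wise: per-row scales for X AND per-column scales for W.
    Xq, sx = absmax_quantize(X, axis=1)
    Wq, sw = absmax_quantize(W, axis=0)
    return (Xq.astype(np.int32) @ Wq.astype(np.int32)) * sx * sw
```

With a single scale for the whole weight matrix, one large column forces a coarse grid on every other column; per-column scales avoid that, which is why vector-wise quantization is more accurate when weight magnitudes vary across output dimensions.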
Thank you for the explanation.
Hi, I have a problem understanding the "Int8 absmax row-wise + decomposition" entry in Table 1. Does it mean "Absmax LLM.int8() (row-wise + decomp)"? Since it does not contain the "LLM.int8()" keyword, I'm wondering if it refers to some other combination. Thanks!