Closed — frankang closed this issue 2 years ago
LLM.int8() does stand for the combination of vector-wise quantization and mixed-precision decomposition. Since row-wise quantization is used here, it is not a variant of LLM.int8().
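To make the "vector-wise quantization + mixed-precision decomposition" combination concrete, here is a minimal NumPy sketch of the idea (not the bitsandbytes implementation): feature dimensions of the input containing outliers above a threshold are multiplied in floating point, while the remaining dimensions go through int8 absmax quantization with per-row scales for the input and per-column scales for the weights. The function name and the threshold value are illustrative assumptions.

```python
import numpy as np

def mixed_precision_matmul(X, W, threshold=6.0):
    """Toy sketch of LLM.int8()-style mixed-precision decomposition.

    Feature dimensions (columns of X) whose magnitude exceeds `threshold`
    are treated as outliers and multiplied in floating point; the rest go
    through symmetric int8 absmax quantization (per-row scales for X,
    per-column scales for W) and an int8 -> int32 matmul.
    """
    outlier_cols = np.any(np.abs(X) > threshold, axis=0)

    # Float path: outlier feature dimensions stay in full precision.
    out_fp = X[:, outlier_cols] @ W[outlier_cols, :]

    # Int8 path: quantize the remaining dimensions.
    Xr, Wr = X[:, ~outlier_cols], W[~outlier_cols, :]
    sx = np.abs(Xr).max(axis=1, keepdims=True) / 127.0  # per-row scale of X
    sw = np.abs(Wr).max(axis=0, keepdims=True) / 127.0  # per-col scale of W
    sx = np.where(sx == 0, 1.0, sx)
    sw = np.where(sw == 0, 1.0, sw)
    Xq = np.round(Xr / sx).astype(np.int8)
    Wq = np.round(Wr / sw).astype(np.int8)
    # Accumulate in int32, then dequantize with the outer product of scales.
    out_int8 = (Xq.astype(np.int32) @ Wq.astype(np.int32)) * sx * sw

    return out_fp + out_int8
```

Because the outlier dimensions bypass quantization entirely, the large quantization error they would otherwise introduce never contaminates the int8 path.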
Row-wise quantization only applies per-vector (per-row) quantization to the input tensor (the mini-batch/hidden states), while full vector-wise quantization additionally applies per-vector (per-column) quantization to the weight matrix.
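The distinction can be sketched in a few lines of NumPy (an illustrative toy, not the bitsandbytes kernels): row-wise quantization here uses per-row scales for the input but a single tensor-wide scale for the weights, whereas vector-wise quantization gives the weights per-column scales as well.

```python
import numpy as np

def absmax_quantize(A, axis=None):
    """Symmetric int8 absmax quantization; returns int8 values and scale(s)."""
    s = np.abs(A).max(axis=axis, keepdims=axis is not None) / 127.0
    s = np.where(s == 0, 1.0, s)
    return np.round(A / s).astype(np.int8), s

def rowwise_matmul(X, W):
    # Row-wise: per-row scales for X, one tensor-wide scale for W.
    Xq, sx = absmax_quantize(X, axis=1)
    Wq, sw = absmax_quantize(W)  # scalar scale
    return (Xq.astype(np.int32) @ Wq.astype(np.int32)) * sx * sw

def vectorwise_matmul(X, W):
    # Vector-wise: per-row scales for X AND per-column scales for W.
    Xq, sx = absmax_quantize(X, axis=1)
    Wq, sw = absmax_quantize(W, axis=0)
    return (Xq.astype(np.int32) @ Wq.astype(np.int32)) * sx * sw
```

With a single scale for the whole weight matrix, one large column forces a coarse grid on every other column; per-column scales avoid that, which is why vector-wise quantization is more accurate when weight magnitudes vary across output dimensions.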
Thank you for the explanation.
Hi, I have a problem understanding the "Int8 absmax row-wise + decomposition" entry in Table 1. Does it mean "Absmax LLM.int8() (row-wise + decomp)"? Since it does not contain the "LLM.int8()" keyword, I'm wondering if it refers to some other combination. Thanks!