bitsandbytes-foundation / bitsandbytes

Accessible large language models via k-bit quantization for PyTorch.
https://huggingface.co/docs/bitsandbytes/main/en/index

Different quantization schemes https://arxiv.org/pdf/2208.07339.pdf #92

Closed · patelprateek closed this issue 11 months ago

patelprateek commented 2 years ago

Hi, I was going through the impressive work you have done here: https://arxiv.org/pdf/2208.07339.pdf

A few naive questions: In Table 1 you compare different types of quantization. Could you elaborate a bit on the differences?

- What is row-wise vs. vector-wise quantization? I could not find any pointers to how row quantization differs from vector quantization, or to the terminology.
- Similarly, what is the difference between Int8 absmax and Int8 absmax row-wise? Does it mean that in the former we take the absmax over the entire matrix/tensor, whereas in row-wise we take the max for each row?
- What about Int8 absmax vector-wise?

Thanks again

TimDettmers commented 1 year ago

If you have a matrix with two dimensions, you can normalize it into the range [-1, 1] by dividing by its absolute maximum, taken over either the rows or the columns. In the matrix multiplication A*B=C, you can do this for both matrices A and B.
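For concreteness, here is a minimal sketch of the three granularities in plain PyTorch (this is only an illustration, not the bitsandbytes implementation; the helper `quantize_absmax` and the shapes are made up for this example): Int8 absmax uses one scaling constant for the whole tensor, row-wise uses one constant per row, and vector-wise uses one per row of A together with one per column of B, so every dot product in A*B=C gets its own pair of constants.

```python
import torch

def quantize_absmax(x, dim=None):
    # Divide by the absolute maximum so values land in [-1, 1],
    # then map to the symmetric Int8 range [-127, 127].
    if dim is None:
        scale = x.abs().max()                        # tensor-wise: one scale overall
    else:
        scale = x.abs().amax(dim=dim, keepdim=True)  # one scale per row or per column
    q = torch.round(x / scale * 127).to(torch.int8)
    return q, scale

A = torch.randn(4, 8)
B = torch.randn(8, 3)

# Int8 absmax: a single scaling constant for the entire matrix.
qA_tensor, sA_tensor = quantize_absmax(A)

# Int8 absmax row-wise: one scaling constant per row of A.
qA_row, sA_row = quantize_absmax(A, dim=1)

# Int8 absmax vector-wise: row-wise for A and column-wise for B.
qA, sA = quantize_absmax(A, dim=1)   # scales have shape (4, 1)
qB, sB = quantize_absmax(B, dim=0)   # scales have shape (1, 3)

# Integer matmul, then dequantize with the outer product of the two scale vectors.
# (int64 accumulation here just for simplicity; real Int8 kernels accumulate in Int32.)
C_int = qA.long() @ qB.long()
C_approx = C_int.float() * (sA * sB) / (127 * 127)
print((C_approx - A @ B).abs().max())  # small quantization error
```

Vector-wise is the most fine-grained of the three, so it generally preserves more precision than a single per-tensor constant.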

Let me know if it is still unclear and I will try to explain it in different terms.

github-actions[bot] commented 11 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.