bitsandbytes-foundation / bitsandbytes

Accessible large language models via k-bit quantization for PyTorch.
https://huggingface.co/docs/bitsandbytes/main/en/index
MIT License

Question regarding different quantization code blocks #93

patelprateek closed this issue 10 months ago

patelprateek commented 1 year ago

Hi, I was going through the code base here and have a few questions.

https://github.com/TimDettmers/bitsandbytes/blob/6bc2b992be0bb7511ea881f8ebbbd2ba7f1b5109/bitsandbytes/functional.py#L1833 : vectorwise_quant : implements different quantization types

1) Are row and vector the same type of quantization? What is the difference? From the code they seem the same.

2) vectorwise_dequant: this is only implemented for "vector" and returns None for all other quantization types. Is this correct, or is it just not implemented?

3) vectorwise_mm_dequant: I see some params like S1 and S2 but am not sure what they mean or how this differs from vectorwise_quant. Any guidance here would be helpful.

4) In test_modules I see a quant function that differs from vectorwise_quant. Is there a reason? The test code implements min-max, but vectorwise_quant doesn't. Similarly, dequant in test_modules takes parameters S1 and S2, and I don't know how they relate to quant. Usually I would expect that something produced by a quantization method can be dequantized, similar to encode/decode or serialize/deserialize APIs.

5) double_quant: what exactly are row_stats and col_stats here? It returns out_row, out_col, row_stats, col_stats, coo_tensor; could you please elaborate on these outputs?

6) In the Int8Params class I see CB, CBt, SCB, SCBt, coo_tensorB = bnb.functional.double_quant(B). Could you please elaborate a bit on what the CB and SCB params mean?

Thanks

TimDettmers commented 1 year ago

Thanks, these are great questions!

  1. See the other issue #92
  2. vectorwise_dequant does the dequantization where A is row-wise normalized and B is tensor-wise normalized. I think this function is obsolete and should be replaced by vectorwise_mm_dequant for the row case.
  3. vectorwise_mm_dequant does the dequantization after the matmul for row-wise quantized A and column-wise quantized B.
  4. Sorry, this is a bit of a mess. The Python functions are only used for fake quantization and are there mostly for documentation. There may be some duplication here and there. As for S1/S2, these refer to the normalization statistics for the operands of A*B (S1 for A, S2 for B).
  5. If you normalize a matrix, you can normalize by rows or by columns. This function does both at the same time to be more efficient. Both sets of values are needed for training, since matrices are often transposed during the backward pass. The row_stats/col_stats variables are the quantization statistics for row-wise and column-wise normalization, respectively.
  6. CB is the row-major Int8 matrix. SCB holds the state of the CB tensor, that is, its normalization/quantization statistics.
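
To make points 3 and 4 concrete, here is a minimal sketch of how S1 and S2 could enter the picture. It assumes plain absmax scaling to the int8 range and uses plain Python lists so it runs without PyTorch. The helper names (quant_rows, quant_cols, int_matmul, mm_dequant) are made up for illustration and are not the bitsandbytes functions.

```python
# Illustrative sketch only: absmax int8 "fake" quantization in plain Python.
# Helper names are hypothetical; this is not the bitsandbytes implementation.

def quant_rows(A):
    """Quantize each row of A to [-127, 127]; return (int matrix, per-row absmax)."""
    # An all-zero row would give absmax 0, so fall back to 1.0 to avoid dividing by zero.
    stats = [max(abs(x) for x in row) or 1.0 for row in A]
    q = [[round(127 * x / s) for x in row] for row, s in zip(A, stats)]
    return q, stats

def quant_cols(B):
    """Quantize each column of B; return (int matrix, per-column absmax)."""
    Bt = [list(col) for col in zip(*B)]      # transpose so columns become rows
    qt, stats = quant_rows(Bt)
    return [list(row) for row in zip(*qt)], stats

def int_matmul(qA, qB):
    """Integer matrix product of the two quantized operands."""
    cols = list(zip(*qB))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in qA]

def mm_dequant(C_int, S1, S2):
    """Rescale the integer product: C[i][j] * S1[i] * S2[j] / 127**2."""
    return [[c * s1 * s2 / (127 * 127) for c, s2 in zip(row, S2)]
            for row, s1 in zip(C_int, S1)]

A = [[0.5, -1.0], [2.0, 0.25]]
B = [[1.0, -0.5], [0.5, 4.0]]

qA, S1 = quant_rows(A)   # S1: one statistic per row of A
qB, S2 = quant_cols(B)   # S2: one statistic per column of B
C_approx = mm_dequant(int_matmul(qA, qB), S1, S2)
C_exact = [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]
```

Under these assumptions, the statistics are not needed to undo the quantization of A or B individually. They are applied once to the integer product, which is why the dequantization step takes both S1 and S2.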

Let me know if you have any more questions!
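
For points 5 and 6, the idea of collecting row and column statistics in one pass can be sketched roughly as follows. This is an assumption-laden illustration in plain Python: double_quant_sketch is a hypothetical name, it uses simple absmax scaling, it keeps out_col in the same shape as the input, and it leaves out the coo_tensor output entirely.

```python
# Illustrative sketch only: a hypothetical double_quant_sketch, not the library kernel.

def double_quant_sketch(A):
    """Return (out_row, out_col, row_stats, col_stats) using absmax scaling."""
    row_stats = [0.0] * len(A)
    col_stats = [0.0] * len(A[0])
    # A single pass over the matrix collects both sets of absmax statistics.
    for i, row in enumerate(A):
        for j, x in enumerate(row):
            ax = abs(x)
            if ax > row_stats[i]:
                row_stats[i] = ax
            if ax > col_stats[j]:
                col_stats[j] = ax

    def scale(x, s):
        # Map x to the int8 range; an all-zero row or column quantizes to 0.
        return round(127 * x / s) if s else 0

    out_row = [[scale(x, row_stats[i]) for x in row] for i, row in enumerate(A)]
    out_col = [[scale(x, col_stats[j]) for j, x in enumerate(row)] for row in A]
    return out_row, out_col, row_stats, col_stats

B = [[0.5, -2.0, 1.0], [4.0, 0.0, -1.0]]

# Names follow the Int8Params line quoted in the question (minus coo_tensorB).
CB, CBt, SCB, SCBt = double_quant_sketch(B)

# CB alone is just integers; SCB is what lets you map it back to real values.
B_restored = [[q * s / 127 for q in row] for row, s in zip(CB, SCB)]
```

In this sketch, CB plays the role of the row-wise quantized matrix and SCB the per-row statistics that go with it, while the column-wise pair would be the one to use when the matrix is needed transposed.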
