bitsandbytes-foundation / bitsandbytes

Accessible large language models via k-bit quantization for PyTorch.
https://huggingface.co/docs/bitsandbytes/main/en/index
MIT License

out kwarg in matmul_4bit() is not working #1235

Open chenqianfzh opened 5 months ago

chenqianfzh commented 5 months ago

System Info

Linux 20.04

Reproduction

output is defined as a tensor. The following works as expected:

output = matmul_4bit(a, b)

but the following does not; the elements of output are left unchanged:

matmul_4bit(a, b, out=output)

Expected behavior

matmul_4bit(a, b, out=output) is expected to write its result into output. This is the preferred way, as it avoids an extra memory copy.
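For context, the out= contract the reporter expects is the same one NumPy and PyTorch follow for matmul. A minimal sketch of that contract using NumPy (matmul_4bit itself is not shown, since it requires a CUDA device and 4-bit quantized weights):

```python
import numpy as np

a = np.random.rand(4, 8).astype(np.float32)
b = np.random.rand(8, 16).astype(np.float32)

# Pre-allocate the output buffer once, e.g. outside an inference loop.
output = np.empty((4, 16), dtype=np.float32)

# With out=, the result is written in place into `output`, avoiding
# the allocation (and copy) of a fresh result array on every call.
np.matmul(a, b, out=output)

assert np.allclose(output, a @ b)
```

The bug report is that matmul_4bit accepts out= but leaves the buffer untouched, so the in-place write shown above never happens.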

matthewdouglas commented 5 months ago

Hi @chenqianfzh,

Thanks for reporting this issue. I believe I see the same behavior with the 8-bit matmul as well.

For the 4-bit version, this may work as expected when taking the gemv path, i.e. inputs where a is a 1D tensor whose size is divisible by the selected blocksize.
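The shape condition described above can be sketched as a small check. Note that might_take_gemv_path is a hypothetical helper written for illustration, not bitsandbytes API, and the default blocksize of 64 is an assumption:

```python
def might_take_gemv_path(a_shape, blocksize=64):
    """Hypothetical helper (not part of bitsandbytes): checks the shape
    condition described in the comment above -- a 1D input whose size
    is divisible by the quantization blocksize."""
    return len(a_shape) == 1 and a_shape[0] % blocksize == 0

print(might_take_gemv_path((4096,)))    # True: 1D and 4096 % 64 == 0
print(might_take_gemv_path((8, 4096)))  # False: a 2D activation batch
```

This is why the workaround does not help the reporter below: activations during batched inference are not 1D.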

chenqianfzh commented 5 months ago

Thanks for checking it out.

I am using this function in model inference, so a needs to be a high-dimensional tensor. :-(

vrdn-23 commented 4 months ago

Is there a fix planned for this anytime soon @matthewdouglas?