bitsandbytes-foundation / bitsandbytes

Accessible large language models via k-bit quantization for PyTorch.
https://huggingface.co/docs/bitsandbytes/main/en/index
MIT License
6.14k stars, 616 forks

CUDA source cleanup, refactor and fixes #1328

Closed: abhilash1910 closed this 1 month ago

abhilash1910 commented 1 month ago

This is a draft PR to clean up the CUDA kernels, refactor template assignments, and fix some AOT issues. The CUDA kernels have room for cleanup in the form of redundant code paths and unused methods. cc @matthewdouglas, @Titus-von-Koeller, @TimDettmers (pinging for awareness).

matthewdouglas commented 1 month ago

Thanks! We were actually just discussing some cleanup here, so the timing fits nicely!

Titus-von-Koeller commented 1 month ago

@abhilash1910

Thanks for the PR, I really appreciate your proactiveness in helping us clean things up!

I just had a chat with Tim; he reviewed the PR and said everything looks good to him.

I see you marked the PR as a draft. Are there more changes coming, or can we wrap things up?

abhilash1910 commented 1 month ago

Hi @Titus-von-Koeller, I plan to wrap this up in a couple of hours and will ping you then. Thanks!

abhilash1910 commented 1 month ago

@Titus-von-Koeller this is ready for review now, thanks. Changes since the last review: the 2D quantize is now a generic method that replaces the unused quadrant quantize; some template classes have been rearranged; and unused functions have been removed.