bitsandbytes-foundation / bitsandbytes

Accessible large language models via k-bit quantization for PyTorch.
https://huggingface.co/docs/bitsandbytes/main/en/index
MIT License

[RFC] Cross-Platform Refactor: CPU-only implementation #1021

Open rickardp opened 10 months ago

rickardp commented 10 months ago

Motivation

As we want this library to be portable, the first step would be to make 100% of it run correctly on CPU alone (i.e., not requiring CUDA for any part of the functionality). This would serve two purposes:
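As an illustrative sketch of what a CUDA-free reference path could look like (this is not the bitsandbytes API; names and signatures here are hypothetical), a pure-Python absmax int8 quantize/dequantize pair needs nothing beyond the standard library, so it runs on any platform:

```python
# Hypothetical CPU-only reference implementation of absmax int8
# quantization -- the kind of fallback a portable build could use
# when no CUDA kernel is available. Not the bitsandbytes API.

def quantize_absmax_int8(values):
    """Quantize a list of floats to int8 codes using absmax scaling."""
    absmax = max(abs(v) for v in values) or 1.0  # avoid division by zero
    # Map the largest-magnitude value to +/-127, clamp the rest.
    q = [max(-127, min(127, round(v * 127.0 / absmax))) for v in values]
    return q, absmax / 127.0  # codes and the scale for dequantization

def dequantize_int8(q, scale):
    """Recover approximate float values from int8 codes."""
    return [v * scale for v in q]

q, scale = quantize_absmax_int8([0.5, -1.0, 0.25])
approx = dequantize_int8(q, scale)  # close to the original inputs
```

Such a path would be slow compared to fused kernels, but it gives every platform a correct baseline that accelerated backends can then override.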

Proposed solution

Open questions

@Titus-von-Koeller Feel free to edit this issue as you see fit, for example if you want a different structure for it.

simepy commented 2 months ago

@rickardp Where are we on this feature? Is some part of it already working, or are there other threads discussing it? There is not much discussion here.

I'm especially interested in arm64 CPU-only support.

rickardp commented 2 months ago

@rickardp Where are we on this feature? Is some part of it already working, or are there other threads discussing it? There is not much discussion here.

Hi @simepy, sorry, still not much to add here. I am still up for contributing towards this when 1) I have time to do so and 2) the dependencies that I do not have time to contribute myself are ready to use. More specifically, the idea is to take a gradual approach and use the reference implementation wherever MPS acceleration is not yet implemented. Currently, large parts of this codebase require CUDA, which does not run on Apple silicon, making a partial implementation virtually unusable.
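The gradual approach described above can be sketched as a small backend registry: try each accelerated backend in order, and fall back to a slow-but-correct reference implementation when none is available. Everything below is hypothetical (the registry, the names, and the placeholder op are illustrative, not bitsandbytes internals):

```python
# Hypothetical backend-dispatch sketch: accelerated backends (CUDA, MPS,
# ...) register themselves with an availability check; ops fall back to a
# portable reference implementation when no backend applies.

_BACKENDS = {}  # name -> (is_available, implementation)

def register_backend(name, is_available, impl):
    _BACKENDS[name] = (is_available, impl)

def reference_impl(x):
    # Placeholder "reference" op standing in for a pure-CPU kernel.
    return [v * 2 for v in x]

def dispatch(x):
    for name, (is_available, impl) in _BACKENDS.items():
        if is_available():
            return impl(x)  # first available accelerated backend wins
    return reference_impl(x)  # portable fallback, always correct

# On a machine without CUDA, the availability check fails and the
# reference path is used transparently.
register_backend("cuda", lambda: False, lambda x: x)
result = dispatch([1, 2])
```

The point of this shape is that MPS (or any other) acceleration can be added kernel by kernel, each one simply shadowing the reference path as it lands.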