TimDettmers / bitsandbytes

Accessible large language models via k-bit quantization for PyTorch.
https://huggingface.co/docs/bitsandbytes/main/en/index
MIT License

Request for AdamW8bit support on CPU (would help TorchTune) #1226

Open sanchitintel opened 1 month ago

sanchitintel commented 1 month ago

Feature request

Port AdamW8bit support for CPU from the multi-backend-refactor branch to the main branch

Motivation

GPU machines from public cloud providers are usually expensive, while datacenter-grade CPUs are more readily available at lower prices. Toward the goal of making deep learning more accessible to developers and learners, the ability to fine-tune with AdamW8bit on CPU seems like a good milestone. TorchTune currently cannot support full fine-tuning on CPU with AdamW8bit because it relies on bitsandbytes' AdamW8bit optimizer.

#898 enabled AdamW8bit for CPU in the multi-backend-refactor branch, but the main branch doesn't have it.

It'd be great if we could enable AdamW8bit for CPU in the bitsandbytes main branch before TorchTune's next release (provided there is a bitsandbytes release before then), so that users who install TorchTune would automatically get a version of bitsandbytes that supports AdamW8bit on CPU.
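For context, here is a minimal sketch of what the requested support would let TorchTune users do. The toy model below is only illustrative; `bnb.optim.AdamW8bit` is the existing bitsandbytes optimizer class, which today assumes CUDA-backed parameters, and the request is for the same call to work when parameters live on the CPU:

```python
import torch
import torch.nn as nn
import bitsandbytes as bnb

# Illustrative stand-in for a model being fine-tuned; note there is no .cuda()
# call, so all parameters stay on the CPU.
model = nn.Linear(1024, 1024)

# The optimizer TorchTune uses; with CPU support this instantiation and the
# step below would work without a GPU.
optimizer = bnb.optim.AdamW8bit(model.parameters(), lr=1e-4)

loss = model(torch.randn(2, 1024)).sum()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```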

Thanks!

Your contribution

@jianan-gu could port over his code from the multi-backend-refactor branch to the main branch.

cc @mingfeima @ashokei @TimDettmers

sanchitintel commented 1 month ago

#1220 will fix this issue.

matthewdouglas commented 1 month ago

> #1220 will fix this issue.

I don't recall seeing any optimizers implemented yet for CPU, but I may be mistaken.

A paged optimizer doesn't make sense to me for CPU, but I can understand the request for AdamW8bit.

sanchitintel commented 1 month ago

Thanks for pointing that out, @matthewdouglas! I've revised the description.

@jianan-gu @xia-weiwen, please clarify if you had added AdamW8bit implementation for CPU to bitsandbytes. If not, do you have plans to add it? Thanks!

Xia-Weiwen commented 1 month ago

@sanchitintel Yes, we are going to do it. cc. @jianan-gu @jiqing-feng

Titus-von-Koeller commented 1 month ago

@sanchitintel thanks for raising this. When is the next torchtune release foreseen?

Hmm, the problem is that the device abstraction / dispatcher situation is still not stable. Things will change fundamentally in the next 3 weeks. I'm not sure this can be done as a PR to main in isolation. @Xia-Weiwen could you sketch out a bit more how you think this would make sense?