bitsandbytes-foundation / bitsandbytes

Accessible large language models via k-bit quantization for PyTorch.
https://huggingface.co/docs/bitsandbytes/main/en/index
MIT License

Prodigy Optimizer #955

Open KohakuBlueleaf opened 8 months ago

KohakuBlueleaf commented 8 months ago

Feature request

I'd like to know whether it is possible to implement the Prodigy optimizer in bnb with 8-bit support.

Motivation

Prodigy is now widely used for fine-tuning because it is more user friendly, especially for non-expert users, who are also the least likely to have deep knowledge of neural networks or access to good GPUs. Since Prodigy keeps four full-size state tensors per parameter, it consumes a lot of VRAM. I think it is crucial to add some kind of memory optimization for Prodigy, and bnb's 8-bit optimizers look like a good way to achieve this (see the rough memory sketch below).
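A rough back-of-the-envelope sketch of the potential saving, assuming four fp32 state tensors per parameter for Prodigy, roughly one byte per value once quantized to 8-bit, and an illustrative 700M-parameter model; the block size and overhead figures are assumptions, not measurements:

```python
# Rough VRAM estimate for Prodigy's optimizer state (illustrative assumptions only).
n_params = 700_000_000      # hypothetical model size, not a measured figure
n_state_tensors = 4         # Prodigy keeps ~4 full-size state tensors per parameter

fp32_state = n_params * n_state_tensors * 4   # 4 bytes per fp32 value
int8_state = n_params * n_state_tensors * 1   # ~1 byte per value at 8-bit
# Block-wise quantization also stores per-block constants;
# assume one fp32 constant per 256-element block (an assumption, not the bnb default).
int8_overhead = (n_params * n_state_tensors // 256) * 4

print(f"fp32 state : {fp32_state / 2**30:.1f} GiB")
print(f"8-bit state: {(int8_state + int8_overhead) / 2**30:.1f} GiB")
```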

Your contribution

None, sorry. I have read the source code, and I think we would need to add a new "4-state" (or even 5-state) optimizer class. Since the work involves CUDA kernels, it is beyond my ability.

If there were a general optimizer template that let us just fill in the update logic, I could probably make a PR for Prodigy or other optimizers (a rough sketch of the requested interface follows below).
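For context, a minimal sketch of the kind of interface being asked for: the existing bnb 8-bit optimizers are drop-in replacements for their full-precision counterparts, and a hypothetical `Prodigy8bit` (the name and its availability are assumptions, not an existing API) would ideally work the same way:

```python
import torch
import bitsandbytes as bnb

model = torch.nn.Linear(1024, 1024).cuda()

# Existing pattern: 8-bit Adam is a drop-in replacement for torch.optim.Adam.
adam_8bit = bnb.optim.Adam8bit(model.parameters(), lr=1e-3)

# Requested pattern (hypothetical, does not exist yet): an 8-bit Prodigy that keeps
# its four state tensors in quantized form, used the same way as the fp32 version.
# prodigy_8bit = bnb.optim.Prodigy8bit(model.parameters(), lr=1.0)
```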

DarkAlchy commented 5 months ago

It sure would be nice to get this in 8-bit; it is greatly needed.

betterftr commented 5 months ago

Yes please!

umarbutler commented 4 months ago

+1 I'd find this immensely useful seeing as Prodigy is already slower than AdamW.

Xynonners commented 4 weeks ago

+1