axolotl-ai-cloud / axolotl


New Optimizer: Implement Adam-Mini optimizer #1720

Open · SicariusSicariiStuff opened this issue 2 months ago

SicariusSicariiStuff commented 2 months ago

⚠️ Please check that this feature request hasn't been suggested before.

🔖 Feature description

Paper: https://arxiv.org/abs/2406.16793

TL;DR: Adam-mini should make it easier and faster to train models on home hardware by cutting optimizer-state memory roughly in half relative to AdamW. In theory it shouldn't be overly complicated to implement, since its update rule is very close to AdamW's.
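For context, the core idea in the paper is to drop Adam's per-parameter second-moment estimate and instead keep a single adaptive learning rate per parameter block. Below is a minimal, illustrative PyTorch sketch of that idea only; it treats each parameter tensor as one block for brevity, which is cruder than the Hessian-structure-based partitioning the paper actually uses, and it is not the authors' reference implementation.

```python
import torch
from torch.optim import Optimizer


class AdamMiniSketch(Optimizer):
    """Illustrative sketch only: keeps Adam's per-parameter first moment,
    but replaces the per-parameter second moment with a single scalar per
    parameter tensor (each tensor is treated as one 'block' here)."""

    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
        super().__init__(params, dict(lr=lr, betas=betas, eps=eps))

    @torch.no_grad()
    def step(self, closure=None):
        loss = closure() if closure is not None else None
        for group in self.param_groups:
            beta1, beta2 = group["betas"]
            for p in group["params"]:
                if p.grad is None:
                    continue
                state = self.state[p]
                if not state:
                    state["step"] = 0
                    state["m"] = torch.zeros_like(p)  # first moment, per parameter (as in Adam)
                    state["v"] = torch.zeros((), device=p.device, dtype=p.dtype)  # one scalar per block
                state["step"] += 1
                m, v = state["m"], state["v"]
                # First moment: identical to Adam/AdamW.
                m.mul_(beta1).add_(p.grad, alpha=1 - beta1)
                # Second moment: mean of the squared gradients over the whole block,
                # stored as a single scalar instead of one value per parameter.
                v.mul_(beta2).add_(p.grad.pow(2).mean(), alpha=1 - beta2)
                # Bias correction and update, as in Adam.
                m_hat = m / (1 - beta1 ** state["step"])
                v_hat = v / (1 - beta2 ** state["step"])
                p.add_(m_hat / (v_hat.sqrt() + group["eps"]), alpha=-group["lr"])
        return loss
```

The memory saving comes from the second-moment buffer shrinking from one float per parameter to one float per block, while the update otherwise mirrors AdamW (weight decay omitted above for brevity).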

✔️ Solution

Implement Adam-Mini in Axolotl.

❓ Alternatives

Keep using AdamW

📝 Additional Context

Adam-mini should probably be at least partially compatible with DeepSpeed out of the box, which would further increase training throughput and reduce the memory footprint.


mhenrichsen commented 1 month ago

Thanks for the suggestion. I believe most of the optimizers come from transformers directly, so it might be worth implementing it there instead.
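For reference, Axolotl currently surfaces those optimizers through the `optim` field of transformers' TrainingArguments, so upstreaming Adam-mini would roughly mean registering a new optimizer name there. A sketch of how that selection looks today; the `adam_mini` value shown in the comment is hypothetical and does not exist yet:

```python
from transformers import TrainingArguments

# Axolotl's `optimizer:` config key is mapped onto the `optim` field below,
# which accepts names such as "adamw_torch" or "adafactor".
args = TrainingArguments(
    output_dir="out",
    optim="adamw_torch",   # works today
    # optim="adam_mini",   # hypothetical value, only once implemented upstream
    learning_rate=2e-4,
)
```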