Lightning-AI / litgpt

Pretrain, finetune, deploy 20+ LLMs on your own data. Uses state-of-the-art techniques: flash attention, FSDP, 4-bit, LoRA, and more.
https://lightning.ai
Apache License 2.0

Add support for memory-efficient and faster optimizers #1364

Open rasbt opened 3 weeks ago

rasbt commented 3 weeks ago

Maybe the GaLore integration (#1192) should be changed from `GaloreArgs` to a more general `OptimizerArgs` after all. That would also make it easier to support other variants such as BAdam (BAdam: A Memory Efficient Full Parameter Training Method for Large Language Models, https://arxiv.org/abs/2404.02827).
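A minimal sketch of what such a generalized `OptimizerArgs` could look like (this is an assumption for illustration, not litgpt's actual API: the registry, the `register` decorator, and the `build` method are all hypothetical names):

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

# Hypothetical registry mapping optimizer names to constructors. In practice
# this could hold torch.optim classes plus third-party optimizers such as
# GaLore or BAdam, each registered under a string key.
OPTIMIZER_REGISTRY: Dict[str, Callable[..., Any]] = {}


def register(name: str):
    """Decorator that registers an optimizer constructor under `name`."""
    def decorator(ctor: Callable[..., Any]) -> Callable[..., Any]:
        OPTIMIZER_REGISTRY[name] = ctor
        return ctor
    return decorator


@dataclass
class OptimizerArgs:
    # Which registered optimizer to instantiate.
    name: str = "AdamW"
    # Extra keyword arguments forwarded to the optimizer constructor,
    # e.g. {"lr": 1e-4} or an optimizer-specific hyperparameter.
    kwargs: Dict[str, Any] = field(default_factory=dict)

    def build(self, params: Any) -> Any:
        # Look up the constructor by name and instantiate it.
        return OPTIMIZER_REGISTRY[self.name](params, **self.kwargs)


# Toy stand-in optimizer to show the flow without requiring torch.
@register("SGD")
class ToySGD:
    def __init__(self, params: Any, lr: float = 0.01) -> None:
        self.params = list(params)
        self.lr = lr


if __name__ == "__main__":
    args = OptimizerArgs(name="SGD", kwargs={"lr": 0.1})
    optimizer = args.build([1.0, 2.0])
    print(type(optimizer).__name__, optimizer.lr)
```

A string name plus a free-form kwargs dict keeps the config surface small: adding a new optimizer (BAdam's single extra hyperparameter, say) becomes one registry entry rather than a new `*Args` dataclass.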

The experiments linked here look very compelling, and BAdam only adds one hyperparameter:

[Screenshot (2024-04-27): BAdam experiment results]

lantiga commented 2 weeks ago

Agreed