bitsandbytes-foundation / bitsandbytes

Accessible large language models via k-bit quantization for PyTorch.
https://huggingface.co/docs/bitsandbytes/main/en/index
MIT License

Add AdEMAMix optimizer #1360

Closed: matthewdouglas closed this pull request 2 weeks ago

matthewdouglas commented 3 weeks ago

Adds support for the AdEMAMix optimizer described here: https://arxiv.org/abs/2409.03137

Includes blockwise 8-bit and 32-bit versions, each supporting paged operation; a usage sketch follows.
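For illustration, usage should mirror the library's existing optimizers. The class names below (`AdEMAMix`, `AdEMAMix8bit`, `PagedAdEMAMix8bit`) are assumptions inferred from bitsandbytes naming conventions like `Adam8bit` and `PagedAdamW`, not a confirmed API:

```python
import torch.nn as nn
import bitsandbytes as bnb

model = nn.Linear(128, 128)  # toy module for illustration

# 32-bit blockwise version
opt32 = bnb.optim.AdEMAMix(model.parameters(), lr=1e-3)

# 8-bit blockwise version; the paged variant keeps optimizer state in
# unified memory so it can be evicted to CPU when GPU memory is tight
opt8 = bnb.optim.AdEMAMix8bit(model.parameters(), lr=1e-3)
opt8_paged = bnb.optim.PagedAdEMAMix8bit(model.parameters(), lr=1e-3)
```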

AdEMAMix is a modification of Adam that adds a second, slower-decaying EMA of the gradients alongside the usual fast one. The paper observes that AdEMAMix forgets training data more slowly and can reach a loss similar to AdamW's with significantly less training data.
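Concretely, the update from the paper keeps Adam's fast EMA m1 (beta1) and second moment nu (beta2), adds a slow EMA m2 (beta3, e.g. 0.9999), and mixes them as m1_hat + alpha * m2 in the numerator. A minimal single-tensor sketch of that rule, not the blockwise/8-bit kernels in this PR:

```python
import torch

def ademamix_step(p, g, m1, m2, nu, step, lr=1e-3,
                  betas=(0.9, 0.999, 0.9999), alpha=5.0,
                  eps=1e-8, weight_decay=0.0):
    # One illustrative AdEMAMix update for a single parameter tensor.
    # m1: fast EMA (beta1); m2: slow EMA (beta3); nu: second moment (beta2).
    beta1, beta2, beta3 = betas
    m1.mul_(beta1).add_(g, alpha=1 - beta1)          # fast EMA of gradients
    m2.mul_(beta3).add_(g, alpha=1 - beta3)          # slow EMA of gradients
    nu.mul_(beta2).addcmul_(g, g, value=1 - beta2)   # EMA of squared gradients
    m1_hat = m1 / (1 - beta1 ** step)                # bias correction; per the
    nu_hat = nu / (1 - beta2 ** step)                # paper, m2 is left uncorrected
    update = (m1_hat + alpha * m2) / (nu_hat.sqrt() + eps)
    if weight_decay:
        update = update + weight_decay * p           # decoupled (AdamW-style) decay
    p.add_(update, alpha=-lr)
```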

TODO: Implement the warmup schedulers for alpha and beta3 (sketched below).
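For reference, the paper warms up alpha linearly and beta3 along a curve that is linear in 1/ln(beta) space, so the slow EMA's effective horizon grows gradually. A sketch of those formulas under that reading of the paper; the names and final form in bitsandbytes may differ:

```python
import math

def alpha_scheduler(step: int, alpha: float, t_warmup: int) -> float:
    # linear warmup from 0 to alpha over t_warmup steps
    return alpha * min(step / t_warmup, 1.0)

def beta3_scheduler(step: int, beta_start: float, beta3: float,
                    t_warmup: int) -> float:
    # interpolate linearly in 1/ln(beta) space from beta_start
    # (typically beta1) up to beta3, per the AdEMAMix paper
    if step >= t_warmup:
        return beta3
    t = step / t_warmup
    inv_log = (1 - t) / math.log(beta_start) + t / math.log(beta3)
    return math.exp(1.0 / inv_log)
```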

github-actions[bot] commented 2 weeks ago

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.