Adds support for the AdEMAMix optimizer described here: https://arxiv.org/abs/2409.03137
Includes blockwise 8bit and 32bit versions, each supporting paged operation.
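As a usage illustration, assuming the new classes follow the naming convention of the existing AdamW variants (`bnb.optim.AdamW8bit`, `bnb.optim.PagedAdamW8bit`), the names and keyword arguments below are assumptions rather than confirmed API; check the diff for the actual signatures:

```python
import torch.nn as nn
import bitsandbytes as bnb

model = nn.Linear(64, 64)

# Hypothetical class/argument names, assumed to mirror the existing
# AdamW8bit / PagedAdamW8bit optimizers in bitsandbytes.
optimizer = bnb.optim.PagedAdEMAMix8bit(
    model.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999, 0.9999),  # (beta1, beta2, beta3); beta3 drives the slow EMA
    alpha=5.0,                   # weight of the slow EMA in the update
)
```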
AdEMAMix is a modification of Adam that adds a second, slow-moving EMA of the gradient alongside the usual first moment. The paper reports that AdEMAMix forgets training data more slowly and reaches a loss comparable to AdamW's with significantly less training data.
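For readers who don't want to dig into the paper, here is a minimal reference sketch of the update rule in the paper's notation (beta3 for the slow EMA, alpha for its mixing weight). It is only an illustration of the algorithm, not the blockwise/paged kernels this PR adds:

```python
import torch

def ademamix_step(p, grad, m1, m2, nu, step, lr=1e-3,
                  beta1=0.9, beta2=0.999, beta3=0.9999,
                  alpha=5.0, eps=1e-8, weight_decay=0.0):
    """One AdEMAMix update for a single parameter tensor (in place)."""
    # Fast EMA of the gradient (same as Adam's first moment).
    m1.mul_(beta1).add_(grad, alpha=1 - beta1)
    # Slow EMA of the gradient -- the extra component AdEMAMix introduces.
    m2.mul_(beta3).add_(grad, alpha=1 - beta3)
    # Second moment, as in Adam.
    nu.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)

    # Bias correction applies only to the Adam-style moments.
    m1_hat = m1 / (1 - beta1 ** step)
    nu_hat = nu / (1 - beta2 ** step)

    # Mix the fast and slow EMAs; alpha scales the slow component.
    update = (m1_hat + alpha * m2) / (nu_hat.sqrt() + eps)
    if weight_decay != 0.0:
        update = update + weight_decay * p  # decoupled, AdamW-style decay
    p.add_(update, alpha=-lr)
```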
TODO: Implement scheduler for alpha/beta3