Momentum Quantization

Many modern optimisers, such as Shampoo, SM3 and 8-Bit Adam, quantise their large momentum buffers to a lower precision such as int8. This quantisation gives them significant memory improvements, as they now need only 6 bytes per parameter instead of 12. We could save 16% to 33% of our total memory consumption by adding momentum quantisation, allowing for more parameters and bigger batches.

This issue is about implementing quantised momentum and benchmarking its convergence impact compared to bf16 and fp32 momentum.
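To make the idea concrete, here is a minimal sketch of one way a momentum buffer could be kept in int8 between steps. It is an illustration only, not Olmax code and not the exact scheme used by any of the optimisers named above: it assumes simple linear block-wise absmax quantisation, and the names `quantize_int8`, `dequantize_int8`, `momentum_step` and the tiny `BLOCK_SIZE` are all hypothetical. Note that the per-block scale adds a small overhead on top of the one byte per value.

```python
BLOCK_SIZE = 4  # hypothetical, kept tiny for readability; real blocks would be much larger


def quantize_int8(values, block_size=BLOCK_SIZE):
    """Linear absmax quantisation: one float scale per block, codes in [-127, 127]."""
    codes, scales = [], []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        absmax = max(abs(v) for v in block)
        # An all-zero block gets scale 1.0 to avoid dividing by zero.
        scale = absmax / 127.0 if absmax > 0 else 1.0
        scales.append(scale)
        codes.extend(max(-127, min(127, round(v / scale))) for v in block)
    return codes, scales


def dequantize_int8(codes, scales, block_size=BLOCK_SIZE):
    """Map int8 codes back to floats using the scale of the block they belong to."""
    return [c * scales[i // block_size] for i, c in enumerate(codes)]


def momentum_step(codes, scales, grads, beta=0.9):
    """Dequantise, apply m = beta * m + (1 - beta) * g, then requantise for storage."""
    momentum = dequantize_int8(codes, scales)
    updated = [beta * m + (1.0 - beta) * g for m, g in zip(momentum, grads)]
    new_codes, new_scales = quantize_int8(updated)
    return updated, new_codes, new_scales


# Toy run: constant gradients, momentum stored only as int8 codes plus scales.
grads = [0.5, -0.25, 0.125, 0.0, 2.0, -1.0, 0.75, 0.1]
codes, scales = quantize_int8([0.0] * len(grads))
for _ in range(3):
    updated, codes, scales = momentum_step(codes, scales, grads)

# Rounding error introduced by the last requantisation step.
restored = dequantize_int8(codes, scales)
max_error = max(abs(a - b) for a, b in zip(updated, restored))
```

The per-step rounding error is bounded by half a quantisation step of the block, but it can accumulate over many steps, which is why the convergence benchmark against bf16 and fp32 momentum matters.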

HomebrewNLP / Olmax

Momentum Quantization #14