This package contains the official PyTorch implementation of our inverse-free and square-root-free Shampoo optimizer from the ICML 2024 paper 'Can We Remove the Square-Root in Adaptive Gradient Methods? A Second-Order Perspective' (the 'IF-Shampoo' optimizer in Fig. 3).
Some highlights of the optimizer:

- Runs in `bfloat16`, due to a fully matrix-multiplication-based update (no matrix decompositions).

Installation:

- Stable (recommended): `pip install sirfshampoo`
- Latest version from the GitHub `main` branch: `pip install git+https://github.com/f-dangel/sirfshampoo.git@main`
`SIRFShampoo` assumes that the objective is an average over per-example losses.
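Below is a minimal sketch of a training loop, assuming `SIRFShampoo` follows the standard PyTorch optimizer interface and is constructed from the model; the exact constructor signature and options are assumptions here, so check the package documentation.

```python
import torch
from torch import nn

from sirfshampoo import SIRFShampoo

torch.manual_seed(0)

# Toy regression task; a mean-reduction loss matches the assumption that
# the objective is an average over per-example losses.
X, y = torch.rand(64, 10), torch.rand(64, 1)
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
loss_func = nn.MSELoss()  # reduction="mean" by default

# Assumed constructor: built from the model (rather than model.parameters())
# so the optimizer can set up a preconditioner per parameter tensor.
optimizer = SIRFShampoo(model)

for _ in range(10):
    optimizer.zero_grad()
    loss = loss_func(model(X), y)
    loss.backward()
    optimizer.step()
```

Because the update is purely matrix-multiplication based, the same loop should, per the paper, also work with the model and data cast to `torch.bfloat16`.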
The code has stabilized only recently, so expect things to break, and please help us improve by filing issues.
If you find this code useful for your research, consider citing the paper:
@inproceedings{lin2024can,
  title     = {Can We Remove the Square-Root in Adaptive Gradient Methods? A Second-Order Perspective},
  author    = {Wu Lin and Felix Dangel and Runa Eschenhagen and Juhan Bae and Richard E. Turner and Alireza Makhzani},
  booktitle = {International Conference on Machine Learning (ICML)},
  year      = {2024},
}