f-dangel / sirfshampoo

[ICML 2024] SIRFShampoo: Structured inverse- and root-free Shampoo in PyTorch (https://arxiv.org/abs/2402.03496)
https://sirfshampoo.readthedocs.io
MIT License

SIRFShampoo: Structured Inverse- and Root-Free Shampoo

This package contains the official PyTorch implementation of the inverse- and square-root-free Shampoo optimizer from our ICML 2024 paper 'Can We Remove the Square-Root in Adaptive Gradient Methods? A Second-Order Perspective' (the 'IF-Shampoo' optimizer in Fig. 3).

Some highlights of the optimizer:

Installation

Usage

Limitations

Citation

If you find this code useful for your research, consider citing the paper:


@inproceedings{lin2024can,
  title =        {Can We Remove the Square-Root in Adaptive Gradient Methods? A
                  Second-Order Perspective},
  author =       {Wu Lin and Felix Dangel and Runa Eschenhagen and Juhan Bae and
                  Richard E. Turner and Alireza Makhzani},
  booktitle =    {International Conference on Machine Learning (ICML)},
  year =         2024,
}