HomebrewNLP / Olmax

HomebrewNLP in JAX flavour for maintainable TPU training
BSD 2-Clause "Simplified" License

Shampoo Optimizer #15

Closed ClashLuke closed 2 years ago

ClashLuke commented 2 years ago

Second-order optimisers such as K-FAC, L-BFGS and AdaHessian promise significantly improved convergence rates, but at horrific memory cost. Scalable Shampoo promises a low memory footprint and vectorisable computation while retaining the convergence advantage of other second-order optimisers. Adding it to our code could reduce training time by 10%, or possibly by up to an order of magnitude.

This issue covers implementing Shampoo (the reference might help), running a hyperparameter sweep to find its best configuration, and comparing the best resulting runtime with our previous best.
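For reference, the core (non-scalable) Shampoo update for a 2D parameter is small enough to sketch directly in JAX. This is only the textbook single-matrix version with an eigendecomposition-based inverse fourth root, not the blocked/distributed variant the scalable paper describes; the function and argument names below are illustrative, not taken from the reference code:

```python
import jax.numpy as jnp


def _inv_pth_root(mat: jnp.ndarray, p: int, eps: float = 1e-6) -> jnp.ndarray:
    # mat^(-1/p) for a symmetric PSD matrix via eigendecomposition.
    eigvals, eigvecs = jnp.linalg.eigh(mat)
    eigvals = jnp.maximum(eigvals, eps)
    return (eigvecs * eigvals ** (-1.0 / p)) @ eigvecs.T


def init_stats(param: jnp.ndarray, eps: float = 1e-4):
    # Left/right Kronecker statistics start as scaled identities.
    rows, cols = param.shape
    return eps * jnp.eye(rows), eps * jnp.eye(cols)


def shampoo_step(param, grad, left_stat, right_stat, lr=1e-3):
    # Accumulate second-moment statistics of the gradient.
    left_stat = left_stat + grad @ grad.T
    right_stat = right_stat + grad.T @ grad
    # Precondition: L^(-1/4) @ G @ R^(-1/4), then take a plain SGD-style step.
    preconditioned = _inv_pth_root(left_stat, 4) @ grad @ _inv_pth_root(right_stat, 4)
    return param - lr * preconditioned, left_stat, right_stat
```

The scalable variant adds (at least) blocking of large dimensions, infrequent preconditioner recomputation and grafting onto a first-order method, which is presumably where most of the integration work will be.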

ClashLuke commented 2 years ago

I'm working on this now.

The original JAX-based implementation (without Optax) seems easiest to integrate. If it works well, we can consider adding Optax support and integrating their Optax version. Optax-Shampoo seems to support quantisation and many other features, which would make downstream tasks like #14 easier.
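For what it's worth, wiring the Optax version into a training setup would presumably look roughly like the sketch below. This assumes the reference `distributed_shampoo.py` is vendored next to our code as `distributed_shampoo` and that its `distributed_shampoo(...)` factory returns an `optax.GradientTransformation`; the exact keyword arguments (`block_size`, `beta1`, `beta2`) should be double-checked against the reference file:

```python
import optax
# Assumed import path: the reference distributed_shampoo.py vendored into our repo.
from distributed_shampoo import distributed_shampoo


def make_optimizer(lr: float = 1e-3) -> optax.GradientTransformation:
    # Chain global-norm clipping with the Shampoo transformation, Optax-style.
    # Argument names below follow the reference implementation and need verifying.
    return optax.chain(
        optax.clip_by_global_norm(1.0),
        distributed_shampoo(learning_rate=lr, block_size=128, beta1=0.9, beta2=0.999),
    )
```

If this turns out to be the easier path, it would also give us the usual chained-transformation pattern (clipping, weight decay, schedules) essentially for free.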