HomebrewNLP / Olmax

HomebrewNLP in JAX flavour for maintainable TPU-Training
BSD 2-Clause "Simplified" License

perf(optimizer/shampoo): remove multi-preconditioning #65

Closed ClashLuke closed 2 years ago

ClashLuke commented 2 years ago

Multi-preconditioning runs at half the speed but gives only 0.5% lower loss. Not worth it.

ClashLuke commented 2 years ago

Without multi-preconditioning, it's 3x as fast, uses less memory, and converges better. Removing it.