HomebrewNLP / Olmax

HomebrewNLP in JAX flavour for maintainable TPU-Training
BSD 2-Clause "Simplified" License

Rmsprop grafting #47

Closed ClashLuke closed 2 years ago

ClashLuke commented 2 years ago

Baseline: [image]
SM3#Shampoo: [image]

The grafted model appears to outperform the baseline and tolerates a wider range of hyperparameters. The PR can be merged.
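
For readers unfamiliar with grafting, here is a minimal sketch of the general idea: the step direction comes from one optimizer and is rescaled to the step norm of another. It is plain Python for illustration, not the code from this PR. The names `graft`, `rmsprop_update` and `l2_norm` are hypothetical, and using RMSprop as the magnitude source is an assumption based on the PR title.

```python
import math


def l2_norm(vec):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def graft(magnitude_update, direction_update, eps=1e-16):
    """Rescale direction_update so its L2 norm matches magnitude_update."""
    scale = l2_norm(magnitude_update) / max(l2_norm(direction_update), eps)
    return [scale * x for x in direction_update]


def rmsprop_update(grad, second_moment, beta2=0.99, eps=1e-8):
    """One RMSprop step (hypothetical helper): returns (update, new_moment)."""
    new_moment = [beta2 * v + (1 - beta2) * g * g
                  for g, v in zip(grad, second_moment)]
    update = [g / (math.sqrt(v) + eps) for g, v in zip(grad, new_moment)]
    return update, new_moment


grad = [3.0, 4.0]
rms_step, moment = rmsprop_update(grad, [0.0, 0.0])

# Stand-in for the step proposed by a second optimizer (assumed values).
other_step = [0.0, 2.0]

# Direction of other_step, step size of the RMSprop update.
grafted = graft(rms_step, other_step)
```

Because the grafted step inherits its norm from the magnitude optimizer, a learning rate tuned for that optimizer can carry over, which may be one reason for the wider range of workable hyperparameters reported above.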