HomebrewNLP / Olmax

HomebrewNLP in JAX flavour for maintainable TPU-Training
BSD 2-Clause "Simplified" License
46 stars, 6 forks

Remove dead code #103

Closed. ClashLuke closed this 1 year ago

ClashLuke commented 1 year ago

stable gain after 8d30b08. running a test, then merging

ClashLuke commented 1 year ago

baseline: [image: grafik]

new: [image: grafik]

same speed, but potentially better convergence due to higher precision. (softmax loss now in fp64 rather than fp32)
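For readers unfamiliar with how the precision switch looks in JAX, here is a minimal sketch. It is not the code from this PR: the function name `softmax_cross_entropy`, the shapes, and the random inputs are made up for illustration. It only shows that JAX needs the `jax_enable_x64` flag before float64 arrays exist, and that the same loss can then be evaluated in either dtype.

```python
import jax
import jax.numpy as jnp

# JAX disables 64-bit types by default; enable them before creating any arrays.
jax.config.update("jax_enable_x64", True)


def softmax_cross_entropy(logits, targets, dtype):
    """Mean softmax cross-entropy, computed in `dtype` (hypothetical helper)."""
    logits = logits.astype(dtype)  # cast up (or keep) before the reductions
    log_z = jax.nn.logsumexp(logits, axis=-1)  # log of the softmax normaliser
    # Pick the logit that belongs to each target class.
    target_logit = jnp.take_along_axis(logits, targets[:, None], axis=-1)[:, 0]
    return jnp.mean(log_z - target_logit)


# Illustrative inputs only: 8 examples, 32 classes.
key = jax.random.PRNGKey(0)
logits = jax.random.normal(key, (8, 32), dtype=jnp.float32)
targets = jnp.arange(8) % 32

loss32 = softmax_cross_entropy(logits, targets, jnp.float32)
loss64 = softmax_cross_entropy(logits, targets, jnp.float64)
print(loss32.dtype, loss64.dtype)
```

On inputs this small the two losses should agree closely; any benefit from fp64 would be expected to show up in the accumulation over large vocabularies and many steps, which this sketch does not measure.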

ClashLuke commented 1 year ago

better than baseline on main and better than hand-tuned shampoo: [image: grafik]