HomebrewNLP / Olmax

HomebrewNLP in JAX flavour for maintainable TPU-Training
BSD 2-Clause "Simplified" License
45 stars, 5 forks

Fix checkpoint #76

Closed ClashLuke closed 1 year ago

ClashLuke commented 1 year ago

Resume works: [image]

Additionally, the data loader is now more random, which improves generalization and reduces the loss spikes when switching between datasets: [image]
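The comment does not say how the loader was changed. As a rough sketch only, assuming "more random" means that samples from different datasets are mixed rather than read one dataset after another, something like the following would avoid a hard boundary between datasets. The function name `interleaved_batches` and its parameters are hypothetical and are not taken from the Olmax code base.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def interleaved_batches(datasets: Sequence[Sequence[int]],
                        batch_size: int,
                        seed: int = 0) -> Iterator[List[Tuple[int, int]]]:
    """Yield batches from one global shuffle over all datasets.

    Hypothetical illustration, not the Olmax implementation.
    Each element is a (dataset_index, sample) pair.
    """
    # A seeded generator keeps the order reproducible, so a resumed run
    # can recreate the same sequence of batches.
    rng = random.Random(seed)
    # Pool every sample together so no batch is tied to a single dataset.
    pool = [(i, sample) for i, ds in enumerate(datasets) for sample in ds]
    rng.shuffle(pool)
    # Emit full batches only; a trailing partial batch is dropped.
    for start in range(0, len(pool) - batch_size + 1, batch_size):
        yield pool[start:start + batch_size]
```

Because the order depends only on the seed, a run that resumes from a checkpoint could in principle rebuild the iterator with the same seed and skip the batches it has already consumed.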