HomebrewNLP / Olmax

HomebrewNLP in JAX flavour for maintainable TPU training
BSD 2-Clause "Simplified" License

Pretrained Embeddings, Stop at EOS, Untied Embeddings #28

Closed · ClashLuke closed this issue 2 years ago

ClashLuke commented 2 years ago

Performance is a bit better without shared embeddings (~5% lower loss, or equivalently roughly 15% faster training to reach the same loss): [figure: loss curves, shared vs. untied embeddings]
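For context, a minimal sketch of the difference being compared (hypothetical parameter names and shapes, not the actual Olmax code): with tied embeddings the output logits reuse the input embedding matrix transposed, while the untied variant trains a separate output projection.

```python
import jax
import jax.numpy as jnp

VOCAB, DIM = 256, 128
key_in, key_out = jax.random.split(jax.random.PRNGKey(0))
params = {"embd_in": jax.random.normal(key_in, (VOCAB, DIM)) * DIM ** -0.5,
          "embd_out": jax.random.normal(key_out, (VOCAB, DIM)) * DIM ** -0.5}

def logits(params, tokens, tied: bool):
    hidden = params["embd_in"][tokens]  # [batch, seq, dim] token lookup
    # ... transformer blocks would transform `hidden` here ...
    out = params["embd_in"] if tied else params["embd_out"]
    return hidden @ out.T  # [batch, seq, vocab]

tokens = jnp.zeros((1, 8), dtype=jnp.int32)
print(logits(params, tokens, tied=False).shape)  # (1, 8, 256)
```

Untying doubles the embedding parameter count but lets the input lookup and the output classifier specialise independently, which is consistent with the loss gap above.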

On the other hand, pretrained embeddings don't help convergence after the first few hours if input and output embeddings are shared: [figure: loss curves, pretrained vs. random embedding initialisation]
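A sketch of what "pretrained embeddings" means here, under my assumptions (hypothetical file name, loader, and shapes; Olmax may wire this up differently): the embedding matrix is initialised from an externally trained table instead of random noise, and everything else trains from scratch.

```python
import jax
import jax.numpy as jnp
import numpy as np

VOCAB, DIM = 256, 128

def init_embedding(pretrained_path: str | None = None):
    if pretrained_path is not None:
        # Initialise from a pretrained [vocab, dim] table saved as .npy.
        table = np.load(pretrained_path)
        assert table.shape == (VOCAB, DIM)
        return jnp.asarray(table)
    # Otherwise fall back to scaled random initialisation.
    return jax.random.normal(jax.random.PRNGKey(0), (VOCAB, DIM)) * DIM ** -0.5
```

When input and output embeddings are tied, a pretrained init also constrains the output head, which may be why the benefit fades after the first few hours of training.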

ClashLuke commented 2 years ago

The model doesn't break, but I can't add the new model to the eval server, as the server uses a model with shared input/output embeddings. Except for the model, everything is synchronised to the demo page.

The model itself works fine as well. This is the most recent run using the model proposed here.

I stopped the previous sweep and will launch a new sweep in a few minutes.
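Presumably the eval-server incompatibility is just a parameter-tree mismatch: a checkpoint with separate input/output matrices no longer matches a model definition that expects one shared matrix. A minimal illustration with hypothetical parameter names (not the actual Olmax checkpoint layout):

```python
import jax

tied_params = {"embd": ...}  # one shared input/output matrix
untied_params = {"embd_in": ..., "embd_out": ...}  # separate matrices

# Restoring a checkpoint requires the pytree structures to match;
# here they do not, so the untied checkpoint cannot be loaded.
print(jax.tree_util.tree_structure(tied_params) ==
      jax.tree_util.tree_structure(untied_params))  # False
```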