Pretrained Embeddings, Stop at EOS, Untied Embeddings

HomebrewNLP / Olmax

HomebrewNLP in JAX flavour for maintainable TPU-Training

BSD 2-Clause "Simplified" License

45 stars 6 forks

Pretrained Embeddings, Stop at EOS, Untied Embeddings #28

ClashLuke closed 2 years ago

ClashLuke commented 2 years ago

Performance is slightly better without shared embeddings (~5% lower loss, or ~15% faster training).

On the other hand, pretrained embeddings don't help convergence after the first few hours if the input and output embeddings are shared.
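For readers unfamiliar with the distinction, the difference between shared (tied) and untied embeddings can be sketched in a few lines of JAX. This is only an illustration, not Olmax's actual code: the names init_params, logits, embed_in and embed_out, the sizes VOCAB and DIM, and the 0.02 init scale are all hypothetical, and the model body between the two embeddings is left out.

```python
import jax
import jax.numpy as jnp

VOCAB, DIM = 16, 8  # hypothetical sizes, chosen only for illustration


def init_params(key, tied):
    """Create embedding parameters; `tied=True` shares one matrix for input and output."""
    k_in, k_out = jax.random.split(key)
    params = {"embed_in": jax.random.normal(k_in, (VOCAB, DIM)) * 0.02}
    if not tied:
        # Untied: the output projection gets its own, independently trained matrix.
        params["embed_out"] = jax.random.normal(k_out, (VOCAB, DIM)) * 0.02
    return params


def logits(params, tokens):
    """Embed tokens and project back to vocabulary logits (model body omitted)."""
    hidden = params["embed_in"][tokens]  # shape (seq, DIM)
    # Tied: fall back to the input matrix when no separate output matrix exists.
    out_matrix = params.get("embed_out", params["embed_in"])
    return hidden @ out_matrix.T  # shape (seq, VOCAB)


key = jax.random.PRNGKey(0)
tokens = jnp.array([1, 5, 3])
tied_params = init_params(key, tied=True)
untied_params = init_params(key, tied=False)

tied_logits = logits(tied_params, tokens)
untied_logits = logits(untied_params, tokens)
```

In this sketch the untied variant simply carries a second VOCAB x DIM matrix, so it doubles the embedding parameter count in exchange for letting the input and output representations diverge during training.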

ClashLuke commented 2 years ago

The model doesn't break, but I can't add the new model to the eval server, as it's using a model with shared input/output embeddings. Except for the model, everything is synchronised with the demo page. The model itself works fine as well. This is the most recent run using the model proposed here. I stopped the previous sweep and will launch a new sweep in a few minutes.