Closed ClashLuke closed 2 years ago
The model doesn't break, but I can't add the new model to the eval server, since the server is using a model with shared IO (shared input and output embeddings). Apart from the model, everything is synchronised with the demo page. The model itself works fine as well. This is the most recent run using the model proposed here. I stopped the previous sweep and will launch a new one in a few minutes.
The performance is a bit better without shared embeddings (~5% lower loss or 15% faster training).
On the other hand, pretrained embeddings don't help convergence after the first few hours if input and output embeddings are shared.
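
For readers unfamiliar with the distinction, here is a minimal NumPy sketch of what "shared" versus separate input and output embeddings means. It is not the code from this PR: the function names, the parameter-dictionary layout, and the initialisation scale are all hypothetical and chosen only for illustration.

```python
import numpy as np


def init_params(vocab_size, features, shared, seed=0):
    """Create embedding parameters (hypothetical layout, not this repo's)."""
    rng = np.random.default_rng(seed)
    params = {"input_embedding": rng.normal(0.0, 0.02, (vocab_size, features))}
    if not shared:
        # Separate IO: the output projection gets its own matrix.
        params["output_embedding"] = rng.normal(0.0, 0.02, (vocab_size, features))
    return params


def embed(params, tokens):
    """Look up one feature vector per token id."""
    return params["input_embedding"][tokens]


def logits(params, hidden):
    """Project hidden states back to vocabulary logits.

    With shared IO the input embedding matrix is reused (transposed);
    otherwise the dedicated output matrix is used.
    """
    out = params.get("output_embedding", params["input_embedding"])
    return hidden @ out.T


def parameter_count(params):
    """Total number of scalar parameters across all matrices."""
    return sum(p.size for p in params.values())
```

With shared IO, the same matrix receives gradients from both the input lookup and the output projection. Separate matrices double the embedding parameter count but let the two roles be learned independently.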