LiyuanLucasLiu / Transformer-Clinic

Understanding the Difficulty of Training Transformers
https://arxiv.org/abs/2004.08249
Apache License 2.0
326 stars, 20 forks

Admin for 100L-100L model? #24

Closed. Vincent131499 closed this issue 2 years ago

Vincent131499 commented 2 years ago

The paper mentions that 8 A100 GPUs were used to train the model. How long was it trained, and how many epochs did it reach? What is the final model's performance (BLEU)?

LiyuanLucasLiu commented 2 years ago

Thanks for asking :-)

I trained the model for 40 epochs and got a BLEU score of 29.5 (on WMT'14 En-De). I didn't finish the training due to the high cost, so I don't know whether the performance would improve with longer training (I suspect probably not, unless you train it for a really, really long time).

More details will be released shortly (featuring a new plug-and-play Admin implementation), stay tuned!