LiyuanLucasLiu / Transformer-Clinic

Understanding the Difficulty of Training Transformers
https://arxiv.org/abs/2004.08249
Apache License 2.0

Can I use a pre-trained model to initialize the model? #4

Closed luofuli closed 4 years ago

luofuli commented 4 years ago

Really great job! Can I use a pre-trained model such as XLM-R or mBART to initialize the encoder-decoder model rather than initializing it randomly? If so, is setting the flag --restore-file pre-trained-model.pt all I need to do?
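For context, this is roughly how I would inspect the checkpoint before pointing --restore-file at it; a minimal PyTorch sketch, assuming the usual fairseq checkpoint layout with the weights stored under the "model" key (not code from this repo):

```python
# Minimal sketch, assuming the standard fairseq checkpoint layout
# (weights stored under the "model" key); not from this repo.
import torch

ckpt = torch.load("pre-trained-model.pt", map_location="cpu")
for name, tensor in ckpt["model"].items():
    print(name, tuple(tensor.shape))
```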

LiyuanLucasLiu commented 4 years ago

Thanks for asking. You can use a pre-trained model like XLM-R/mBART, but unfortunately such a setup cannot benefit from Admin.

Intuitively, Admin does not change the model architecture or introduce additional hyper-parameters; it only changes how the model is randomly initialized. Since a pre-trained model starts from learned weights rather than a random initialization, there is nothing for Admin to act on.
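To make this concrete, here is a toy PyTorch sketch (not this repo's code): any initialization scheme, Admin included, only sets the starting weights, and restoring a checkpoint replaces those weights wholesale.

```python
# Toy illustration, not this repo's code: checkpoint loading
# overwrites whatever initialization was applied beforehand.
import torch
import torch.nn as nn

model = nn.Linear(4, 4)

# An initialization scheme (Admin included) only sets the starting weights.
nn.init.xavier_uniform_(model.weight)

# Restoring pre-trained weights replaces that initialization wholesale,
# which is why Admin has no effect once a checkpoint is loaded.
state = {"weight": torch.randn(4, 4), "bias": torch.zeros(4)}
model.load_state_dict(state)
```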