Closed: luofuli closed this issue 4 years ago.
Thanks for asking. You can use a pre-trained model such as XLM-R or mBART, but unfortunately it cannot benefit from Admin.
Intuitively, Admin does not change the model architecture or introduce additional hyper-parameters; it only changes the random initialization. Since a pre-trained model is not randomly initialized, it is hard for it to benefit from Admin.
Really great job! Can I use a pre-trained model such as XLM-R or mBART to initialize the encoder-decoder model rather than initializing it randomly? If so, do I just need to set the flag `--restore-file pre-trained-model.pt`?
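For reference, `--restore-file` is the standard fairseq flag for loading an existing checkpoint, and fine-tuning from a pre-trained checkpoint usually also requires resetting the optimizer/dataloader state. Below is a minimal sketch of such an invocation; the data path, checkpoint name, architecture, and hyper-parameters are placeholders rather than this repo's official recipe, and an mBART or XLM-R checkpoint would additionally need its matching architecture/task settings in fairseq for the weights to load correctly.

```bash
# Hypothetical fairseq-train invocation: start from a pre-trained checkpoint
# instead of training from random initialization. Paths and hyper-parameters
# below are placeholders.
fairseq-train data-bin/my-dataset \
    --arch transformer --share-all-embeddings \
    --restore-file pre-trained-model.pt \
    --reset-optimizer --reset-lr-scheduler --reset-dataloader --reset-meters \
    --optimizer adam --lr 5e-4 --lr-scheduler inverse_sqrt --warmup-updates 4000 \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --max-tokens 4096
```

Note that, as explained above, restoring a checkpoint this way bypasses random initialization entirely, which is exactly why Admin's initialization would not apply in this setting.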