facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License

use RoBERTa on Summarization/Translation task #1465

Closed AlaFalaki closed 4 years ago

AlaFalaki commented 4 years ago

Hello, I just want to know if it is possible to use the RoBERTa architecture on tasks like abstractive summarization? I couldn't find any clue on the documents and codes.

Thanks in advance.

ngoyal2707 commented 4 years ago

RoBERTa is trained with the Masked Language Modeling (MLM) pretraining objective. The MLM objective typically does well on NLU downstream tasks (classification, regression, etc.) but not as well on generation tasks (summarization, dialog, translation, etc.). There are some papers that try to use it for generation, but we haven't tried it ourselves.
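
As a minimal sketch of what that MLM pretraining gives you (and why there is no decoder for generation), here is how the pretrained RoBERTa checkpoint can be loaded through torch.hub and used for masked-token prediction. The hub name and `fill_mask` call follow the fairseq RoBERTa examples; treat the exact prompt as a placeholder.

```python
import torch

# Load the pretrained RoBERTa model from the fairseq torch.hub entry.
roberta = torch.hub.load('pytorch/fairseq', 'roberta.large')
roberta.eval()  # disable dropout for inference

# Masked-token prediction, i.e. the MLM objective RoBERTa was pretrained on.
# This is an encoder-only model, so there is no autoregressive generation here.
print(roberta.fill_mask('The capital of France is <mask>.', topk=3))
```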

That said, we have another project called BART, which trains a seq2seq model with a denoising objective. It does quite well on summarization (and is almost on par with RoBERTa on NLU tasks). Here are the instructions to use it.
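
For reference, a hedged sketch of running abstractive summarization with the pretrained BART checkpoint fine-tuned on CNN/DailyMail, loaded via torch.hub. The hub name `bart.large.cnn` and the beam-search settings follow the fairseq BART examples; the input document below is a placeholder.

```python
import torch

# Load BART fine-tuned for CNN/DailyMail summarization from torch.hub.
bart = torch.hub.load('pytorch/fairseq', 'bart.large.cnn')
bart.eval()  # disable dropout for inference

# Placeholder source document; BART was fine-tuned on news articles,
# so news-style text works best.
source_document = (
    "Replace this with the article you want to summarize. "
    "The model generates an abstractive summary with beam search."
)

# Generate a summary; these decoding hyperparameters mirror the ones
# used in the fairseq BART summarization examples.
summaries = bart.sample(
    [source_document],
    beam=4,
    lenpen=2.0,
    max_len_b=140,
    min_len=55,
    no_repeat_ngram_size=3,
)
print(summaries[0])
```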