facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License

use RoBERTa on Summarization/Translation task #1465

Closed AlaFalaki closed 4 years ago

AlaFalaki commented 4 years ago

Hello, I just want to know if it is possible to use the RoBERTa architecture on tasks like abstractive summarization? I couldn't find any clue on the documents and codes.

Thanks in advance.

ngoyal2707 commented 4 years ago

RoBERTa is trained with the Masked Language Modeling (MLM) pretraining objective. The MLM objective typically does well on NLU downstream tasks (classification, regression, etc.) but not as well on generation tasks (summarization, dialog, translation, etc.). There are some papers that try to use it for generation, but we haven't tried it ourselves.
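
As a minimal sketch of what that MLM pretraining gives you (and why there is no decoder for generation), here is how the pretrained RoBERTa checkpoint can be loaded through torch.hub and used for masked-token prediction. The hub name and `fill_mask` call follow the fairseq RoBERTa examples; treat the exact prompt as a placeholder.

```python
import torch

# Load the pretrained RoBERTa model from the fairseq torch.hub entry.
roberta = torch.hub.load('pytorch/fairseq', 'roberta.large')
roberta.eval()  # disable dropout for inference

# Masked-token prediction, i.e. the MLM objective RoBERTa was pretrained on.
# This is an encoder-only model, so there is no autoregressive generation here.
print(roberta.fill_mask('The capital of France is <mask>.', topk=3))
```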

That said, we have another project called BART, which trains a seq2seq model with a denoising objective. It does quite well on summarization (and is almost on par with RoBERTa on NLU tasks). Here are the instructions to use it.
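
For reference, a hedged sketch of running abstractive summarization with the pretrained BART checkpoint fine-tuned on CNN/DailyMail, loaded via torch.hub. The hub name `bart.large.cnn` and the beam-search settings follow the fairseq BART examples; the input document below is a placeholder.

```python
import torch

# Load BART fine-tuned for CNN/DailyMail summarization from torch.hub.
bart = torch.hub.load('pytorch/fairseq', 'bart.large.cnn')
bart.eval()  # disable dropout for inference

# Placeholder source document; BART was fine-tuned on news articles,
# so news-style text works best.
source_document = (
    "Replace this with the article you want to summarize. "
    "The model generates an abstractive summary with beam search."
)

# Generate a summary; these decoding hyperparameters mirror the ones
# used in the fairseq BART summarization examples.
summaries = bart.sample(
    [source_document],
    beam=4,
    lenpen=2.0,
    max_len_b=140,
    min_len=55,
    no_repeat_ngram_size=3,
)
print(summaries[0])
```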