facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License

Base-size pre-trained models #1651

Closed XinnuoXu closed 2 years ago

XinnuoXu commented 4 years ago

❓ Questions and Help

What is your question?

1) Does BART offer base-size (6-layer encoder, 6-layer decoder, hidden size 768) pre-trained models? Since the summarization baseline BERTSUMABS is trained on bert-base (12-layer encoder, 6-layer decoder, both with hidden size 768), have you ever compared a base-size BART with it?

2) Could you please provide a README file for XSum (similar to the CNN/DM one)?

3) How much time does XSum fine-tuning take with smaller GPUs (e.g. 4 x 11GB GPUs)?

@myleott @yinhanliu @ngoyal2707

yinhanliu commented 4 years ago
  1. Our base model is trained on wiki-bookcorpus only.
  2. Will do.
  3. We used 16 32GB GPUs for 1 hour (30K steps), so in your case it would be about 8 hours.
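
A rough back-of-the-envelope version of that estimate, assuming near-linear scaling with GPU count plus a slowdown factor for the smaller batches that fit in 11GB (the 2x factor is an assumption, not a number from this thread):

```python
# Back-of-the-envelope scaling of the reported fine-tuning budget (16 x 32GB
# GPUs for ~1 hour / 30K steps) down to 4 x 11GB GPUs. The 2x memory penalty
# is an assumption: smaller per-GPU batches need gradient accumulation
# (fairseq's --update-freq) to keep the same effective batch size.

reference_gpus = 16      # GPUs used in the reported run
reference_hours = 1.0    # wall-clock time for ~30K steps
target_gpus = 4          # e.g. 4 x 11GB GPUs
memory_penalty = 2.0     # assumed slowdown from the smaller per-GPU batches

estimate_hours = reference_hours * (reference_gpus / target_gpus) * memory_penalty
print(f"Estimated fine-tuning time: ~{estimate_hours:.0f} hours")  # ~8 hours
```
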
YizhuLiu commented 4 years ago

@XinnuoXu Hi, have you evaluated the bart.large.cnn model? Did you get the same R-2 score on the CNN/DM dataset as published? I used the pre-trained model to fine-tune on CNN/DM, but the ROUGE-2 is 19.19 (R-2 in the published paper is 21.28). Thank you very much!

yinhanliu commented 4 years ago

@YizhuLiu you need to use the right max-len, min-len, length-penalty, and beam-size values.

YizhuLiu commented 4 years ago

@yinhanliu Thank you for your reply. We set these values as shown in "Evaluating the bart.large.cnn model": beam=4, lenpen=2.0, max_len_b=140, min_len=55. With this setting, the R-2 score is 20.03. Are they right? If not, how can I get the same R-2 score on CNN/DM as published?
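
For reference, those settings correspond to the generation call in the fairseq BART summarization example; a minimal sketch using the torch.hub interface (the input document is a placeholder, and no_repeat_ngram_size=3 follows the official example rather than the values quoted in this thread):

```python
import torch

# Load the CNN/DM fine-tuned BART checkpoint through torch.hub.
bart = torch.hub.load('pytorch/fairseq', 'bart.large.cnn')
bart.cuda()
bart.eval()
bart.half()

# One (placeholder) source document; in practice this would be a batch of
# CNN/DM test articles.
source_docs = ["Replace this string with the article text to summarize."]

summaries = bart.sample(
    source_docs,
    beam=4,
    lenpen=2.0,
    max_len_b=140,
    min_len=55,
    no_repeat_ngram_size=3,
)
print(summaries[0])
```
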

ricardorei commented 4 years ago

Will the BART base-size (6-layer encoder, 6-layer decoder, hidden size 768) pre-trained models be released? I would like to experiment with them, since it is hard for me to fine-tune the large model.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale. If this issue is still affecting you, please leave any comment (for example, "bump"), and we'll keep it open. We are sorry that we haven't been able to prioritize it yet. If you have any new additional information, please include it with your comment!

stale[bot] commented 2 years ago

Closing this issue after a prolonged period of inactivity. If this issue is still present in the latest release, please create a new issue with up-to-date information. Thank you!