facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

XLM-RoBERTa example for XNLI #1367

Closed: ericwtlin closed this issue 4 years ago

ericwtlin commented 4 years ago

Could you please provide an example of XNLI tasks for XLM-RoBERTa? The current example (https://github.com/pytorch/fairseq/tree/master/examples/xlmr) is quite minimal and only covers single-sentence input. Thanks a lot!
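Note that a sentence pair can already be run through the pretrained model via the hub interface, since `encode()` accepts additional sentences. A minimal sketch (the checkpoint path is a placeholder; this only extracts features, it does not fine-tune a classifier):

```python
from fairseq.models.roberta import XLMRModel

# Load a pretrained XLM-R checkpoint (placeholder path; see examples/xlmr
# in the repo for download links).
xlmr = XLMRModel.from_pretrained('/path/to/xlmr.base', checkpoint_file='model.pt')
xlmr.eval()

# encode() accepts extra sentences and inserts separator tokens between them,
# so an XNLI premise/hypothesis pair can be encoded directly.
tokens = xlmr.encode(
    'The cat sat on the mat.',     # premise
    'There is a cat on the mat.',  # hypothesis
)
features = xlmr.extract_features(tokens)  # shape: (1, seq_len, hidden_dim)
print(features.shape)
```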

ngoyal2707 commented 4 years ago

Hey, we will release XNLI fine-tuning instructions soon.

ericwtlin commented 4 years ago

Thanks! Looking forward to it.

tomking1988 commented 4 years ago

> Hey, we will release XNLI fine-tuning instructions soon.

I am using the same input format as for BERT. My results are 0.828 for En and 0.732 for Zh, using XLM-R base, 4 epochs, learning rate 2e-5, batch size 16. Could you please share the hyperparameters needed to reproduce the results published in the paper?
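A rough sketch of that setup using the hub interface (the `'xnli'` head name and the toy example are stand-ins; real training would iterate over padded XNLI batches of size 16 for 4 epochs):

```python
import torch
from fairseq.models.roberta import XLMRModel

xlmr = XLMRModel.from_pretrained('/path/to/xlmr.base', checkpoint_file='model.pt')
# Attach a 3-way classification head (entailment / neutral / contradiction);
# 'xnli' is just a name chosen for this sketch.
xlmr.register_classification_head('xnli', num_classes=3)
xlmr.train()

# Hyperparameters from the comment above: Adam, lr 2e-5.
optimizer = torch.optim.Adam(xlmr.model.parameters(), lr=2e-5)
criterion = torch.nn.CrossEntropyLoss()

# One illustrative update on a toy premise/hypothesis pair.
tokens = xlmr.encode('A man is eating.', 'Someone is having a meal.')
label = torch.tensor([0])  # hypothetical label id: 0 = entailment
logits = xlmr.predict('xnli', tokens, return_logits=True)  # shape: (1, 3)
loss = criterion(logits, label)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```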

ngoyal2707 commented 4 years ago

@kartikayk Can you please share the above details?

kartikayk commented 4 years ago

@tomking1988 I'm guessing you're talking about the zero-shot setting here. The following is the setup we used for the numbers published in the paper:

  • Batch size / GPU = 16 on 8 GPUs (effective batch size = 128)
  • Adam with an LR of 0.000005
  • We run validation after each epoch, where an epoch consists of 5K batches with data randomly sampled from the training set, and select the checkpoint with the best validation set result. This is quite important.
  • We run training for 30 epochs with early stopping (stop if the validation accuracy has not improved for 5 epochs), where an epoch is defined as above.
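In pseudocode, that schedule looks like the sketch below; `train_step`, `validate`, and `save_checkpoint` are placeholders for the actual training step, validation pass, and checkpointing:

```python
def train_with_early_stopping(train_step, validate, save_checkpoint,
                              batches_per_epoch=5000, patience=5, max_epochs=30):
    """Validate every 5K-batch 'epoch', keep the best checkpoint, and stop
    once validation accuracy has not improved for `patience` epochs."""
    best_acc, stale = 0.0, 0
    for epoch in range(max_epochs):
        for _ in range(batches_per_epoch):
            train_step()       # one optimizer update on a randomly sampled batch
        acc = validate()       # accuracy on the validation set
        if acc > best_acc:
            best_acc, stale = acc, 0
            save_checkpoint()  # keep the checkpoint with the best validation result
        else:
            stale += 1
            if stale >= patience:
                break          # early stopping
    return best_acc
```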

huihuifan commented 4 years ago

Closing after @kartikayk's answer.

yuchenlin commented 3 years ago

> @tomking1988 I'm guessing you're talking about the zero-shot setting here. The following is the setup we used for the numbers published in the paper:
>
>   • Batch size / GPU = 16 on 8 GPUs (effective batch size = 128)
>   • Adam with an LR of 0.000005
>   • We run validation after each epoch, where an epoch consists of 5K batches with data randomly sampled from the training set, and select the checkpoint with the best validation set result. This is quite important.
>   • We run training for 30 epochs with early stopping (stop if the validation accuracy has not improved for 5 epochs), where an epoch is defined as above.

Hi @kartikayk, would you please let us know the learning rate for fine-tuning on other languages? I found that it is very sensitive; if the lr is not set correctly, XLM-R (large) learns nothing. Thanks!