facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

XLM-RoBERTa example for XNLI #1367

Closed: ericwtlin closed this issue 4 years ago

ericwtlin commented 4 years ago

Could you please provide an example of XNLI tasks for XLM-RoBERTa? The current example (https://github.com/pytorch/fairseq/tree/master/examples/xlmr) is quite minimal and only covers single-sentence input. Thanks a lot!
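Note that a sentence pair can already be run through the pretrained model via the hub interface, since `encode()` accepts additional sentences. A minimal sketch (the checkpoint path is a placeholder; this only extracts features, it does not fine-tune a classifier):

```python
from fairseq.models.roberta import XLMRModel

# Load a pretrained XLM-R checkpoint (placeholder path; see examples/xlmr
# in the repo for download links).
xlmr = XLMRModel.from_pretrained('/path/to/xlmr.base', checkpoint_file='model.pt')
xlmr.eval()

# encode() accepts extra sentences and inserts separator tokens between them,
# so an XNLI premise/hypothesis pair can be encoded directly.
tokens = xlmr.encode(
    'The cat sat on the mat.',     # premise
    'There is a cat on the mat.',  # hypothesis
)
features = xlmr.extract_features(tokens)  # shape: (1, seq_len, hidden_dim)
print(features.shape)
```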

ngoyal2707 commented 4 years ago

Hey, we will release XNLI fine-tuning instructions soon.

ericwtlin commented 4 years ago

Thanks! Looking forward to it.

tomking1988 commented 4 years ago

> Hey, we will release XNLI fine-tuning instructions soon.

I am using the same input format as for BERT. My results are 0.828 for En and 0.732 for Zh, using XLM-R base, 4 epochs, learning rate 2e-5, batch size 16. Could you please share the hyperparameters needed to reproduce the results published in the paper?
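A rough sketch of that setup using the hub interface (the `'xnli'` head name and the toy example are stand-ins; real training would iterate over padded XNLI batches of size 16 for 4 epochs):

```python
import torch
from fairseq.models.roberta import XLMRModel

xlmr = XLMRModel.from_pretrained('/path/to/xlmr.base', checkpoint_file='model.pt')
# Attach a 3-way classification head (entailment / neutral / contradiction);
# 'xnli' is just a name chosen for this sketch.
xlmr.register_classification_head('xnli', num_classes=3)
xlmr.train()

# Hyperparameters from the comment above: Adam, lr 2e-5.
optimizer = torch.optim.Adam(xlmr.model.parameters(), lr=2e-5)
criterion = torch.nn.CrossEntropyLoss()

# One illustrative update on a toy premise/hypothesis pair.
tokens = xlmr.encode('A man is eating.', 'Someone is having a meal.')
label = torch.tensor([0])  # hypothetical label id: 0 = entailment
logits = xlmr.predict('xnli', tokens, return_logits=True)  # shape: (1, 3)
loss = criterion(logits, label)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```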

ngoyal2707 commented 4 years ago

@kartikayk Can you please share the above details?

kartikayk commented 4 years ago

@tomking1988 I'm guessing you're talking about the zero-shot setting here. The following is the setup we used for the numbers published in the paper:

  • Batch size / GPU = 16 on 8 GPUs (effective batch size = 128)
  • Adam with an LR of 0.000005
  • We run validation after each epoch, where an epoch consists of 5K batches with data randomly sampled from the training set, and select the checkpoint with the best validation set result. This is quite important.
  • We run training for 30 epochs with early stopping (stop if the validation accuracy has not improved for 5 epochs), where an epoch is defined as above.
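In pseudocode, that schedule looks like the sketch below; `train_step`, `validate`, and `save_checkpoint` are placeholders for the actual training step, validation pass, and checkpointing:

```python
def train_with_early_stopping(train_step, validate, save_checkpoint,
                              batches_per_epoch=5000, patience=5, max_epochs=30):
    """Validate every 5K-batch 'epoch', keep the best checkpoint, and stop
    once validation accuracy has not improved for `patience` epochs."""
    best_acc, stale = 0.0, 0
    for epoch in range(max_epochs):
        for _ in range(batches_per_epoch):
            train_step()       # one optimizer update on a randomly sampled batch
        acc = validate()       # accuracy on the validation set
        if acc > best_acc:
            best_acc, stale = acc, 0
            save_checkpoint()  # keep the checkpoint with the best validation result
        else:
            stale += 1
            if stale >= patience:
                break          # early stopping
    return best_acc
```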

huihuifan commented 4 years ago

Closing after @kartikayk's answer.

yuchenlin commented 3 years ago

> @tomking1988 I'm guessing you're talking about the zero-shot setting here. The following is the setup we used for the numbers published in the paper:
>
>   • Batch size / GPU = 16 on 8 GPUs (effective batch size = 128)
>   • Adam with an LR of 0.000005
>   • We run validation after each epoch, where an epoch consists of 5K batches with data randomly sampled from the training set, and select the checkpoint with the best validation set result. This is quite important.
>   • We run training for 30 epochs with early stopping (stop if the validation accuracy has not improved for 5 epochs), where an epoch is defined as above.

Hi @kartikayk, would you please let us know the learning rate for fine-tuning on other languages? I found that it is very sensitive; if the lr is not set correctly, XLM-R (large) learns nothing. Thanks!