Hey, we will release XNLI fine-tuning instructions soon.
thanks! Looking forward to it.
I am using the same data format as BERT. My results are 0.828 for En and 0.732 for Zh with XLM-R base, 4 epochs, learning rate 2e-5, batch size 16. Could you please share the hyperparameters needed to reproduce the results published in the paper?
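For concreteness, the BERT-style XNLI format is just (premise, hypothesis, label) triples. Below is a minimal sketch of reading the public XNLI TSV into that form; the column names `language`, `sentence1`, `sentence2`, and `gold_label` are assumptions based on the standard XNLI 1.0 release, not something confirmed in this thread.

```python
# Minimal sketch: read XNLI dev/test TSV into BERT-style (premise, hypothesis, label)
# triples. Column names are assumptions based on the public XNLI 1.0 distribution.
import csv

LABELS = {'contradiction': 0, 'neutral': 1, 'entailment': 2}

def read_xnli(path, language='en'):
    examples = []
    with open(path, encoding='utf-8') as f:
        for row in csv.DictReader(f, delimiter='\t', quoting=csv.QUOTE_NONE):
            if row['language'] != language:
                continue
            examples.append((row['sentence1'], row['sentence2'], LABELS[row['gold_label']]))
    return examples

# e.g. dev_en = read_xnli('XNLI-1.0/xnli.dev.tsv', language='en')
```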
@kartikayk Can you please share the above details?
closing after @kartikayk 's answer
@tomking1988 I'm guessing you're talking about the zero-shot setting here. The following is the setup we used for the numbers published in the paper (a rough training-loop sketch follows the list):
- Batch Size / GPU = 16 on 8 GPUs (Effective BS = 128)
- Adam with an LR of 5e-6 (0.000005)
- We run validation after each epoch - where the epoch consists of 5K batches with data randomly sampled from the training set - and select the checkpoint with the best validation set result. This is quite important.
- We run training for 30 epochs with early stopping (stop if the validation accuracy has not improved for 5 epochs), where an epoch is defined as above.
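To make that schedule concrete, here is a rough sketch as a plain PyTorch loop. This is not the exact fairseq command used for the paper: `model`, `train_batches`, and `evaluate` are placeholders for an XLM-R classifier, a random sampler over the MultiNLI training set, and a dev-set accuracy function, and the multi-GPU / gradient-accumulation detail needed to reach the effective batch size of 128 is omitted.

```python
# Rough sketch of the schedule described above, not an official script.
# `model` is any XLM-R classifier whose forward call returns a scalar loss,
# `train_batches` is an iterator yielding random MultiNLI batches, and
# `evaluate` returns dev-set accuracy -- all three are placeholders.
import copy
import torch

def train_xnli(model, train_batches, evaluate,
               lr=5e-6, batches_per_epoch=5000, max_epochs=30, patience=5):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    best_acc, best_state, since_improvement = 0.0, None, 0

    for epoch in range(max_epochs):
        model.train()
        for _ in range(batches_per_epoch):      # an "epoch" = 5K random batches
            loss = model(next(train_batches))   # placeholder forward returning a loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        acc = evaluate(model)                   # validate after every epoch
        if acc > best_acc:                      # keep the best checkpoint
            best_acc = acc
            best_state = copy.deepcopy(model.state_dict())
            since_improvement = 0
        else:
            since_improvement += 1
            if since_improvement >= patience:   # early stopping
                break

    model.load_state_dict(best_state)
    return model, best_acc
```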
Hi @kartikayk, could you please let us know the learning rate for fine-tuning on other languages? I found that it is very sensitive: if the lr is not right, XLM-R (large) learns nothing. Thanks!
Could you please provide an example of the XNLI task for XLM-RoBERTa? The current example (https://github.com/pytorch/fairseq/tree/master/examples/xlmr) is quite simple and only covers a single sentence. Thanks a lot!
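In the meantime, here is a minimal sketch of how a sentence pair can be fed through fairseq's XLM-R hub interface. The checkpoint path and the `'xnli'` head name are assumptions: the stock pretrained checkpoint ships without a classification head, so one would first have to be registered and fine-tuned before `predict` returns anything meaningful.

```python
# Minimal sketch (not official instructions) of sentence-pair input for XNLI
# with fairseq's XLM-R hub interface. Paths and the 'xnli' head name are placeholders.
from fairseq.models.roberta import XLMRModel

xlmr = XLMRModel.from_pretrained('/path/to/xlmr.large', checkpoint_file='model.pt')
xlmr.eval()

# encode() accepts additional sentences, so the premise and hypothesis are
# joined with the appropriate separator tokens automatically.
tokens = xlmr.encode(
    'The cat sat on the mat.',         # premise
    'There is a cat on the mat.',      # hypothesis
)

# Assumes a classification head named 'xnli' was registered and fine-tuned on
# MultiNLI; the index-to-label mapping depends on how that head was trained.
logits = xlmr.predict('xnli', tokens, return_logits=True)
print(logits.argmax(dim=-1).item())
```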