wasiahmad closed this issue 4 years ago
Hi Wasi, we train on the SQuAD v1.1 training set and use MLQA-en as the validation dataset for early stopping. We report results on the MLQA test sets.
Thanks for the confirmation. Can you confirm if the following settings are used for M-BERT?
learning rate = 5e-5
warmup_steps = 0
epochs = 3
gradient_accumulation_steps = 1
grad_clipping = 1.0
I am asking about the number of epochs in particular, since you said you performed early stopping.
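To make the question concrete, here is a minimal sketch of one way a fixed epoch count and early stopping could coexist: `epochs = 3` acts as an upper bound, and the checkpoint with the best MLQA-en dev score is kept. This is only an assumed reading, not a confirmed description of the MLQA baselines. The names `FineTuneConfig` and `select_best_epoch` are hypothetical, and the scores in the usage line are placeholders, not real results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Settings listed above (field names are illustrative, not from any library)."""
    learning_rate: float = 5e-5
    warmup_steps: int = 0
    epochs: int = 3  # assumed here to be an upper bound, not a guaranteed count
    gradient_accumulation_steps: int = 1
    grad_clipping: float = 1.0


def select_best_epoch(dev_scores):
    """Return the 1-based epoch with the highest dev score.

    Ties are resolved in favour of the earliest epoch.
    """
    if not dev_scores:
        raise ValueError("need at least one epoch score")
    best_index = max(
        range(len(dev_scores)),
        key=lambda i: (dev_scores[i], -i),
    )
    return best_index + 1


config = FineTuneConfig()
# Placeholder dev scores, one per epoch (NOT real MLQA numbers).
placeholder_scores = [0.5, 0.7, 0.6][: config.epochs]
best_epoch = select_best_epoch(placeholder_scores)
```

Under this reading, the reported model would be the checkpoint from `best_epoch`, which may be earlier than the final epoch.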
As the documentation says, "The MLQA paper presents several baselines for zero-shot experiments on MLQA, with training QA data taken from SQuAD V1.1, and using the MLQA English development set for early stopping."
Did you use both the train and dev splits of SQuAD V1.1 as training data, with MLQA-en as the validation dataset?