Open wasiahmad opened 4 years ago
I am trying to reproduce the results presented in Table 6 of the paper for generalized XLT using M-BERT.
I have done the following.
learning rate = 5e-5
warmup_steps = 0
epochs = 3
gradient_accumulation_steps = 1
grad_clipping = 1.0
I got the following result. As you can see, the performance is very poor, particularly for Hindi and Vietnamese. I suspect a different inference procedure was used in your work. Could you briefly explain what you did during inference?