facebookresearch / XLM

PyTorch original implementation of Cross-lingual Language Model Pretraining.

[Experiment settings]: the total training steps and batch size #323

Open KyGao opened 3 years ago

KyGao commented 3 years ago

I have searched both the paper and the code, but found nowhere that states the total number of training steps. Also, the paper reports a batch size of 64, while the shell command you give at https://github.com/facebookresearch/XLM#pretrain-a-language-model-with-mlm-and-tlm uses 32 (and the number of GPUs is not stated either).
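For concreteness, here is a minimal sketch of what I mean. The flag names come from `train.py`; the `--epoch_size` and `--max_epoch` values below are placeholders on my part, not settings I have confirmed:

```bash
# Abridged sketch of the MLM+TLM pretraining command from the README (link above),
# keeping only the flags relevant to this question.
python train.py \
  --batch_size 32 \
  --epoch_size 200000 \
  --max_epoch 100
# If --epoch_size counts sentences, the total number of optimizer steps would
# presumably be roughly max_epoch * epoch_size / (batch_size * n_gpus), but
# neither max_epoch nor the number of GPUs is stated, which is why I am asking.
```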

I would really appreciate it if you could provide more specific details about the training commands.