facebookresearch / XLM

PyTorch original implementation of Cross-lingual Language Model Pretraining.

Clarification regarding emb_dim parameter value used in the paper #328

Open asolano opened 3 years ago

asolano commented 3 years ago

Greetings,

Would it be possible to get a confirmation of the emb_dim parameter value used for training the BERT (MLM) model in the original XLM paper? I am trying to measure its effect on accuracy, GPU memory, and training time, but with the value of 2048 suggested in the README, accuracy stops improving after a few epochs (512 and 1024 keep improving without issue).

For reference, section 5.1 (Training details) of the paper says "we use a Transformer architecture with 1024 hidden units", yet both the README and issue #112 suggest using 2048.
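For context, this is roughly the command I am running, adapted from the README's English MLM example; only --emb_dim is changed between runs (512 / 1024 / 2048), and the data path and remaining hyperparameters are just my own choices, not necessarily the paper's settings:

```bash
# English MLM pretraining, adapted from the README example.
# Only --emb_dim is varied across runs; everything else is held fixed.
python train.py \
    --exp_name test_emb_dim \
    --dump_path ./dumped \
    --data_path ./data/processed/en \
    --lgs 'en' \
    --clm_steps '' \
    --mlm_steps 'en' \
    --emb_dim 2048 \
    --n_layers 12 \
    --n_heads 16 \
    --dropout 0.1 \
    --attention_dropout 0.1 \
    --gelu_activation true \
    --batch_size 32 \
    --bptt 256 \
    --optimizer adam,lr=0.0001 \
    --epoch_size 300000 \
    --validation_metrics _valid_en_mlm_ppl \
    --stopping_criterion _valid_en_mlm_ppl,25
```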

Thanks,

Alfredo

snowood1 commented 3 years ago

Same question here. Confused.