facebookresearch / XLM

PyTorch original implementation of Cross-lingual Language Model Pretraining.

Clarification regarding emb_dim parameter value used in the paper #328

Open asolano opened 3 years ago

asolano commented 3 years ago

Greetings,

Would it be possible to get a confirmation of the emb_dim parameter value used for training the BERT (MLM) model in the original XLM paper? I am trying to measure its effect on accuracy, GPU memory, and training time, but with the value of 2048 suggested in the README, accuracy stops improving after a few epochs (512 and 1024 keep improving without issue).

For reference, section 5.1 (Training details) of the paper says "we use a Transformer architecture with 1024 hidden units", yet both the README and issue #112 suggest using 2048.
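For context, this is roughly the command I am running, adapted from the README's English MLM example; only --emb_dim is changed between runs (512 / 1024 / 2048), and the data path and remaining hyperparameters are just my own choices, not necessarily the paper's settings:

```bash
# English MLM pretraining, adapted from the README example.
# Only --emb_dim is varied across runs; everything else is held fixed.
python train.py \
    --exp_name test_emb_dim \
    --dump_path ./dumped \
    --data_path ./data/processed/en \
    --lgs 'en' \
    --clm_steps '' \
    --mlm_steps 'en' \
    --emb_dim 2048 \
    --n_layers 12 \
    --n_heads 16 \
    --dropout 0.1 \
    --attention_dropout 0.1 \
    --gelu_activation true \
    --batch_size 32 \
    --bptt 256 \
    --optimizer adam,lr=0.0001 \
    --epoch_size 300000 \
    --validation_metrics _valid_en_mlm_ppl \
    --stopping_criterion _valid_en_mlm_ppl,25
```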

Thanks,

Alfredo

snowood1 commented 3 years ago

Same question here. Confused.