Closed yazdanbakhsh closed 3 years ago
It seems to me that both monolingual and monolingual_functions add a suffix to the training files that is not handled appropriately in other parts of the training pipeline, or maybe I am missing some flags.
Thanks
Hi.
We tried to make the data preprocessing and the training pipelines independent so we don't set flags outside of the parameters for training.
If you want to train your MLM on the monolingual dataset (what we did for TransCoder), you either need to rename/create symlinks so that you have train.[cpp | java | python].[0..NGPU].pth
files with the content of the monolingual .pth files, or to set:
--lgs 'java_monolingual-python_monolingual' \
--mlm_steps 'java_monolingual,python_monolingual' \
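For the first option (renaming), a minimal sketch of the symlink approach could look like the following. The data directory name, the NGPU value, and the exact monolingual shard filenames are assumptions for illustration, not confirmed by this thread — adjust them to match what your preprocessing actually produced:

```shell
# Sketch: link monolingual .pth shards to the train.<lang>.<i>.pth
# names the training script expects. DATA_DIR and NGPU are assumptions.
NGPU=8
DATA_DIR=XLM-syml

for lang in cpp java python; do
  for i in $(seq 0 $((NGPU - 1))); do
    # Link target is relative to the link's own directory,
    # so the symlink resolves correctly inside DATA_DIR.
    ln -sf "train.${lang}_monolingual.${i}.pth" \
           "${DATA_DIR}/train.${lang}.${i}.pth"
  done
done
```

With this layout in place you would keep the default --lgs 'cpp-java-python' style parameters instead of the *_monolingual variants.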
Thanks Baptiste.
I have followed the preprocessing steps for both monolingual and monolingual_functions. The generated files in XLM-syml are as follows:

However, whenever I start the training using the script in the README (copied below), I get the following file not found error. It seems to me that the script is looking for different files.

Error
Training Scripts