Helsinki-NLP / OPUS-MT-train

Training open neural machine translation models
MIT License
323 stars, 40 forks

Preprocessing Script Question #77

Open: hdeval1 opened this issue 2 years ago

hdeval1 commented 2 years ago

I realized that the preprocessing scripts in the OPUS-MT-train repository do not match the ones published with the models in the OPUS models repository. I suspect the preprocessing scripts in the training repository (scripts/) are outdated, because I ran into issues when I used them to train my own model. After replacing them with the attached script (which I took from one of the models in the repository), everything went smoothly. I just want to confirm that this replacement is correct. This is for building a SentencePiece (SPM) model, so I replaced scripts/preprocess-spm.sh with the attached file.

Attachment: preprocess.sh.txt
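Before swapping one script for the other, it can help to see exactly where they differ. Below is a minimal sketch using only Python's standard `difflib` module. The function name `diff_scripts` and the example file names are hypothetical, and the sketch assumes both scripts are plain text files you have locally (for example the repository's `scripts/preprocess-spm.sh` and the `preprocess.sh` shipped with a downloaded model). Running `diff -u` on the two files gives the same information.

```python
import difflib
from pathlib import Path


def diff_scripts(old_text: str, new_text: str,
                 old_name: str = "old", new_name: str = "new") -> list[str]:
    """Return unified-diff lines showing how new_text differs from old_text.

    An empty list means the two scripts are identical.
    """
    return list(difflib.unified_diff(
        old_text.splitlines(keepends=True),   # keep "\n" so the output lines are complete
        new_text.splitlines(keepends=True),
        fromfile=old_name,
        tofile=new_name,
    ))


def diff_script_files(old_path: str, new_path: str) -> list[str]:
    """Read two script files (hypothetical paths) and diff their contents."""
    return diff_scripts(Path(old_path).read_text(), Path(new_path).read_text(),
                        old_path, new_path)
```

For example, `print("".join(diff_script_files("scripts/preprocess-spm.sh", "preprocess.sh")), end="")` prints a unified diff. Lines starting with `-` appear only in the training repository's script, and lines starting with `+` appear only in the model's script.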