Helsinki-NLP / OPUS-MT-train

Training open neural machine translation models
MIT License

Source.spm & Target.spm Files #72

Closed: hdeval1 closed this issue 2 years ago

hdeval1 commented 2 years ago

I have been able to build my own models using the tatoeba-prepare and tatoeba-train recipes, and I generated the pytorch.bin with the conversion script. The one thing I am stuck on is where the source.spm and target.spm files are generated, or how I can get a copy of them. Do I have to generate them myself, or is there a recipe for this? I see val/Tatoeba-dev-v2021-08-07.src.spm32k and val/Tatoeba-dev-v2021-08-07.trg.spm32k (which I have been using for the time being), but I don't think those are the source.spm and target.spm files I am looking for. Basically, I want to generate my own version of https://huggingface.co/Helsinki-NLP/opus-mt-zh-en/tree/main, and I have all the files except source.spm and target.spm.

Thank you!

jorgtied commented 2 years ago

Yes, this is a bit confusing. The spm files are in the train sub-directory of the work directory, and they have different names and extensions. They should be called something like opus.src.spm32k-model and opus.trg.spm32k-model.
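
For anyone landing here later, a minimal sketch of collecting those two files under the names used in the Hugging Face repository linked above. This is not part of the OPUS-MT-train recipes: the helper name copy_spm_models is hypothetical, and the glob pattern assumes the files follow the opus.src.spm32k-model / opus.trg.spm32k-model naming mentioned in this thread, so adjust it if your vocabulary size or prefix differs.

```python
import shutil
from pathlib import Path


def copy_spm_models(train_dir, out_dir):
    """Copy the SentencePiece models from a train directory to
    out_dir as source.spm and target.spm (hypothetical helper)."""
    train_dir = Path(train_dir)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    copied = {}
    for side, new_name in (("src", "source.spm"), ("trg", "target.spm")):
        # Assumed naming: <prefix>.<side>.spm<size>-model, e.g. opus.src.spm32k-model.
        # The pattern ends in "-model", so sibling files such as "*.vocab" are skipped.
        matches = sorted(train_dir.glob(f"*.{side}.spm*-model"))
        if len(matches) != 1:
            raise FileNotFoundError(
                f"expected exactly one {side} spm model in {train_dir}, "
                f"found {len(matches)}"
            )
        destination = out_dir / new_name
        shutil.copyfile(matches[0], destination)
        copied[new_name] = destination
    return copied
```

A call would look like copy_spm_models("work/zho-eng/train", "converted-model"), where both paths are placeholders for your own work directory and output folder. The function refuses to guess when it finds zero or several candidates, since picking the wrong tokenizer model would silently break translation quality.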

hdeval1 commented 2 years ago

Perfect, that is exactly what I needed. Thank you SO much!