Closed hdeval1 closed 2 years ago
Yes, this is a bit confusing. The spm files are in the train sub-directory of the work directory, and they have different names and extensions. They should be named something like opus.src.spm32k-model and opus.trg.spm32k-model.
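For anyone following along, the copy step can be sketched roughly as below. This is only a minimal sketch under assumptions: the function name `export_spm_models` and both directory arguments are made up for illustration, the glob pattern is a guess based on the "something like opus.src.spm32k-model" naming above, and the output names source.spm and target.spm are simply the ones that appear in the Hugging Face repository linked in the question.

```python
import shutil
from pathlib import Path


def export_spm_models(train_dir, out_dir):
    """Copy the SentencePiece models found in train_dir into out_dir
    as source.spm and target.spm.

    Hypothetical helper: the '*.src.spm*-model' / '*.trg.spm*-model'
    patterns are an assumption based on names like opus.src.spm32k-model.
    """
    train_dir, out_dir = Path(train_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    copied = {}
    for side, target_name in (("src", "source.spm"), ("trg", "target.spm")):
        # Expect exactly one model file per side; fail loudly otherwise.
        matches = sorted(train_dir.glob(f"*.{side}.spm*-model"))
        if len(matches) != 1:
            raise FileNotFoundError(
                f"expected one '*.{side}.spm*-model' file in {train_dir}, "
                f"found {len(matches)}"
            )
        dest = out_dir / target_name
        shutil.copyfile(matches[0], dest)
        copied[target_name] = dest
    return copied
```

You would call it with the train sub-directory of your own work directory and the folder holding the converted model, for example `export_spm_models("path/to/work/train", "path/to/converted-model")`; both paths here are placeholders.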
Perfect, that is exactly what I needed. Thank you SO much!
I have been able to build my own models using the tatoeba-prepare and tatoeba-train recipes, and I was able to generate the pytorch.bin using the conversion script. The only hiccup I am running into is figuring out where the source.spm & target.spm files are generated, or rather how I get a copy of them. Do I have to generate these on my own, or is there a recipe for this? I see val/Tatoeba-dev-v2021-08-07.src.spm32k & val/Tatoeba-dev-v2021-08-07.trg.spm32k (which I've been using for the time being), but I don't think those are the source.spm & target.spm files I am looking for. Basically, I want to generate my own version of https://huggingface.co/Helsinki-NLP/opus-mt-zh-en/tree/main, and I have all the files except source.spm and target.spm.
Thank you!