Helsinki-NLP / OPUS-MT-train

Training open neural machine translation models
MIT License

Request for EN-PL model #46

Closed: djstrong closed this issue 3 years ago

djstrong commented 3 years ago

It would be great to see EN-PL model!

jorgtied commented 3 years ago

It will come soon I hope.

jorgtied commented 3 years ago

Did you notice that we now have https://github.com/Helsinki-NLP/Tatoeba-Challenge/tree/master/models/eng-pol?

djstrong commented 3 years ago

Oh, thank you!

djstrong commented 3 years ago

I guess it is time-consuming to upload all the models to the Hugging Face Hub, but it would be helpful.

djstrong commented 3 years ago

@jorgtied I am trying to run this model with the command: /build/marian-decoder -c eng-pol/opus-2021-02-19/decoder.yml but the translations are much worse than those in opus-2021-02-19.test.txt. What am I missing?

jorgtied commented 3 years ago

Did you preprocess the input data with the subword segmentation model? You need to do that before piping it into the decoder. See the preprocess script that comes with the model release.
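
For illustration, here is a rough Python sketch of the segment, decode, and merge flow described above. It is not the release's preprocess script, which may also normalize the text before segmenting. The sketch assumes the `sentencepiece` Python package is installed and that the release directory contains a SentencePiece model named source.spm; the helper names (`load_encoder`, `segment`, `desegment`, `translate`) are made up for this example.

```python
import subprocess


def load_encoder(spm_model_path):
    """Return a function that splits a sentence into SentencePiece pieces."""
    import sentencepiece as spm  # third-party: pip install sentencepiece

    processor = spm.SentencePieceProcessor(model_file=spm_model_path)
    return lambda text: processor.encode(text, out_type=str)


def segment(lines, encode):
    """Turn raw sentences into space-separated subword pieces, one per line."""
    return [" ".join(encode(line.strip())) for line in lines]


def desegment(line):
    """Undo the segmentation: join the pieces, then turn the word-boundary
    marker (U+2581) back into ordinary spaces."""
    return "".join(line.split()).replace("\u2581", " ").strip()


def translate(lines, spm_model_path, decoder_cmd):
    """Segment the input, pipe it through the decoder, and merge the output."""
    segmented = segment(lines, load_encoder(spm_model_path))
    result = subprocess.run(
        decoder_cmd,
        input="\n".join(segmented) + "\n",
        capture_output=True,
        text=True,
        encoding="utf-8",
        check=True,
    )
    return [desegment(out) for out in result.stdout.splitlines()]


# Hypothetical usage; the location of source.spm is an assumption:
# translate(
#     ["Good morning!"],
#     "eng-pol/opus-2021-02-19/source.spm",
#     ["/build/marian-decoder", "-c", "eng-pol/opus-2021-02-19/decoder.yml"],
# )
```

The decoder only ever sees subword pieces, so feeding it raw text gives it input it was never trained on, which would explain much worse translations.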

djstrong commented 3 years ago

No, I was reading the documentation at https://marian-nmt.github.io/docs/#translation and it says nothing about segmenting input.txt. Thank you, after preprocessing it works!

preprocess.sh eng source.spm < test.txt > test.txt.out

Konskow commented 9 months ago

Has anyone had any luck converting the model to PyTorch? I tried:

python src/transformers/models/marian/convert_marian_tatoeba_to_pytorch.py --models eng-pol --save_dir converted

It successfully creates a converted model, but unfortunately it doesn't work: I get empty strings as a result.
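
One way to narrow this down is to look at the raw token IDs the converted model generates before they are decoded. This is only a diagnostic sketch, not a confirmed fix. It assumes the `transformers` and `torch` packages are installed; `model_dir` stands for whatever directory the conversion script actually wrote, and both function names are made up for this example.

```python
def find_empty(sources, translations):
    """Return the source sentences whose translation is empty or whitespace."""
    return [src for src, out in zip(sources, translations) if not out.strip()]


def translate_with_converted(model_dir, sources):
    """Translate with a converted checkpoint and print the generated IDs."""
    from transformers import MarianMTModel, MarianTokenizer  # third-party

    tokenizer = MarianTokenizer.from_pretrained(model_dir)
    model = MarianMTModel.from_pretrained(model_dir)

    batch = tokenizer(sources, return_tensors="pt", padding=True)
    generated = model.generate(**batch)

    # If these IDs are only padding and end-of-sentence tokens, decoding with
    # skip_special_tokens=True will necessarily produce empty strings.
    print(generated.tolist())

    return tokenizer.batch_decode(generated, skip_special_tokens=True)


# Hypothetical usage; replace the path with the converter's real output:
# sources = ["Good morning!"]
# outputs = translate_with_converted("path/to/converted/model", sources)
# print(find_empty(sources, outputs))
```

If the IDs contain ordinary vocabulary entries and the decoded strings are still empty, the tokenizer files would be the more likely place to look; if they contain only special tokens, the converted weights or configuration would be.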