Helsinki-NLP / OPUS-MT-train

Training open neural machine translation models
MIT License

Handle sublanguages with OPUS-CAT MT Engine memoQ Integration #50

Open alfx3 opened 3 years ago

alfx3 commented 3 years ago

Hello, I have downloaded the en-it model, but I noticed that OPUS-CAT MT Engine doesn't work with sublanguages via the memoQ plugin. For example, when I translate EN>IT-ITA, I'd like the MT to return results as it does for EN>IT. The same goes for EN-US>IT, EN-UK>IT-ITA, etc. In other words, I want any sublanguage to fall back to its main language. How can I achieve this?

jorgtied commented 3 years ago

If this is about the source language, you should not need to worry about it at all: the encoder should be able to recognize those differences. If the variation is in the target language, it becomes trickier, because we need to add target language labels. We are working on integrating multilingual models into the CAT tools, but this is not ready yet. So far, however, we have not considered support for regional language variants.
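
As a rough illustration of the two cases discussed above, here is a minimal Python sketch. It is not part of OPUS-CAT or the memoQ plugin. The function names (`base_language`, `model_pair`, `add_target_label`) are hypothetical, and the codes are assumed to look like the ones written in this thread (`EN-US`, `IT-ITA`). The first two functions reduce a sublanguage code to its main language so that a request can be mapped to a bilingual model such as en-it. The third prepends a target language label in the `>>xxx<<` form that multilingual OPUS-MT models expect at the start of the source text. Whether a given model knows a particular label depends on how it was trained.

```python
import re


def base_language(code: str) -> str:
    """Return the main language part of a code, lowercased.

    Assumes the main language comes first and is separated from the
    regional part by '-' or '_' (e.g. 'EN-US' -> 'en', 'IT-ITA' -> 'it').
    """
    primary = re.split(r"[-_]", code.strip(), maxsplit=1)[0]
    return primary.lower()


def model_pair(source: str, target: str) -> str:
    """Build a model name such as 'en-it' from two (sub)language codes."""
    return f"{base_language(source)}-{base_language(target)}"


def add_target_label(text: str, target_label: str) -> str:
    """Prepend a target language label for a multilingual model.

    The label must be one the model was trained with; this function
    does not check that.
    """
    return f">>{target_label}<< {text}"


if __name__ == "__main__":
    # Sublanguage pairs from the thread all map to the same bilingual model.
    for src, trg in [("EN", "IT-ITA"), ("EN-US", "IT"), ("EN-UK", "IT-ITA")]:
        print(f"{src}>{trg} uses model {model_pair(src, trg)}")

    # A labeled input for a hypothetical multilingual model.
    print(add_target_label("Hello world", "ita"))
```

For a bilingual model like en-it, only the fallback to the main language is relevant, since no target label is needed. The label step matters only once multilingual models are in use.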