Helsinki-NLP / OPUS-CAT

OPUS-CAT is a collection of software that makes it possible to use OPUS-MT neural machine translation models in professional translation. OPUS-CAT includes a local offline MT engine and a collection of CAT tool plugins.
MIT License

NMT - English - Arabic Model #9

Closed: drkhateeb closed this issue 7 months ago

drkhateeb commented 3 years ago

Hello developers, thanks for these engines, but I could not find an English-Arabic model. How can I get it? Regards

Parallel text: http://opus.nlpl.eu/MultiUN.php

English-Arabic, Moses format: http://opus.nlpl.eu/download.php?f=MultiUN/v1/moses/ar-en.txt.zip

Arabic-English, TMX: http://opus.nlpl.eu/download.php?f=MultiUN/v1/tmx/ar-en.tmx.gz

TommiNieminen commented 3 years ago

Hi,

There is an English to Arabic MT model available through the Tatoeba-Challenge repository, but unfortunately it cannot be used with OPUS-CAT yet. We are working on integrating Tatoeba-Challenge models into OPUS-CAT, so the model will eventually be available (possibly even fairly soon).

In principle, you can already use Tatoeba-Challenge models by installing them manually (see here), but this particular Arabic model has multiple target variants (acm afb apc apc_Latn ara ara_Latn arq arq_Latn ary arz), and OPUS-CAT does not yet support models with multiple target languages.
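Multi-target OPUS-MT models such as this one choose the output variant from a sentence-initial token of the form >>id<<, as documented on their model cards, so a client has to add that token to every source sentence. A minimal sketch of that preprocessing step, assuming the variant codes listed above are the valid tokens; add_target_token and ARABIC_VARIANTS are hypothetical names for illustration, not part of OPUS-CAT:

```python
# Target variant codes listed for this English-Arabic model (taken from the thread).
# Assumption: each code is accepted as a >>code<< token by the model.
ARABIC_VARIANTS = {
    "acm", "afb", "apc", "apc_Latn", "ara",
    "ara_Latn", "arq", "arq_Latn", "ary", "arz",
}


def add_target_token(source: str, target_lang: str) -> str:
    """Prefix a source sentence with the >>id<< token that tells a
    multi-target OPUS-MT model which target variant to produce.
    Hypothetical helper, not an OPUS-CAT function."""
    if target_lang not in ARABIC_VARIANTS:
        raise ValueError(f"unknown target variant: {target_lang}")
    return f">>{target_lang}<< {source.strip()}"


if __name__ == "__main__":
    sentences = ["Hello, world.", "How are you?"]
    # Request the generic Arabic variant ("ara") for every sentence
    for s in sentences:
        print(add_target_token(s, "ara"))
```

Because the variant is chosen per sentence, a single model can serve several Arabic varieties, which is exactly what a plugin expecting one fixed target language cannot express.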

drkhateeb commented 3 years ago

Thank you for your reply

I have some questions:

1- The &apos; entity appears in translated text (Arabic to English) in Trados 2021. How can I fix it? See the screenshot.

2- How can I train the model with my own translation memory and my own terms?

3- How can I create new customized models (En-Ar and Ar-En)?

4- How can I donate to support this great work?

contact: aazzoma@gmail.com

TommiNieminen commented 3 years ago

Thanks for the kind words, let me see if I can answer some of your questions here:

1. It seems to me that the bilingual corpora used to train the MT model contain the &apos; XML entity. This is strange, since the corpora are cleaned before training, but sometimes the cleaning fails. I tested the Arabic to English model a bit, and while I also managed to produce the &apos; entity in some translations ("Women ' s and girls'"), a normal apostrophe also seems to occur ("From the girls' to the girls'."). So I think the &apos; entity might occur only with this specific phrase, "Women's and girls'". If it occurs so often that it becomes a problem, the model might have to be retrained.

2 and 3. It's not possible to train models from scratch (that would require too much computing power), but the OPUS-MT base models can be fine-tuned (customized). The documentation is a work in progress, but here are instructions for fine-tuning a model with a TMX file: https://helsinki-nlp.github.io/OPUS-CAT/enginefinetune. Another option is to use the Fine-tune batch task in Trados (its documentation should be available next week).

4. I don't think we can accept donations due to the legal and bureaucratic consequences, but at this early stage it's very useful to receive feedback, so thanks for that.
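Until a retrained model is available, a leaked entity like the one discussed in answer 1 could be cleaned up after translation. A minimal sketch, assuming the MT output is available as a plain Python string (for example in a custom script; this is not something the Trados plugin does). clean_mt_output is a hypothetical helper, and it also repairs entities that lost their semicolon, like the "&apos" in the original report:

```python
import html
import re

# An entity name that lost its trailing semicolon, e.g. "&apos" in "girls&apos".
# \b stops it from matching inside longer words such as "&apostrophe".
_BARE_ENTITY = re.compile(r"&(apos|quot|amp|lt|gt)\b(?!;)")


def clean_mt_output(text: str) -> str:
    """Decode XML entities that leaked into machine translation output.
    Hypothetical post-processing helper, not part of OPUS-CAT."""
    # Restore the missing semicolon so html.unescape recognizes the entity
    text = _BARE_ENTITY.sub(r"&\1;", text)
    # html.unescape decodes &apos; &quot; &amp; &lt; &gt; and other HTML5 entities
    return html.unescape(text)


if __name__ == "__main__":
    print(clean_mt_output("Women &apos;s and girls&apos;"))
    print(clean_mt_output("girls&apos rights"))
```

This only decodes entities; it does not remove spurious spaces such as the "' s" in the example from answer 1.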
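The fine-tuning instructions from answers 2 and 3 work directly from a TMX file, and the sketch below is not part of that OPUS-CAT workflow. It can still help to check how many usable segment pairs a TMX file (for example the MultiUN TMX linked at the top of this thread) contains for a given language pair. A minimal sketch using only the Python standard library; read_tmx_pairs and extract_pairs are hypothetical helpers, and the handling of inline tags and language codes is a simplifying assumption:

```python
import gzip
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def _seg_text(seg: ET.Element) -> str:
    """Plain text of a <seg>: skips the content of inline elements
    such as <ph> or <bpt> but keeps the text around them."""
    parts = [seg.text or ""]
    for child in seg:
        parts.append(child.tail or "")
    return "".join(parts).strip()


def _base_lang(code: str) -> str:
    # Assumption: compare only the primary subtag, e.g. "en-US" -> "en"
    return code.split("-")[0].lower()


def extract_pairs(root: ET.Element, src_lang: str, tgt_lang: str) -> list[tuple[str, str]]:
    """Collect (source, target) segment pairs from a parsed TMX tree."""
    pairs = []
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.iter("tuv"):
            # TMX 1.4 uses xml:lang; some older files use a plain "lang" attribute.
            lang = tuv.get(XML_LANG) or tuv.get("lang") or ""
            seg = tuv.find("seg")
            if seg is not None:
                segs[_base_lang(lang)] = _seg_text(seg)
        src = segs.get(_base_lang(src_lang))
        tgt = segs.get(_base_lang(tgt_lang))
        # Keep only translation units that have non-empty text on both sides
        if src and tgt:
            pairs.append((src, tgt))
    return pairs


def read_tmx_pairs(path: str, src_lang: str, tgt_lang: str) -> list[tuple[str, str]]:
    """Read a .tmx or gzip-compressed .tmx.gz file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as f:
        root = ET.parse(f).getroot()
    return extract_pairs(root, src_lang, tgt_lang)
```

ET.parse loads the whole file into memory; for a corpus as large as MultiUN, ET.iterparse would be the more practical choice.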
TommiNieminen commented 3 years ago

A new release of OPUS-CAT with English to Arabic models is now available; you can download it from here.

When you install an online model, it takes some time for the model list to download. The English to Arabic models appear once the text "Fetching list of online models, please wait..." changes to "Downloadable online models".


The Arabic models are multilingual models (the different varieties of Arabic are treated as different languages), so you have to check the "Multilingual models" checkbox to see them.


-Tommi