digitalfabrik / integreat-cms

Simplified content management back end for the Integreat App - a multilingual information platform for newcomers
https://digitalfabrik.github.io/integreat-cms/
Apache License 2.0

errors in DeepL translation #3160

Open osmers opened 2 days ago

osmers commented 2 days ago

Describe the Bug

For the last few weeks, DeepL translations in the system have come back with the wrong word order whenever a "do-not-translate" (dnt) tag is involved. The issue only occurs when the dnt tag is used within continuous text, not when it stands alone (i.e. in the contact details).

As far as we can tell, this happens for all DeepL languages except English and French. I have contacted DeepL and will update this thread with any information they send.

While searching for the cause, I found this thread in the WPML forum: https://wpml.org/forums/topic/word-in-wrong-order-with-automatic-deepl-translations/ Even though it refers to glossaries, the resulting issue is exactly the same, which leads me to believe that this is not something we can fix on our side.

Steps to Reproduce

  1. Go to https://admin.integreat-app.de/testumgebung/pages/de/42661/edit/
  2. Check position of dnt-tags
  3. Change to e.g. Romanian
  4. See that the German words (the ones that were marked with dnt) are at the beginning of the sentence.

Expected Behavior

German words are placed at the correct position within the continuous text.

Actual Behavior

German words are placed at the beginning of the sentence.

Additional Information

The following translation was done on Oct. 29th:
[Screenshot: DeepL error DE-RO]

The following translation was done in April 2024 (https://admin.integreat-app.de/lkkarlsruhe/pages/ro/11489/edit/):
[Screenshot: DeepL correct DE-RO]


curl -X POST "https://api.deepl.com/v2/translate" --header "Authorization: DeepL-Auth-Key [yourAuthKey]" --header 'Content-Type: application/json' --data '{"source_lang": "de", "target_lang": "ro", "tag_handling": "html", "text": ["<p>Für alle Kinder in Deutschland besteht <strong>ab der Geburt bis zur Vollendung des 18. Lebensjahres</strong> (in Einzelfällen auch darüber hinaus) Anspruch auf <strong>Kindergeld</strong> (<span class=\"notranslate\" translate=\"no\">Kindergeld</span>). Ihr Kind muss <strong>in Ihrem Haushalt wohnen</strong> und <strong>von ihnen versorgt</strong> werden.</p>"]}'

Expected:

{"translations":[{"detected_source_language":"DE","text":"<p>Toți copiii din Germania au dreptul la <strong>alocații familiale</strong> (<span class=\"notranslate\" translate=\"no\">Kindergeld</span>) <strong>de la naștere până la vârsta de 18 ani</strong> (în unele cazuri chiar și după această <strong>vârstă</strong>). Copilul trebuie să <strong>locuiască în gospodăria dumneavoastră</strong> și să fie <strong>îngrijit de dumneavoastră</strong>.</p>"}]}

Received:

{"translations":[{"detected_source_language":"DE","text":"<p><span class=\"notranslate\" translate=\"no\">Kindergeld</span>Toți copiii din Germania au dreptul la <strong>alocații familiale</strong> ( )<strong>de la naștere până la vârsta de 18 ani</strong> (în unele cazuri chiar și după această <strong>vârstă</strong> ). Copilul trebuie să <strong>locuiască în gospodăria dumneavoastră</strong> și să fie <strong>îngrijit de dumneavoastră</strong>.</p>"}]}

osmers commented 2 days ago

Test-CMS page: https://integreat-test.tuerantuer.org/testumgebung/pages/de/41799/edit/

svenseeberg commented 1 day ago

I found a very simple test case that demonstrates the issue, using the curl requests documented in https://developers.deepl.com/docs/api-reference/document

<p>Hallo (<span translate="no">Welt</span>)</p>

results in

<p><span translate="no">Welt</span>Привіт ( )</p>

BUT

<p>Hallo <span translate="no">(Welt)</span></p>

translates (almost) correctly to

<p>Привіт. <span translate="no">(Welt)</span></p>

This clearly demonstrates that this is an issue with the DeepL translation models for some languages, for example Ukrainian.
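
Based on the two cases above, one conceivable workaround on our side would be to move parentheses that directly wrap a dnt span inside the span before the text is sent to DeepL. The following is only a sketch under that assumption: `pull_parentheses_into_dnt` is a hypothetical helper, not existing CMS code, and it has only been reasoned through for the simple pattern shown here, not for nested spans or other languages.

```python
import re

# "(" + opening span marked translate="no" + content + closing span + ")"
WRAPPED_DNT = re.compile(
    r'\((<span\b[^>]*\btranslate="no"[^>]*>)(.*?)(</span>)\)',
    re.DOTALL,
)


def pull_parentheses_into_dnt(html: str) -> str:
    """Rewrite (<span translate="no">X</span>) as <span translate="no">(X)</span>.

    Only parentheses that touch the span on both sides are moved;
    everything else is returned unchanged.
    """
    return WRAPPED_DNT.sub(r"\1(\2)\3", html)


print(pull_parentheses_into_dnt('<p>Hallo (<span translate="no">Welt</span>)</p>'))
```

For the first test case this produces the markup of the second test case, which DeepL translated (almost) correctly. Whether that holds for longer texts would still need to be verified against the API.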