This should not be implemented as part of integreat_cms, but rather in our server infrastructure. However, since developing the proposed tests requires insight into typical translation content, I'm borrowing this issue tracker.

Motivation

We currently experience issues with both DeepL and Google serving corrupted translations (such as putting phrases marked with translate="no" at the start of the sentence rather than where they make sense) for various source/target language pairs, which is not something we can influence.

Proposed Solution

Add a daily cron job to

translate known examples into various languages using DeepL and Google
check whether the results conform to our expectations or deviate from them
send a Mattermost notification if the state changes (deviations appeared/disappeared compared to the previous day)

To accomplish that, these known examples have to be defined. They should cover all important features that we expect in content translations, with enough variation and redundancy that we can be reasonably confident in the reports. I imagine something like:

plain text: short word or phrase, as could appear in a page title
html: single paragraph of a word or phrase, as might appear in page content
html: single paragraph of a span with the power set of all possible known values to indicate something should not be translated (class="notranslate", translate="no", notranslate) surrounding a word or phrase
html: single paragraph of some sentence containing a span with the power set of all possible notranslate indicators
…
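As a rough illustration of the two span-based example types, the sketch below generates one bare paragraph and one sentence paragraph for every subset of the three indicators. It assumes the third indicator is meant as a bare notranslate attribute, and it includes the empty subset as an unprotected control case. The function names, the phrase and the sentence template are placeholders, not agreed test data.

```python
from itertools import chain, combinations

# The three known "do not translate" indicators; treating the last one
# as a bare attribute is an assumption.
INDICATORS = ('class="notranslate"', 'translate="no"', "notranslate")


def indicator_subsets(indicators=INDICATORS):
    """Yield every subset of the indicators, including the empty one."""
    return chain.from_iterable(
        combinations(indicators, size) for size in range(len(indicators) + 1)
    )


def build_examples(phrase: str, sentence_template: str) -> list[str]:
    """Build HTML examples: the span alone and the span inside a sentence.

    sentence_template must contain a "{span}" placeholder.
    """
    examples = []
    for subset in indicator_subsets():
        attributes = "".join(f" {attribute}" for attribute in subset)
        span = f"<span{attributes}>{phrase}</span>"
        examples.append(f"<p>{span}</p>")
        examples.append(f"<p>{sentence_template.format(span=span)}</p>")
    return examples


if __name__ == "__main__":
    for example in build_examples("Integreat", "Please open the {span} app."):
        print(example)
```

With three indicators this yields eight subsets and therefore sixteen examples per phrase, which keeps the daily request volume small.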

This assumes that we can decide reasonably well whether a deviation has occurred. We will likely need some sort of fuzzy matching, as we might not be able to enumerate every string the API could return for a known example that we would regard as valid. If no such automated decision algorithm can be found, it might be good enough to post the whole example together with its translation in Mattermost, maybe along with a back-translation of the result into the source language, and have a human check for deviations every day. This can be sped up by saving the results of past reviews: whenever the API returns a translation that has already been marked as good, it does not need to be mentioned again.
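One possible shape for that decision step, as a minimal sketch: accept a result if it was already reviewed as good or if it is close enough to one of the accepted reference translations, and otherwise queue it for human review. It uses difflib.SequenceMatcher from the Python standard library. The threshold of 0.9 and the function names are arbitrary assumptions. A character-level ratio can still score a misplaced phrase fairly high, so the threshold would need tuning against real deviations.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Character-level similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a, b).ratio()


def classify(
    result: str,
    accepted: list[str],
    reviewed_good: set[str],
    threshold: float = 0.9,  # arbitrary starting value, to be tuned
) -> str:
    """Return "ok" or "needs_review" for a single translation result."""
    # Results a human has already approved are never reported again.
    if result in reviewed_good:
        return "ok"
    # Otherwise accept anything close enough to a known good translation.
    if any(similarity(result, reference) >= threshold for reference in accepted):
        return "ok"
    return "needs_review"
```

Whenever a reviewer approves a result flagged as "needs_review", adding it to reviewed_good keeps it out of later reports.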

Alternatives

Manual spot checking, which would be very labour-intensive
No checking at all, and only learning about deviations when somebody notices and reports them to us, which is the current situation

User Story

As a service provider I want to know about quirks and problems of my upstream translation services rather quickly so that I can give suggestions to my clients, or at least not have to find out about quirks at a press conference.

Additional Context

https://github.com/digitalfabrik/integreat-cms/pull/3135#issuecomment-2424968026

digitalfabrik / integreat-cms

Test whether DeepL/Google translate known examples in expected manner daily #3158
