Open PeterNerlich opened 2 weeks ago
I think this is a very interesting topic with the potential to blow up a bit. As you write, LLMs are not as deterministic in their results/predictions as would be practical for testing their consistency in our use cases. I'd love to discuss this in more depth.
This should not be implemented as part of integreat_cms, but rather in our server infrastructure. However, since developing the proposed tests requires insight into typical translation content, I'll be borrowing this issue tracker.
Motivation
We currently experience issues with both DeepL and Google serving corrupted translations (such as putting phrases marked with translate="no" at the start of the sentence rather than where they make sense) for various source/target language pairs, which is not something we can influence.
Proposed Solution
Add a daily cron job to
In order to accomplish that, these known examples have to be defined. They should include all important features that we expect in content translations, in enough variations and redundancy that we can be reasonably confident in the result of the reports. I imagine something like:
No-translate markers (class="notranslate", translate="no", notranslate) surrounding a word or phrase
This assumes that we can decide reasonably well whether a deviation has occurred. We will likely need some sort of fuzzy matching, as we might not be able to capture every string the API might return for a known example that we would still regard as valid. If no such automated decision algorithm can be found, it might be good enough to post the whole example with its translation in Mattermost, maybe along with the translation put through the system again and translated back to the source language, and have a human check for deviations every day. This can be sped up by saving results that have been reviewed in the past and, whenever the API produces them again, not mentioning those previously marked as good translations.
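To make the fuzzy-matching idea a bit more concrete, here is a minimal sketch of how a fresh API result could be compared against previously reviewed translations. Everything in it is an assumption for discussion: the names KnownExample and classify are hypothetical, the 0.9 threshold is arbitrary, and difflib from the Python standard library only stands in for whatever similarity measure we end up choosing.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class KnownExample:
    """A source text plus the translations a human has already accepted.

    Hypothetical structure, not an existing integreat_cms model.
    """

    source: str
    accepted: set[str] = field(default_factory=set)


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] based on difflib's matching blocks."""
    return SequenceMatcher(None, a, b).ratio()


def classify(example: KnownExample, translation: str, threshold: float = 0.9) -> str:
    """Return "ok", "similar" or "deviation" for a fresh API result.

    "ok": exact match with a previously reviewed translation.
    "similar": close to a reviewed translation (threshold is an assumed value).
    "deviation": nothing comparable has been reviewed so far.
    """
    if translation in example.accepted:
        return "ok"
    best = max(
        (similarity(translation, known) for known in example.accepted),
        default=0.0,
    )
    return "similar" if best >= threshold else "deviation"
```

In such a setup, only "similar" and "deviation" results would need to be posted for human review, and a translation that a reviewer approves could be added to the accepted set so it is not reported again.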
Alternatives
User Story
As a service provider I want to know about quirks and problems of my upstream translation services rather quickly so that I can give suggestions to my clients, or at least not have to find out about quirks at a press conference.
Additional Context
https://github.com/digitalfabrik/integreat-cms/pull/3135#issuecomment-2424968026
3157