DeepLcom / deepl-dotnet

Official .NET library for the DeepL language translation API.
MIT License

Inconsistent translation result with glossaries #16

Open: boeryepes opened this issue 2 years ago

boeryepes commented 2 years ago

Hi, first of all I'm really happy with the DeepL translation and the API. Recently I started using the glossary function, and the results are inconsistent. As I am using your API (v1.2), I assume the issue is not on my side.

I've attached a screenshot of the glossary (EN-DE), the source text in ENGLISH, and its translation to GERMAN with and without the glossary (with the glossary using a PRO account, without the glossary using a FREE account). The ID of the glossary is a0f05fa1-c5b5-45b7-824c-222a540168c1.

In yellow I have highlighted where the glossary was not applied, and in red the weird behavior where changing the first character of the ENGLISH source to a capital letter changes the GERMAN translation, although that translation is at least closer to the glossary (if not fully consistent with it).

[Screenshot: Inconsistent application of glossary]

Any clue?

Regards, Klaas

daniel-jones-deepl commented 2 years ago

Hi Klaas, good to hear :) Thanks for creating this issue, and thanks for all the details.

I reproduced some of the cases you reported, so I can confirm your usage is correct; I reported these cases to the responsible team at DeepL. Unfortunately I don’t have any fixes at the moment.

There are a couple of tips that might help with glossaries in general, though. Firstly, our glossaries (and translations in general) tend to perform better with longer sentences, because of the added context. That said, if the translation input matches a glossary source term, we would usually expect the translation output to match the glossary target term.

Secondly, you might get better results by using natural capitalisation in glossary terms, i.e. the capitalisation that is most natural for the language, e.g. “to read Shakespeare” -> “Shakespeare lesen”, “red phone” -> “rotes Telefon”. For your glossary, that means some English terms should be lowercased: “vehicle block”, “vehicle rotation”, “vehicle schedules”, “shift”, etc. I tested these changes in your case as well; unfortunately they don't fix the issue, and we're looking into it.
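
A minimal sketch of the natural-capitalisation tip, written in Python for illustration. The helper names `naturalize_source` and `to_tsv` are hypothetical and are not part of deepl-dotnet or any DeepL SDK. The sketch lowercases every word on the English (source) side except words the caller marks as proper nouns, and leaves the German (target) side untouched because German nouns are capitalised. It assumes a glossary needs unique source terms and that tab-separated `source<TAB>target` lines are an acceptable entry format; check the DeepL glossary API reference before relying on either assumption. Note that it applies the rule mechanically, so any capitalisation the author intended is lost.

```python
# Hypothetical helpers (not part of deepl-dotnet or any DeepL SDK) that apply
# "natural capitalisation" to the English source side of glossary entries.

def naturalize_source(term: str, proper_nouns: set[str]) -> str:
    """Lowercase every word of an English term except listed proper nouns."""
    return " ".join(w if w in proper_nouns else w.lower() for w in term.split())


def to_tsv(entries: dict[str, str], proper_nouns: set[str]) -> str:
    """Render entries as 'source<TAB>target' lines.

    Only the English source is normalised; German targets keep their
    capitalisation, since German nouns are capitalised.
    """
    lines = []
    seen = set()
    for source, target in entries.items():
        natural = naturalize_source(source, proper_nouns)
        # Lowercasing can merge two entries (e.g. "Shift" and "shift");
        # assume source terms must be unique and fail loudly.
        if natural in seen:
            raise ValueError(f"duplicate source term after lowercasing: {natural!r}")
        seen.add(natural)
        lines.append(f"{natural}\t{target}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Example pairs from this thread, with the English side deliberately
    # over-capitalised to show the normalisation.
    entries = {"To Read Shakespeare": "Shakespeare lesen", "Red Phone": "rotes Telefon"}
    print(naturalize_source("Vehicle Block", set()))  # vehicle block
    print(to_tsv(entries, proper_nouns={"Shakespeare"}))
```

Keeping proper nouns in an explicit set, rather than guessing them, leaves the decision about intentional capitals with the person maintaining the glossary.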

boeryepes commented 2 years ago

Thanks Daniel, happy to contribute. With regard to natural capitalization, you are opening a can of worms, as the Americans would say. Capitalization can be intentional on the part of the author and influenced by the type of document. For instance, if the author is a client whom we cannot influence (e.g. one who has issued a request for proposal) and/or the document is a legal document, then capitals can carry meaning. In general, the translation should not depend on additional interpretation of capitalization unless relying on capitalization is 'natural' in the language. For instance, in German nouns are capitalized.

And indeed, the glossary should be taken literally, but even there DeepL has an issue: the glossary target term is 'Umlaufplan', but DeepL translates it as 'umlaufpläne', which in German should always be written with a capital 'U'.
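
When reviewing output, a lowercase noun like 'umlaufpläne' can be flagged mechanically. The following is a rough heuristic sketch in Python. The function name `lowercase_noun_hits` is hypothetical and is not a DeepL or deepl-dotnet API. It treats capitalised glossary target terms as German nouns, compares each output word against them ignoring case and mapping ä/ö/ü to a/o/u so that the umlaut plural 'Pläne' still matches 'Plan', and reports matches that start with a lowercase letter. As a heuristic, it can miss inflections that change the stem in other ways and can flag unrelated words that merely share a prefix.

```python
import re

# Heuristic (an assumption, not DeepL behaviour): fold case and map umlauts
# to base vowels so inflected plurals like "Pläne" still match "Plan".
_UMLAUTS = str.maketrans({"ä": "a", "ö": "o", "ü": "u"})


def _normalize(word: str) -> str:
    return word.casefold().translate(_UMLAUTS)


def lowercase_noun_hits(output: str, target_terms: list[str]) -> list[str]:
    """Return words in a German translation that share a stem with a
    capitalised glossary target term but start with a lowercase letter."""
    # Capitalised target terms are treated as German nouns.
    stems = [_normalize(t) for t in target_terms if t[:1].isupper()]
    hits = []
    for word in re.findall(r"\w+", output):  # \w is Unicode-aware for str
        if word[:1].islower() and any(_normalize(word).startswith(s) for s in stems):
            hits.append(word)
    return hits


if __name__ == "__main__":
    print(lowercase_noun_hits("umlaufpläne", ["Umlaufplan"]))  # ['umlaufpläne']
    print(lowercase_noun_hits("Umlaufpläne", ["Umlaufplan"]))  # []
```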