digitalfabrik / integreat-cms

Simplified content management back end for the Integreat App - a multilingual information platform for newcomers
https://digitalfabrik.github.io/integreat-cms/
Apache License 2.0

Add glossary to DeepL API #1346

Open ulliholtgrave opened 2 years ago

ulliholtgrave commented 2 years ago

Motivation

We want to add and use a glossary for certain words. This should be maintained via some private area/file.

Additional Context

API docs: https://www.deepl.com/de/docs-api/glossaries/

Python client docs: https://github.com/DeepLcom/deepl-python#glossaries

dkehne commented 2 years ago

We have decided that there is only one Integreat-wide glossary

charludo commented 1 year ago

Is the goal that municipalities can add/edit glossary entries on their own?

Or should only staff roles be able to do that?

ulliholtgrave commented 1 year ago

This should only be done by the staff roles. However, I am not really sure about the way we want to provide it to them.

@osmers Can you bring this up in your team call and come up with an idea about how you want to edit these files?

My initial idea would either be some text file for each language (e.g. like a config file) or we can provide some UI and save the glossary entries in our database.

timobrembeck commented 1 year ago

> My initial idea would either be some text file for each language (e.g. like a config file) or we can provide some UI and save the glossary entries in our database.

I would definitely prefer the database! :sweat_smile:

ulliholtgrave commented 1 year ago

> > My initial idea would either be some text file for each language (e.g. like a config file) or we can provide some UI and save the glossary entries in our database.
>
> I would definitely prefer the database! 😅

I definitely agree that the database would be the better solution, but I am a little bit afraid of the number of phrases we are supposed to show in the UI. If we end up with >200 entries with input, we really need a decent UI to manage this. A JSON in a TXT file might be easier to implement, and with "Ctrl+F" it would be the more direct way 🤷‍♂️

timobrembeck commented 1 year ago

We could implement a csv or json import/export? And we could intercept Ctrl + F to directly jump into our own search input field? :smile:
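A minimal sketch of what such a CSV round trip could look like, assuming a two-column file (source term, target term). The function names `entries_from_csv` and `entries_to_csv` are hypothetical and not part of the CMS:

```python
import csv
import io


def entries_from_csv(csv_text: str) -> dict[str, str]:
    """Parse two-column CSV text (source term, target term) into a dict."""
    entries = {}
    for row in csv.reader(io.StringIO(csv_text)):
        # Skip blank lines and rows that do not have exactly two columns
        if len(row) != 2:
            continue
        source, target = (cell.strip() for cell in row)
        if source and target:
            entries[source] = target
    return entries


def entries_to_csv(entries: dict[str, str]) -> str:
    """Serialize glossary entries back into two-column CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Sort by source term so that exports are stable and easy to diff
    for source, target in sorted(entries.items()):
        writer.writerow([source, target])
    return buffer.getvalue()
```

Malformed rows are silently skipped here to keep the sketch short; a real import would probably report them back to the user instead.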

timobrembeck commented 1 year ago

I mean, in theory we can also outsource the storing of information to DeepL itself. And only query the stored entries from time to time and store them in our cache (if even necessary)...

timobrembeck commented 1 year ago

Ah ok, just read the API docs, and apparently glossaries can only be created and deleted, not modified. So modification only works via retrieving, then modifying locally, then deleting the old entry and uploading the new entry. And the API accepts entries via csv. So the most simple solution would probably be the following:

What do you think?
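The retrieve, modify, delete and re-upload cycle described above could be sketched roughly as follows. This is only an illustration: `update_glossary` and its `changes` parameter are hypothetical names, and the `translator` argument is assumed to behave like `deepl.Translator` from the Python client (`get_glossary_entries`, `create_glossary`, `delete_glossary`):

```python
def update_glossary(translator, glossary, changes):
    """
    Emulate an in-place edit of a DeepL glossary, which cannot be modified.

    `translator` is assumed to behave like `deepl.Translator` and `glossary`
    like the glossary info object it returns (name, source_lang, target_lang).
    `changes` maps source terms to new target terms; None removes a term.
    """
    # 1. Retrieve the current entries (source term -> target term)
    entries = dict(translator.get_glossary_entries(glossary))
    # 2. Modify the entries locally
    for source, target in changes.items():
        if target is None:
            entries.pop(source, None)
        else:
            entries[source] = target
    # 3. Upload the modified entries as a new glossary
    new_glossary = translator.create_glossary(
        glossary.name,
        source_lang=glossary.source_lang,
        target_lang=glossary.target_lang,
        entries=entries,
    )
    # 4. Delete the old glossary only after the upload succeeded
    translator.delete_glossary(glossary)
    return new_glossary
```

The sketch uploads before deleting so that a failed upload does not leave the account without any glossary. The ID of the returned glossary differs from the old one, so any stored reference would need to be updated.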

ulliholtgrave commented 1 year ago

Yes, I agree. That sounds good 👍

osmers commented 1 year ago

I think we can currently download the glossary as an Excel file - so we could provide that? Since we don't need to change it constantly, an option to download, amend and then upload again would be sufficient. Not sure if that is what Timo was referring to...

timobrembeck commented 1 year ago

Wait, we already have a DeepL Pro Advanced account? Then the basic functionality we're talking about here should already be offered by the DeepL web UI (see here)? I think there is no need to implement this in the CMS if we're just implementing exactly the same functionality as DeepL itself... :thinking:

osmers commented 1 year ago

Not sure if we do - I assume so, yes, because otherwise we would not have enough translation budget. The glossary right now is implemented in MemoQ. I will check DeepL for you - one sec.

timobrembeck commented 1 year ago

Indeed: [Screenshot 2022-11-19 at 19-09-09: DeepL Translate – The world's most accurate translator]

osmers commented 1 year ago

So you already checked our account? Then this should be easy enough, right?

osmers commented 1 year ago

But it seems that glossaries don't work for most of our language pairs...

osmers commented 1 year ago

image https://support.deepl.com/hc/en-us/articles/360021634540-About-the-glossary-feature

osmers commented 1 year ago

Just English and French are possible...

timobrembeck commented 1 year ago

Oh, and I noticed another problem: we use two different accounts. The glossary can only be uploaded via the UI for the "DeepL Pro" account, and we perform our automated translations with the "DeepL API Free" account. There is probably no completely trivial way of copying the glossaries... We could ask the DeepL support whether it's possible to transfer glossaries between accounts, but they will probably refuse to do so.

So back to the drawing board, we probably need to copy the basic upload in our CMS to be able to pass the glossaries to the API account. But yes, let's talk about whether the effort is justified when only two languages are supported with German as source language...

osmers commented 1 year ago

So if they don't support glossaries for more languages, it does not matter what we build into our system? Couldn't we still use it and somehow enforce certain translations? I don't know, maybe putting in alternatives for the word that DeepL provides if you translate just the word, and tell the system or whatever that if it finds one of those, to replace it with ours from the glossary?

osmers commented 1 year ago

Do you need any more input from our side on this?

timobrembeck commented 1 year ago

> Do you need any more input from our side on this?

Probably yes: So as far as I understood it, we sadly cannot use the DeepL Pro account glossary for our DeepL API account requests. So this would be a bit of work to do, not sure if worth the effort if it can only be used for two languages.

> So if they don't support glossaries for more languages, it does not matter what we build into our system? Couldn't we still use it and somehow enforce certain translations? I don't know, maybe putting in alternatives for the word that DeepL provides if you translate just the word, and tell the system or whatever that if it finds one of those, to replace it with ours from the glossary?

You mean like completely implementing our own glossary? This would definitely be a lot of effort. Maybe even more effort than having to manually fix machine translations in case potential glossary terms have been translated incorrectly. But yes, in theory it's doable.

osmers commented 1 year ago

> Probably yes: So as far as I understood it, we sadly cannot use the DeepL Pro account glossary for our DeepL API account requests. So this would be a bit of work to do, not sure if worth the effort if it can only be used for two languages

Ditto - just for two languages it does not make sense - we would need to check the terms for all the languages we have a glossary for.

> You mean like completely implementing our own glossary? This would definitely be a lot of effort. Maybe even more effort than having to manually fix machine translations in case potential glossary terms have been translated incorrectly.

Not sure how feasible and realistic manual fixing is - but yes, that is essentially what I meant. But I can see how it is very difficult. Another idea I had was that we compile a list of alternative words, like you always have in dictionary suggestions (e.g. Straße can be road and street in English). So if we have this list, we can at least tell the system that if it finds any of those words, to replace it with the correct one from our glossary?

I am not sure if this is feasible though, due to the case declension of words (adaptation for dative, genitive, etc...)

timobrembeck commented 1 year ago

> I am not sure if this is feasible though, due to the case declension of words (adaptation for dative, genitive, etc...)

Hmm, in my opinion we're opening Pandora's Box here :sweat_smile: I guess that's just one of the limitations of machine translations - there is always some margin of error which can either be accepted or fixed by humans. I doubt that any manual string replacement on our side is good enough to fix more problems than it causes. So at the moment, I'd suggest putting this on hold until DeepL supports more languages for the glossary - and as soon as this is the case, I think the effort for implementing support for DeepL's glossary mechanism is justified.
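To illustrate the concern about manual string replacement, here is a small sketch of a post-processing step on translated text. The replacement rule and the sample sentences are invented for illustration, and English is used as the target language only to keep the example short:

```python
import re

# Hypothetical glossary rule: always use "street" where DeepL might say "road"
REPLACEMENTS = {"road": "street"}


def naive_replace(text: str) -> str:
    """Replace every occurrence of a term, ignoring word boundaries."""
    for wrong, preferred in REPLACEMENTS.items():
        text = text.replace(wrong, preferred)
    return text


def whole_word_replace(text: str) -> str:
    """Replace whole words only; inflected forms are then missed instead."""
    for wrong, preferred in REPLACEMENTS.items():
        text = re.sub(rf"\b{re.escape(wrong)}\b", preferred, text)
    return text


print(naive_replace("Cross the road near the railroad."))
# Cross the street near the railstreet.
print(whole_word_replace("Both roads are closed."))
# Both roads are closed.
```

The naive variant corrupts unrelated words, while the stricter variant already misses a simple plural. Languages with richer inflection than English would make both problems worse.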

osmers commented 1 year ago

Agreed!

osmers commented 8 months ago

Just saw that DeepL now supports more languages - image

Question remains whether we can use it - would it help if we switched to the DeepL Pro Account to use the API and Glossary? Or are we by now using Pro anyways?

Edit: For DeepL API Free and DeepL API Pro subscribers

You can create glossaries with your DeepL API (Free and Pro) subscription. Please consult this article and our API documentation to learn how you can manage glossaries with the DeepL API.

If you use the DeepL API (Free and Pro) in third-party software, please note that plug-ins are not developed by DeepL SE. DeepL supports glossary functionality via the API, but your plug-in provider might require some time to implement this functionality in their plug-in. For more information, please contact the provider of your plug-in.

timobrembeck commented 8 months ago

@osmers good catch!

> Question remains whether we can use it - would it help if we switched to the DeepL Pro Account to use the API and Glossary? Or are we by now using Pro anyways?

We can only use the API account for the CMS because it would be way too complicated to program any kind of interaction between the CMS and the DeepL UI – it only makes sense to interact via the API, which is only possible with an API account. Fortunately, this feature was enabled for the API as well, and with more languages :tada: So I think this issue is no longer blocked and can be prioritized (although keep in mind that I estimate the effort to be high despite the new feature).

Supported Languages

```
In [1]: import deepl

In [2]: from django.conf import settings

In [3]: glossary_languages = deepl.Translator(settings.DEEPL_AUTH_KEY).get_glossary_languages()
Oct 31 17:11:34 INFO deepl - Request to DeepL API method=GET url=https://api-free.deepl.com/v2/glossary-language-pairs
Oct 31 17:11:34 INFO deepl - DeepL API response status_code=200 url=https://api-free.deepl.com/v2/glossary-language-pairs

In [4]: for language_pair in glossary_languages:
   ...:     print(f"{language_pair.source_lang} to {language_pair.target_lang}")
   ...:
de to en
de to es
de to fr
de to ja
de to it
de to pl
de to nl
de to zh
de to ru
de to pt
en to de
en to es
en to fr
en to ja
en to it
en to pl
en to nl
en to zh
en to ru
en to pt
es to de
es to en
es to fr
es to ja
es to it
es to pl
es to nl
es to zh
es to ru
es to pt
fr to de
fr to en
fr to es
fr to ja
fr to it
fr to pl
fr to nl
fr to zh
fr to ru
fr to pt
ja to de
ja to en
ja to es
ja to fr
ja to it
ja to pl
ja to nl
ja to zh
ja to ru
ja to pt
it to de
it to en
it to es
it to fr
it to ja
it to pl
it to nl
it to zh
it to ru
it to pt
pl to de
pl to en
pl to es
pl to fr
pl to ja
pl to it
pl to nl
pl to zh
pl to ru
pl to pt
nl to de
nl to en
nl to es
nl to fr
nl to ja
nl to it
nl to pl
nl to zh
nl to ru
nl to pt
zh to de
zh to en
zh to es
zh to fr
zh to ja
zh to it
zh to pl
zh to nl
zh to ru
zh to pt
ru to de
ru to en
ru to es
ru to fr
ru to ja
ru to it
ru to pl
ru to nl
ru to zh
ru to pt
pt to de
pt to en
pt to es
pt to fr
pt to ja
pt to it
pt to pl
pt to nl
pt to zh
pt to ru
```

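Since German is the source language, only the pairs starting with `de` matter for the CMS. A small sketch of a coverage check, with the target set copied from the output above; the function name and the sample language list are hypothetical and do not reflect the actual CMS configuration:

```python
# Target languages that the output above lists for German as source language
GLOSSARY_TARGETS_FOR_DE = {"en", "es", "fr", "ja", "it", "pl", "nl", "zh", "ru", "pt"}


def split_by_glossary_support(target_languages):
    """Split language codes into (supported, unsupported) for de -> target."""
    supported = [code for code in target_languages if code in GLOSSARY_TARGETS_FOR_DE]
    unsupported = [code for code in target_languages if code not in GLOSSARY_TARGETS_FOR_DE]
    return supported, unsupported


# Hypothetical selection of target languages, for illustration only
supported, unsupported = split_by_glossary_support(["en", "fr", "ar", "fa", "ru", "uk"])
print(supported)    # ['en', 'fr', 'ru']
print(unsupported)  # ['ar', 'fa', 'uk']
```

In practice the set would be built from `get_glossary_languages()` at runtime rather than hard-coded, because the supported pairs change over time.
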
osmers commented 8 months ago

@timobrembeck yup, I found the info as well that we can use the API Free Account to implement the glossary :) nice!! It's something we need to do in order to make automatic translations better. So I think even though the effort is high, it is something we should do soonish :) like next quarter

dkehne commented 6 months ago

push to backlog. this is not as urgent as other tickets.