DeepLcom / deepl-python

Official Python library for the DeepL language translation API.
MIT License

Disambiguate homonyms in glossaries / provide alternative translations? #97

Open sarkipo opened 4 months ago

sarkipo commented 4 months ago

Hello,

Is there a way to specify different translations for homonymous words in a glossary?

E.g. in a Russian > English translation, there is a verb 'пропасть' to which I would like to assign the translation "disappear". But 'пропасть' can also be a noun meaning "abyss", and I don't want it to be bluntly replaced by "disappear" everywhere.

Is there a way to disambiguate two words like that in the input? If not, it might help to be able to specify an alternative translation ranked lower than the first one, but I'm not sure how exactly that should work.

JanEbbing commented 4 months ago

Hi, thanks for your question! This is a bit tricky.

  1. In principle, your glossary must not contain two entries with exactly the same string in the source language. See the docs for this part.

  2. However, our glossary feature is not a simple string search & replace, so in principle adding a glossary term 'пропасть => disappear' won't replace all occurrences of 'пропасть' with 'disappear'. Unfortunately, it does not work properly in the example you give.
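To illustrate point 1, here is a minimal sketch of checking for repeated source terms before a glossary is created. `build_entries` is a hypothetical helper, not part of deepl-python, and treating "same string after trimming whitespace" as a duplicate is an assumption made for this example.

```python
def build_entries(pairs):
    """Turn (source, target) pairs into a dict, rejecting repeated source terms."""
    entries = {}
    for source, target in pairs:
        source = source.strip()
        if source in entries:
            # Two translations for one source string cannot coexist in a glossary.
            raise ValueError(f"duplicate source term: {source!r}")
        entries[source] = target.strip()
    return entries
```

The resulting dict has the shape that `Translator.create_glossary` accepts for its `entries` argument. Passing both `("пропасть", "disappear")` and `("пропасть", "abyss")` raises `ValueError` instead of silently keeping only one of them.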

An example where it works (sorry I don't speak Russian) with English to German:

"rain" in English can be the noun "the rain" (German: Regen) or the verb "to rain" (German: regnen).

I can add a term 'rain => schütten' (only makes sense for the verb form) to my glossary, and it translates:

"It was raining a lot that day. The rain really did not seem to stop."

into

"Es schüttete an diesem Tag sehr viel. Der Regen schien wirklich nicht aufzuhören."
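For anyone who wants to try this with deepl-python, a rough sketch follows. The wrapper `translate_with_glossary` and the glossary name `"rain-example"` are made up for illustration; `create_glossary` and `translate_text` are the library's own methods. The translator is passed in as an argument so the function itself needs no API key.

```python
def translate_with_glossary(translator, text, entries,
                            source_lang="EN", target_lang="DE"):
    """Create a glossary from `entries` and translate `text` with it.

    `translator` is expected to be a deepl.Translator (or any object with
    the same create_glossary / translate_text methods).
    """
    glossary = translator.create_glossary(
        "rain-example",  # arbitrary name, chosen for this sketch
        source_lang=source_lang,
        target_lang=target_lang,
        entries=entries,
    )
    # source_lang is passed explicitly, since a glossary is tied to a language pair.
    result = translator.translate_text(
        text,
        source_lang=source_lang,
        target_lang=target_lang,
        glossary=glossary,
    )
    return result.text
```

Usage would look roughly like `translate_with_glossary(deepl.Translator("YOUR_AUTH_KEY"), "It was raining a lot that day. The rain really did not seem to stop.", {"rain": "schütten"})`, where the auth key is a placeholder. The exact output may differ from the one quoted above as the translation models change.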

However, doing the same in your example fails. Whether this works depends on the specific languages and their grammar. For example, you can add infinitive markers ("to rain") or articles ("the rain") to your glossary definition, if they exist in the language in question.

Glossary term: 'пропасть => to disappear'

"Загляните в пропасть. Пропасть разверзлась подо мной."

"Look into the disappear. The disappearance opened up beneath me."

I flagged this to the team responsible for glossaries and we might fix this in the future.

sarkipo commented 4 months ago

Hi Jan @JanEbbing, thanks a lot for your answer and for signaling the problem. That's all good to know. Adding "to" is quite fine with me, but I just fear there will be cases where this particular trick won't work. Is there a more general way to specify the relevant part of speech, apart from the infinitive particle? (Something like adding a POS tag like "n", "v", "adj" etc.)

Another problem is homonyms of the same part of speech, e.g. plane "airplane" vs. plane as a term in geometry.

JanEbbing commented 4 months ago

I don't think we would want to add a POS tag since, as you remarked, it doesn't fully solve the problem (I would even guess that the majority of homonyms are between words of the same POS). I will check with our glossary team whether support for distinguishing between different word meanings in glossaries is on the roadmap, but I can't give an estimate.