DeepLcom / deepl-python

Official Python library for the DeepL language translation API.
MIT License

translator.create_glossary() forces to remove regional variant #109

Open CJRzzZ opened 2 weeks ago

CJRzzZ commented 2 weeks ago

I've encountered a problem with the translator.create_glossary() function, where it sets the source language of a glossary object to "EN" despite the argument specifying "EN-US". This behavior seems to stem from the code in "translator.py" at line 302, which attempts to strip regional variants and retain only the base language code.

This leads to an issue because "EN" is deprecated in the DeepL API, which then throws a deepl.exceptions.DeepLException stating "target_lang="EN" is deprecated, please use "EN-GB" or "EN-US" instead." Furthermore, if the glossary is set with "EN" and translator.translate_text() is called with "EN-US" as the source language, a ValueError is raised, stating "source_lang and target_lang must match glossary". This inconsistency makes it impossible to use a matching value for the source language.

Could you please look into this? Thank you for your attention to this matter.

JanEbbing commented 2 weeks ago

Sorry, can you clearly describe (maybe with sample code) what you are doing and what error you get?

  1. Glossaries don't have a regional variant attached to them, so "EN" is correct as the source or target language of a glossary.
  2. It should then be possible to use glossaries for all variants of their associated language.
  3. "EN-US" as the source language: this sounds like the issue. The source language would have to be "EN". Regional variants are only supported for target languages. The error you get seems to be wrong though; I can follow up on this.

You can read more on this differentiation in the documentation.
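As a rough illustration of the three points above, the matching rule can be sketched in plain Python. This is not the library's actual code: base_language and glossary_applies are hypothetical helper names, and the rule encoded here (source compared exactly, target compared by base language) is an assumption drawn from the description in this thread.

```python
def base_language(code: str) -> str:
    """Strip a regional variant from a language code, e.g. "EN-US" -> "EN"."""
    return code.strip().upper().split("-")[0]


def glossary_applies(glossary_source: str, glossary_target: str,
                     source_lang: str, target_lang: str) -> bool:
    """Hypothetical check: does a glossary fit a translation request?

    Assumption: glossaries store base codes only, the source language must
    match exactly (source languages carry no variant), and any regional
    variant of the glossary's target language is acceptable.
    """
    return (source_lang.strip().upper() == glossary_source.upper()
            and base_language(target_lang) == glossary_target.upper())


# An EN -> JA glossary fits source "EN", but not source "EN-US".
print(glossary_applies("EN", "JA", "EN", "JA"))
print(glossary_applies("EN", "JA", "EN-US", "JA"))
# A DE -> EN glossary fits both English target variants.
print(glossary_applies("DE", "EN", "DE", "EN-US"))
print(glossary_applies("DE", "EN", "DE", "EN-GB"))
```

Under these assumptions, passing "EN-US" as the source language is what makes the glossary check fail.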

CJRzzZ commented 2 weeks ago

Sure, here is the sample code:

g = translator.create_glossary("GITCG_en_to_jp", 'EN-US', 'JA', dict_en_to_jp)
result = translator.translate_text(clean_text, source_lang=source_lang, target_lang=target_lang, glossary=g).text

In the first line, I tried to create the glossary with "EN-US" as the source language. create_glossary() automatically converts the source language to "EN". This causes a problem in the second line: when I use "EN-US" as source_lang, it raises the "source_lang and target_lang must match glossary" error; when I use "EN" as source_lang, it raises the "target_lang="EN" is deprecated, please use "EN-GB" or "EN-US" instead" error. I hope this makes the problem clear.

JanEbbing commented 2 weeks ago

Yes, like I said, we differentiate between source and target languages:

  1. "EN" is a valid source language
  2. "EN-US" is an invalid source language
  3. "EN" is an invalid target language
  4. "EN-US" is a valid target language

So in your code, the following should work:

source_lang = "EN"
target_lang = "JA"
g = translator.create_glossary("GITCG_en_to_jp", source_lang, target_lang, dict_en_to_jp)
result = translator.translate_text(clean_text, source_lang=source_lang, target_lang=target_lang, glossary=g).text
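If the language codes come from user input or configuration, one option is to normalize them before calling the library. The following is a minimal sketch under the rules listed above; prepare_languages is a hypothetical helper, not part of deepl-python, and it only covers the "EN" case discussed in this thread.

```python
from typing import Tuple


def prepare_languages(source_lang: str, target_lang: str) -> Tuple[str, str]:
    """Hypothetical helper: adapt codes to the source/target rules above.

    - The source language loses any regional variant ("EN-US" -> "EN").
    - The target language keeps its variant ("EN-US" stays "EN-US").
    - A bare "EN" target is rejected, since the thread reports it as deprecated.
    """
    source = source_lang.strip().upper().split("-")[0]
    target = target_lang.strip().upper()
    if target == "EN":
        raise ValueError('target_lang "EN" is deprecated; use "EN-GB" or "EN-US"')
    return source, target


source_lang, target_lang = prepare_languages("EN-US", "JA")
print(source_lang, target_lang)  # EN JA
```

The resulting pair can then be passed to both create_glossary() and translate_text(), so that the glossary and the translation request use the same source language.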