DeepLcom / deepl-python

Official Python library for the DeepL language translation API.
https://www.deepl.com
MIT License

Translation error with markup text #68

Open yichidev opened 1 year ago

yichidev commented 1 year ago

When translating markup text via the DeepL API, the newlines (\\n) end up inside the wrong tag (i.e., before the closing </c> instead of after it). Please see the following example, where DE is the source and RU is the target :)

DE: Liste der Eigenschaften:\\n* <c id=\"f48c4591-9f64-4ac8-af6a-72228cd50793\">Kamera</c>\\n* <c id=\"882df0f7-39a5-4f99-ae8a-e23a485419ee\">Akku</c>\\n
RU: Список свойств:\\n* <c id=\"f48c4591-9f64-4ac8-af6a-72228cd50793\">Камера\\n*</c> <c id=\"882df0f7-39a5-4f99-ae8a-e23a485419ee\">Аккумулятор\\n</c>

The expected behavior is that the 2nd and 3rd newlines stay outside the </c> tags after translation.
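The mismatch described above can be checked mechanically. The following is a minimal sketch, not part of deepl-python: `tags_containing_newline` is a hypothetical helper, the `id` values are shortened to `a` and `b` for readability, and the strings use real newline characters rather than the escaped `\\n` shown in the report.

```python
import re

# Hypothetical helper (not part of deepl-python): matches each <c ...>...</c>
# element and captures its inner text, including any newlines.
C_ELEMENT = re.compile(r"<c\b[^>]*>(.*?)</c>", re.DOTALL)


def tags_containing_newline(text: str, newline: str = "\n") -> list[str]:
    """Return the inner text of every <c> element that contains `newline`."""
    return [m.group(1) for m in C_ELEMENT.finditer(text) if newline in m.group(1)]


# Shortened versions of the DE source and RU output from the report
# (ids replaced by "a" and "b"; this is an assumption made for brevity).
source = 'Liste der Eigenschaften:\n* <c id="a">Kamera</c>\n* <c id="b">Akku</c>\n'
translated = 'Список свойств:\n* <c id="a">Камера\n*</c> <c id="b">Аккумулятор\n</c>'

print(tags_containing_newline(source))      # no <c> element holds a newline
print(tags_containing_newline(translated))  # both <c> elements hold a newline
```

An empty result for the source and a non-empty result for the translation is the symptom reported here: the newlines moved from outside the tags to inside them.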

seekuehe commented 1 year ago

We have raised your issue with the relevant team. Thank you for reporting! We'll keep you posted if we find anything.

DeeJayTC commented 1 year ago

Hey @ChingYi-AX, could you give us the exact request you sent, including the parameters?

yichidev commented 1 year ago

@seekuehe @DeeJayTC Thank you very much for your help and sorry for the late reply! Here is the exact request with the parameters:

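import requests

_BASE_PARAMS = {
    "split_sentences": "nonewlines",  # split on punctuation only, not on newlines
    "tag_handling": "xml",
    "non_splitting_tags": "b,c",
}
# OUR_DEEPL_API_KEY is a placeholder, assumed to hold the full header value ("DeepL-Auth-Key <key>").
auth_header = {"Authorization": OUR_DEEPL_API_KEY}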
texts = ["Liste der Eigenschaften:\\n* <c id=\"f48c4591-9f64-4ac8-af6a-72228cd50793\">Kamera</c>\\n* <c id=\"882df0f7-39a5-4f99-ae8a-e23a485419ee\">Akku</c>\\n"]

data = {"source_lang": "DE", "target_lang": "RU", **_BASE_PARAMS, "text": texts}

response = requests.post(
    "https://api.deepl.com/v2/translate",
    timeout=20,
    headers=auth_header,
    data=data,
)
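For anyone trying to reproduce this through the deepl-python library rather than a raw HTTP call, the request above might be expressed roughly as follows. This is a sketch under assumptions: `translate_with_library` and `TRANSLATE_KWARGS` are names made up for illustration, the auth key is passed in by the caller, and it is not confirmed that the library call produces the same output as the raw request.

```python
# Keyword arguments mirroring the source/target languages and _BASE_PARAMS
# from the raw request. TRANSLATE_KWARGS is an illustrative name.
TRANSLATE_KWARGS = {
    "source_lang": "DE",
    "target_lang": "RU",
    "split_sentences": "nonewlines",
    "tag_handling": "xml",
    "non_splitting_tags": ["b", "c"],
}


def translate_with_library(auth_key: str, texts: list[str]) -> list[str]:
    """Translate `texts` with deepl-python and return the translated strings."""
    # Imported lazily so this module loads even where deepl is not installed.
    import deepl

    translator = deepl.Translator(auth_key)
    # Passing a list of texts returns a list of results, one per input text.
    results = translator.translate_text(texts, **TRANSLATE_KWARGS)
    return [result.text for result in results]
```

Calling `translate_with_library(key, texts)` with the `texts` list from the request would send one translation request with the same options.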