DeepLcom / deepl-php

Official PHP library for the DeepL language translation API.
MIT License

Incorrect translation with xml tags #49

Closed: twarkie closed this issue 2 months ago

twarkie commented 2 months ago

Hello! I'm seeing incorrect translations when using the DeepL API with XML tag handling. The tags are only there so I can map the result back to the original text.

Here's a simple Swedish string that can be used to reproduce the problem: <x key="0">Köpesumma</x>. It should translate to Purchase amount.

$this->client = new \DeepL\Translator(config('services.deepl.api.key'));

$translated = collect(
    $this->client->translateText(
        ['<x key="0">Köpesumma</x>'],
        'sv',
        'en-US',
        [
            TranslateTextOptions::TAG_HANDLING => 'xml',
            TranslateTextOptions::PRESERVE_FORMATTING => false,
            TranslateTextOptions::FORMALITY => 'prefer_less',
        ]
    )
);

After looking through the code, I found that the following form data was generated for cURL:

target_lang=en-US&source_lang=sv&formality=prefer_less&text=%3Cx+key%3D%220%22%3EKopesumma%3C%2Fx%3E&preserve_formatting=0&tag_handling=xml

And the following response was returned via the library:

{
    "translations": [
        {
            "detected_source_language": "SV",
            "text": "<x+key=\"0\">Copayment</x>"
        }
    ]
}

...which is incorrect! If I manually call the API and use JSON to post the data, it works as expected:

Input:

{
  "text": [
    "<x key=\"0\">Köpesumma</x>"
  ],
  "target_lang": "en-US",
  "source_lang": "sv",
  "formality": "prefer_less",
  "preserve_formatting": false,
  "tag_handling": "xml"
}

Response:

{
    "translations": [
        {
            "detected_source_language": "SV",
            "text": "<x key=\"0\">Purchase price</x>"
        }
    ]
}

...which is correct! Is there a problem with the library, or am I using it incorrectly?

JanEbbing commented 2 months ago

Hm, if I var_dump the output of your code I get the correct result:

array(1) {
  [0]=>
  object(DeepL\TextResult)#19 (2) {
    ["text"]=>
    string(29) "<x key="0">Purchase price</x>"
    ["detectedSourceLang"]=>
    string(2) "sv"
  }
}

Looking at what you posted, the issue seems to be in your cURL client: target_lang=en-US&source_lang=sv&formality=prefer_less&text=%3Cx+key%3D%220%22%3EKopesumma%3C%2Fx%3E&preserve_formatting=0&tag_handling=xml is missing the ö in Köpesumma and has an o instead. The text should be URL-encoded correctly in HttpClientWrapper::urlEncodeWithRepeatedParams, and I'm not sure what would cause this to differ on your machine. (Translating Kopesumma gives me Copayment as well.) Can you install a different HTTP client (e.g. Guzzle), configure the translator to use it, and try again? How to do that is described in the README.
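
For reference, swapping in Guzzle could look roughly like the sketch below. It assumes that guzzlehttp/guzzle and deeplcom/deepl-php are installed through Composer, that the application has already loaded vendor/autoload.php, and that the installed library version supports the TranslatorOptions::HTTP_CLIENT option described in the README. makeTranslatorWithGuzzle is a hypothetical helper name, and the timeout is an arbitrary value.

```php
<?php
// Assumes guzzlehttp/guzzle and deeplcom/deepl-php are installed through
// Composer and that vendor/autoload.php has been loaded by the application.
use DeepL\Translator;
use DeepL\TranslatorOptions;
use GuzzleHttp\Client;

// Hypothetical helper: builds a Translator that sends its requests through
// Guzzle instead of the library's default HTTP client.
function makeTranslatorWithGuzzle(string $authKey): Translator
{
    // The timeout is an arbitrary illustrative value.
    $httpClient = new Client(['connect_timeout' => 5.0]);

    return new Translator($authKey, [
        TranslatorOptions::HTTP_CLIENT => $httpClient,
    ]);
}
```

In the Laravel setup from this issue, it could be called as makeTranslatorWithGuzzle(config('services.deepl.api.key')). If the same translateText call then returns Purchase price, that would point at the default cURL path on the affected machine.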

twarkie commented 2 months ago

Very nice catch @JanEbbing! I will take a look to see why that happens. I do get different results on different machines, but now I know where to look. Thanks.
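
One way to narrow this down on each machine is to compare the form body actually sent with what PHP's own urlencode produces for the UTF-8 input. The sketch below is only a diagnostic aid under that assumption: formBodyPreservesText is a hypothetical helper, not part of deepl-php, and it does not reproduce the library's encoding code.

```php
<?php
// Hypothetical helper, not part of deepl-php: decodes a form body and
// reports whether its "text" field still equals the original string.
function formBodyPreservesText(string $formBody, string $originalText): bool
{
    parse_str($formBody, $fields);

    return isset($fields['text']) && $fields['text'] === $originalText;
}

// The input from this issue; "\u{00F6}" is the letter ö written as an
// escape so the example does not depend on the file's own encoding.
$original = '<x key="0">K' . "\u{00F6}" . 'pesumma</x>';

// What a correct encoder produces: ö becomes the two UTF-8 bytes C3 B6.
$expected = 'text=' . urlencode($original);

// The form data reported in this issue, where ö arrived as a plain o.
$observed = 'target_lang=en-US&source_lang=sv&formality=prefer_less'
    . '&text=%3Cx+key%3D%220%22%3EKopesumma%3C%2Fx%3E'
    . '&preserve_formatting=0&tag_handling=xml';

var_dump($expected);
var_dump(formBodyPreservesText($expected, $original));
var_dump(formBodyPreservesText($observed, $original));
```

If the check fails for the body generated on one machine but passes on another, the character is being altered before or during encoding on the failing machine, not by the DeepL API.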