DeepLcom / deepl-dotnet

Official .NET library for the DeepL language translation API.
MIT License

"invalid start of a value" when trying to translate #33

Closed BenceBordas closed 1 year ago

BenceBordas commented 1 year ago

Describe the bug
We are using the package in our WinForms app (Telerik 2023.2.60653, .NET Framework 4.7.2, latest package version 1.7.1). I have developed a feature where a user can select which language to translate to, in our case English or Hungarian. Let's say we have posts, and each post can have comments; when the user wants to translate, we translate all the comments.

When translating to English it goes well, no problem, but when I want to translate to Hungarian (it does not matter whether the original language is Hungarian or not) I get the error shown below. Whether I translate to English first or second, I get the same result. I've been trying to reproduce the problem in a console app, but I am simply unable to do so. Our users also have different experiences: some reported that they get the error on the first try.

We use HTML to display the comments, and we use CefSharp to display a Chromium browser inside the WinForms app.

Update 1: For me it does not matter whether I translate quickly, with 1-2 seconds between the two requests, or wait 1-2 minutes; basically every second request gets this error, whether I start with English or Hungarian. My colleagues see different behavior depending on whether they wait a certain amount of time. I've tried to make demos that I could upload here to make your job easier, but I cannot make them fail. The call fails in our app, but when I build a mini environment that mirrors our app, with the same code making the API call, I can't make it fail.

Expected behavior
We get back the translation without any problem.

Screenshots or sample code

private void button_click(object sender, EventArgs e)
{
        // "as" yields null for a non-RadButton sender; a direct cast would throw before the null check.
        RadButton translateButton = sender as RadButton;
        if (translateButton == null) return;

        string targetLanguage = translateButton.Name == "translateToHungarianButton" ? LanguageCode.Hungarian : LanguageCode.EnglishAmerican;
        string sourceLanguage = targetLanguage == LanguageCode.Hungarian ? "EN" : "HU";
        Task.Run(async () => await TranslateText(commentList, sourceLanguage, targetLanguage));
}
private async Task TranslateText(string[] textsToTranslate, string sourceLanguage, string targetLanguage)
{
        TextResult[] translatedTexts;
        try
        {
                translatedTexts = await translator.TranslateTextAsync(textsToTranslate, sourceLanguage, targetLanguage, new TextTranslateOptions() { TagHandling = "xml", IgnoreTags = { "a" }, PreserveFormatting = true });
        }
        catch (Exception e)
        {
                MessageBox.Show($"Error: {e}");
                throw;
        }
}

If you need any more info, please let me know! The problem may have an easy fix; it might just be my lack of knowledge.

JanEbbing commented 1 year ago

Is there a way to get a sample value for commentList that triggers this problem? Otherwise this is hard to debug because of the garbled stack trace for the async functions (d__4 etc.).

BenceBordas commented 1 year ago

"Is there a way to get a sample value for commentList that triggers this problem?"

Well, it's nothing special. A sample:

string[] commentList = new string[]
{
"<p>Sziasztok!</p><p><br></p><p>Az FX team hétfőn egész nap csapatépít így legközelebb kedden értek majd el minket.</p><p><br></p><p>Köszi!</p>",
"<p>Sziasztok,</p><p>Elkezdtem régi Magyar pénz gyűjteményemet foltozgatni.(papír) Ha esetleg valakinek van ez-az és esetleg eladná, keressen meg. Köszi :)</p>",
"<p></p><p>Sziasztok,</p><p>Pénteken, azaz holnap délután az IT csapatépítőn lesz, ezért kérünk benneteket, hogy a fontos issuekkal, igényekkel lehetőleg még a ma keressetek meg minket.</p>",
"<br>the Animation Division will be on team building event on this Friday (31.March). Please contact us with any important issues by the end of Thursday.<br>"
};

It does the same with the simplest examples, even if commentList contains only one element: "<p>Hey, this is me!</p>"

I might have left out some important info, so I am going to update the post (I updated the sample code too). I hope it helps somehow. If you have any further questions, please let me know!

JanEbbing commented 1 year ago

Hm, for me it also works in this toy console app (for both languages, by changing true to false).

using DeepL;

var translator = new Translator("MY_AUTH_KEY");
string[] commentList = new string[]
{
"<p>Sziasztok!</p><p><br></p><p>Az FX team hétfőn egész nap csapatépít így legközelebb kedden értek majd el minket.</p><p><br></p><p>Köszi!</p>",
"<p>Sziasztok,</p><p>Elkezdtem régi Magyar pénz gyűjteményemet foltozgatni.(papír) Ha esetleg valakinek van ez-az és esetleg eladná, keressen meg. Köszi :)</p>",
"<p></p><p>Sziasztok,</p><p>Pénteken, azaz holnap délután az IT csapatépítőn lesz, ezért kérünk benneteket, hogy a fontos issuekkal, igényekkel lehetőleg még a ma keressetek meg minket.</p>",
"<br>the Animation Division will be on team building event on this Friday (31.March). Please contact us with any important issues by the end of Thursday.<br>"
};

string targetLanguage = true ? LanguageCode.Hungarian : LanguageCode.EnglishAmerican;
string sourceLanguage = targetLanguage == LanguageCode.Hungarian ? "EN" : "HU";
var task = Task.Run(async () => await TranslateText(commentList, sourceLanguage, targetLanguage));
task.Wait();
async Task TranslateText(string[] textsToTranslate, string sourceLanguage, string targetLanguage)
{
        DeepL.Model.TextResult[] translatedTexts;
        try
        {
                translatedTexts = await translator.TranslateTextAsync(textsToTranslate, sourceLanguage, targetLanguage, new TextTranslateOptions() { TagHandling = "xml", IgnoreTags = { "a" }, PreserveFormatting = true });
        }
        catch (Exception e)
        {
                Console.WriteLine($"Error: {e}");
                throw;
        }
        foreach (var tr in translatedTexts)
        {
            Console.WriteLine(tr.Text);
        }
}

The error you get basically means that the library receives invalid JSON from the API (the body should be JSON whose string values contain the XML/HTML, but instead it is just raw <...). Is there any way you can log the HTTP requests/responses you make/receive, or get an input that reproducibly causes this? Otherwise I'm just shooting in the dark.
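To illustrate how this error message can arise, here is a minimal sketch, assuming the response body is parsed with System.Text.Json (which the wording of the error suggests). The helper name IsValidJson and both sample bodies are made up for illustration and are not part of deepl-dotnet.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper (not part of deepl-dotnet): reports whether a body parses as JSON
// and, if not, returns the parser's error message.
static bool IsValidJson(string body, out string error)
{
    try
    {
        // Parse and immediately dispose; we only care whether parsing succeeds.
        using (JsonDocument.Parse(body)) { }
        error = "";
        return true;
    }
    catch (JsonException e)
    {
        error = e.Message;
        return false;
    }
}

// Illustrative JSON body: the HTML lives inside a JSON string value, so it parses.
string jsonBody = "{\"translations\":[{\"text\":\"<p>Szia!</p>\"}]}";
// Illustrative raw HTML body: the first character '<' cannot start a JSON value.
string htmlBody = "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01//EN\">";

Console.WriteLine(IsValidJson(jsonBody, out _) ? "JSON body: parsed" : "JSON body: failed");
if (!IsValidJson(htmlBody, out string parseError))
{
    Console.WriteLine($"HTML body: {parseError}");
}
```

The second call fails on the very first character, which is why the parser complains about the start of a value rather than about anything inside the translated text.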

BenceBordas commented 1 year ago

Hey there! Thanks again for the answer! I managed to look into what we send and what we get back.

Before the error occurs, we send a request like this, and its result is 303:

POST https://api.deepl.com/v2/translate HTTP/1.1
User-Agent: deepl-dotnet/1.7.1 (Microsoft Windows 10.0.19045 ) dotnet-clr/4.0.30319.42000
Authorization: DeepL-Auth-Key OURKEY
Content-Type: application/x-www-form-urlencoded
Host: api.deepl.com
Content-Length: 973
Expect: 100-continue

target_lang=hu&source_lang=en&preserve_formatting=1&tag_handling=xml&ignore_tags=a&text=TEXT

What we get back, with a result of 200, is:

HTTP/1.1 200 OK
Date: Fri, 23 Jun 2023 09:50:43 GMT
Server: CPWS
Strict-Transport-Security: max-age=31536000; includeSubDomains
X-UA-Compatible: IE=edge
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate
Pragma: no-cache
Transfer-Encoding: chunked
Content-Type: text/html;charset=UTF-8

1a9b6
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">

There is a huge amount of text after that, around 108k characters. If you need it I can upload it here, though I don't know whether it contains anything sensitive, so let me know and I will look into it and post it as soon as I can!

I hope it helps! Thanks for the help!

BenceBordas commented 1 year ago

@JanEbbing After a bit of "investigation" I've figured out that the response we get (which starts with '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">') is some sort of error reporting page.

I tried to look around and even asked chatbots, but could not find anything about it. Do you have any idea what it could be?

JanEbbing commented 1 year ago

What kind of error page is this? Is it from CloudFlare, or maybe something on your side (proxy, firewall, ...)? If it is some abuse protection mechanism, that would explain why it only appears sometimes, and why the API request returns an HTML page instead of the correct API response.
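One way to tell such an injected page apart from a genuine API response is to look at the Content-Type before parsing. The following is only a sketch under the assumption that a genuine successful response is served as application/json; IsLikelyIntercepted is a hypothetical helper, not part of deepl-dotnet, and the responses are built locally for illustration.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Text;

// Hypothetical helper (not part of deepl-dotnet): flags a 2xx response whose media type
// is not JSON, which is what a proxy or firewall block page typically looks like.
static bool IsLikelyIntercepted(HttpResponseMessage response)
{
    // MediaType excludes parameters such as charset, e.g. "text/html".
    string mediaType = response.Content?.Headers.ContentType?.MediaType;
    return response.IsSuccessStatusCode
        && !string.Equals(mediaType, "application/json", StringComparison.OrdinalIgnoreCase);
}

// Locally constructed stand-ins for the two kinds of response; no network call is made.
var blocked = new HttpResponseMessage(HttpStatusCode.OK)
{
    Content = new StringContent("<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01//EN\">", Encoding.UTF8, "text/html")
};
var genuine = new HttpResponseMessage(HttpStatusCode.OK)
{
    Content = new StringContent("{\"translations\":[]}", Encoding.UTF8, "application/json")
};

Console.WriteLine($"blocked page flagged: {IsLikelyIntercepted(blocked)}");
Console.WriteLine($"genuine JSON flagged: {IsLikelyIntercepted(genuine)}");
```

A check like this could run in the application's own HTTP logging or diagnostics, so that a blocked request surfaces as "got text/html instead of JSON" rather than as a JSON parsing error.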

BenceBordas commented 1 year ago

Hey there! I copied the response we get to this text file: Error_Response.txt

A proxy or firewall problem is a good idea! I'm going to ask our IT department about it; maybe they can see something.

JanEbbing commented 1 year ago

Could it be from this: UserCheck? I'm not aware of DeepL using this, so it might be on your side. The web app seems to call itself UserCheck, and this was the main thing Google could find. (There are also a lot of PORTAL_IS strings, but I don't think they mean anything.)

BenceBordas commented 1 year ago

Hey! It seems you were right about the firewall! I talked to our IT guys, and it seems we had to add an "application site category" and allow it(?). It was not allowed in the firewall on the application layer. To be honest, our IT guy only talked about it briefly and I do not have much knowledge in this specific area, so I hope it means something to you! For us this will be the solution, so I guess this issue is solved!