DeepLcom / deepl-dotnet

Official .NET library for the DeepL language translation API.
MIT License
169 stars 23 forks source link

inconsistent no-reply with PDF translation + workaround #14

Open boeryepes opened 2 years ago

boeryepes commented 2 years ago

hi, to translate documents I use in my code the Await Translator.TranslateDocumentAsync (...) task from your API. This works flawless for DOCX and PPTX but on PDF files it is inconsistent. Regularly there is no completion of the translation task. It is not reproducible and I found no relation to file size, content, name etc.

I then checked the implementation of the task in the DeepL .NET API VS solution and it is this:

        handle = await TranslateDocumentUploadAsync(...).ConfigureAwait(false);
        await TranslateDocumentWaitUntilDoneAsync(handle.Value, cancellationToken).ConfigureAwait(false);
        await TranslateDocumentDownloadAsync(handle.Value, outputFile, cancellationToken).ConfigureAwait(false);

So I replaced Await Translator.TranslateDocumentAsync () with the above, as follows.

Dim docHandle As Model.DocumentHandle = Await _translator.TranslateDocumentUploadAsync(...)
Await _translator.TranslateDocumentWaitUntilDoneAsync(docHandle)
Await _translator.TranslateDocumentDownloadAsync(docHandle, outputFileInfo)

Using these 3 calls separately always work with PDFs and the other file types so I wonder why the single API call 'TranslateDocumentAsync' does not always work for PDFs. The only difference is that I do not use .ConfigureAwait(false) because based on the documentation and other developers, it is not needed.

Can you advise what is going on?

dieter-enns-deepl commented 2 years ago

Hi,

first I would like to explain why we are using ConfigureAwait(false). It is the recommended way for avoiding the potential of deadlocks in libraries. Functionally, ConfigureAwait(false) should just allow for continuations (code after the await) to be executed from another synchronization context than the one which called the async function in the first place. So on paper, using ConfigureAwait(false) should be safer than leaving it out. If you are interested to learn more about this topic, I recommend you following article which explains it in more details and better than I ever could: https://devblogs.microsoft.com/dotnet/configureawait-faq/#why-would-i-want-to-use-configureawaitfalse

Another thing raises even more questions to me. The three asynchronous functions which you started to call on your own contain awaited calls with ConfigureAwait(false) themselves:

I've tried to reproduce the issue and came up with (I am no native VB.NET programmer, so please bear with me):

Imports System.IO
Imports DeepL

Module Module1
  Sub Main()
    Translate().Wait() ' Blocking call because I don't know how to do an async Main in VB.Net
  End Sub

  Private Async Function Translate() As Task
    Dim translator = new Translator("[Private]")
    Await translator.TranslateDocumentAsync(
      New FileInfo("C:\[Private].pdf"),
      New FileInfo("C:\[Private]_translated.pdf"),
      "de",
      "en-US").ConfigureAwait(false) ' I also tried without ".ConfigureAwait(false)" with same result
  End Function
End Module

I couldn't reproduce the issue. But it took quite a while until completion. I would need to ask a colleague if that matters, but I suspect that processing a PDF takes more time compared to our other supported file formats. But of course I also don't know how long you have waited.

In conclusion, unfortunately, I can't tell you the reasons why it won't work. I have the suspicion that it coincidentally looks as if .ConfigureAwait(false) is the influencing factor. However, I have yet no clue what other reason might cause the faulty behavior in your case. Sorry.

If that would help you, we could extend our library to be configurable by the TranslatorOptions of whether false or true should be passed to .ConfigureAwait(bool) with false as default for backward compatibility. Functionally, .ConfigureAwait(true) should behave neutral and as if .ConfigureAwait(bool) wasn't used in the first place.

boeryepes commented 2 years ago

Thanks Dieter,

indeed translating PDFs takes time - in all cases I waited long enough (>10 minutes where usually 3-4 minutes in case of a successful translation). Note that: - I am translating multi-threaded: all documents in a folder are uploaded, translated and downloaded in separate threads.

Note that I cannot test to much because my quote gets depleted to quickly (50k characters per document).

regards, klaas

Op 29-03-2022 18:55 schreef dieter-enns-deepl @.***>:

Hi,

first I would like to explain why we are using ConfigureAwait(false). It is the recommended way for avoiding the potential of deadlocks in libraries. Functionally, ConfigureAwait(false) should just allow for continuations (code after the await) to be executed from another synchronization context than the one which called the async function in the first place. So on paper, using ConfigureAwait(false) should be safer than leaving it out. If you are interested to learn more about this topic, I recommend you following article which explains it in more details and better than I ever could: https://devblogs.microsoft.com/dotnet/configureawait-faq/#why-would-i-want-to-use-configureawaitfalse

Another thing raises even more questions to me. The three asynchronous functions which you started to call on your own contain awaited calls with ConfigureAwait(false) themselves:

In conclusion, unfortunately, I can't tell you the reasons why it won't work. I have the suspicion that it coincidentally looks as if .ConfigureAwait(false) is the influencing factor. However, I have yet no clue what other reason might cause the faulty behavior in your case. Sorry.

If that would help you, we could extend our library to be configurable by the TranslatorOptions of whether false or true should be passed to .ConfigureAwait(bool) with false as default for backward compatibility. Functionally, .ConfigureAwait(true) should behave neutral and as if .ConfigureAwait(bool) wasn't used in the first place.

— Reply to this email directly, view it on GitHub https://github.com/DeepLcom/deepl-dotnet/issues/14#issuecomment-1082126624, or unsubscribe https://github.com/notifications/unsubscribe-auth/AXR5UCQXFXLVLRG5ZRG6WILVCMYXDANCNFSM5RRK6PQQ. You are receiving this because you authored the thread. https://github.com/notifications/beacon/AXR5UCUWDWCK4GI4YBINGQTVCMYXDA5CNFSM5RRK6PQ2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOIB77CIA.gifMessage ID: @.***>

daniel-jones-deepl commented 2 years ago

Hi Klaas,

Thank you for testing the issue on your side. Regarding the quota, we can cancel those document translations from your quota. To do so we need to identify your account, could you please contact our support team?

boeryepes commented 2 years ago

Hi Daniel, that would be great. It’s this email address I am mailing from @. @.> )

Regards, Klaas

From: Daniel Jones @.> Sent: Monday, 4 April 2022 15:37 To: DeepLcom/deepl-dotnet @.> Cc: boeryepes @.>; Author @.> Subject: Re: [DeepLcom/deepl-dotnet] inconsistent no-reply with PDF translation + workaround (Issue #14)

Hi Klaas,

Thank you for testing the issue on your side. Regarding the quota, we can cancel those document translations from your quota. To do so we need to identify your account, could you please contact our support team https://support.deepl.com/hc/en-us/requests/new ?

— Reply to this email directly, view it on GitHub https://github.com/DeepLcom/deepl-dotnet/issues/14#issuecomment-1087567335 , or unsubscribe https://github.com/notifications/unsubscribe-auth/AXR5UCXES6Z6D7BIFO5KFHTVDLV6VANCNFSM5RRK6PQQ . You are receiving this because you authored the thread. https://github.com/notifications/beacon/AXR5UCT42QNKWHL7ZOK7H2TVDLV6VA5CNFSM5RRK6PQ2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOIDJPLZY.gif Message ID: @. @.> >

daniel-jones-deepl commented 2 years ago

Hi Klaas, Your email address is censored in your messages; I guess that GitHub censors your email address when you reply via email. Regards, Daniel

boeryepes commented 2 years ago

it's @.**@.

Op 05-04-2022 15:45 schreef Daniel Jones @.***>:

Hi Klaas, Your email address is censored in your messages; I guess that GitHub censors your email address when you reply via email. Regards, Daniel

— Reply to this email directly, view it on GitHub https://github.com/DeepLcom/deepl-dotnet/issues/14#issuecomment-1088723492, or unsubscribe https://github.com/notifications/unsubscribe-auth/AXR5UCXLZ2ATLYUFOLFTCJ3VDQ7YZANCNFSM5RRK6PQQ. You are receiving this because you authored the thread. https://github.com/notifications/beacon/AXR5UCVI72DZTRKRYCEFU6TVDQ7YZA5CNFSM5RRK6PQ2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOIDSJUJA.gifMessage ID: @.***>

daniel-jones-deepl commented 2 years ago

Hi Klaas, GitHub censors your email, all I see is it's ***@***.******@***.***. You can see it online on GitHub here: https://github.com/DeepLcom/deepl-dotnet/issues/14#issuecomment-1091200870

Please send an email to open-source@deepl.com, so I can use your email address to identify your DeepL account. Regards, Daniel