Open zvrba opened 4 years ago
Hi @zvrba, the error "The client could not finish the operation within specified timeout" is usually reported when DMLib encounters a temporary network problem and the request cannot be completed within 15 minutes.
DMLib supports resuming a transfer job from the last checkpoint, so resuming the job should complete the remainder of the transfer.
A TransferContext instance is used to save and pass the checkpoint. You can find sample code showing how to use TransferContext to resume a transfer job here: https://github.com/Azure/azure-storage-net-data-movement/blob/master/samples/DataMovementSamples/DataMovementSamples/Samples.cs#L151
The sample code shows how to cancel and resume a transfer job. Resuming also works after other exceptions, so if an error occurs you can try to resume as the sample does here: https://github.com/Azure/azure-storage-net-data-movement/blob/master/samples/DataMovementSamples/DataMovementSamples/Samples.cs#L182
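As a rough illustration of the resume pattern described above, here is a minimal sketch for a single blob download. The helper name DownloadWithResumeAsync and the maxAttempts parameter are hypothetical and not part of DMLib; the sketch assumes the Microsoft.Azure.Storage.DataMovement 1.x package and that a checkpoint taken after a failure can be resumed the same way as one taken after a cancellation. It has not been verified against the hang reported in this issue.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Azure.Storage.Blob;
using Microsoft.Azure.Storage.DataMovement;

public static class ResumableDownload
{
    // Hypothetical helper (not a DMLib API): retries a single download,
    // resuming each new attempt from the last checkpoint of the failed one.
    public static async Task DownloadWithResumeAsync(
        CloudBlob blob, string destPath, int maxAttempts = 3)
    {
        TransferCheckpoint checkpoint = null;

        for (int attempt = 1; ; attempt++)
        {
            // The first attempt starts fresh; later attempts pass the saved checkpoint.
            SingleTransferContext context = checkpoint == null
                ? new SingleTransferContext()
                : new SingleTransferContext(checkpoint);

            try
            {
                await TransferManager.DownloadAsync(blob, destPath, null, context);
                return; // transfer completed
            }
            catch (Exception) when (attempt < maxAttempts)
            {
                // Save the progress made so far; the last failure is rethrown
                // because the filter no longer matches once attempts run out.
                checkpoint = context.LastCheckpoint;
            }
        }
    }
}
```

A caller would replace a direct TransferManager.DownloadAsync call with DownloadWithResumeAsync(blob, path). Catching every exception type is a simplification; a real application would probably restrict the filter to the errors it considers transient.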
Thanks Emma
"... is usually reported when DMLib encounters a temporary network problem and the request cannot be completed within 15 minutes."
What do you mean by "the" request? Related to that:
@zvrba
Thanks Emma
Hi,
I have downloaded the latest source and built it so that I can observe what is happening. During a transfer, the following happens:
Exception thrown: 'System.IO.IOException' in System.Net.Security.dll
Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host.
After that, the download simply hangs. If you look at the screenshot:
This is likely a bug in DMLib's error handling. At the very least, the above exception should be treated as a fatal error and the transfer should be aborted immediately instead of waiting for the timeout to elapse. Alternatively, it would be nice if the library retried the request a couple of times, as this seems to be a Blob service problem rather than a network connectivity problem.
Hi @zvrba ,
Thanks a lot for the detailed investigation.
Regarding the wait until timeout after the exception 'System.IO.IOException' in System.Net.Security.dll: this should be an error handling issue. The fix may require a change in one of DMLib's dependencies, so we'll need to figure out a valid fix for it.
Regarding the spinning in TransferScheduler: when all transfer jobs are completed, TransferScheduler stops spinning. Code is here. It then waits on a blocking collection, which does not use CPU while empty. Once the first issue is fixed, the spinning issue in TransferScheduler should also be mitigated.
Thanks Emma
Which service(blob, file) does this issue concern?
BLOB storage (block blobs)
Which version of the SDK was used?
1.2.0 (latest) from NuGet
Which platform were you using? (.NET Framework version or .NET Core version, and OS version)
.NET Core 2.2, Windows 10
How can the problem be reproduced? It'd be better if the code that caused the problem can be shared.
This code sometimes just hangs with no visible download progress (monitoring network traffic in task manager).
This is a command-line application, so there's no SynchronizationContext that could cause a deadlock. After a long period of time, the exception shown in the stack trace screenshot is thrown. There are no network connectivity problems.
What problem was encountered?
The download sometimes seems to get deadlocked, or some event is missed. The files in question aren't even large (5-20 MB), but I'm downloading thousands of them, one after another (i.e., there are no concurrent downloads; the next download starts only after the previous one has finished). See the stack traces and thrown exceptions below.
Have you found a mitigation/solution?
No.