Open mattiasb opened 7 months ago
Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @xgithubtriage.
Thanks for the feedback, @mattiasb. We will investigate and get back to you as soon as possible.
Hi @mattiasb, sorry for the delay. I believe what you are encountering is a separate (naive and undocumented) retry policy for larger downloads that use response streaming. In the internal download code, there is a retry loop around streaming the response body of each chunk. This retry policy has its own retry count and does not interact with the global retry policy. From the stack trace in your logs, you can see the error is coming from here.
This retry policy is a side effect of the way response streaming is handled by azure-core and the way the SDK is set up to download the blob in chunks. Streaming the body from the server comes after the initial HTTP response and is therefore not run through the normal transport pipeline. Without this retry loop, any error while streaming would cause the whole download to fail.
So what is likely happening is that you are exhausting this internal retry policy, and the failure is bubbling up as an error that the global retry policy does not handle. Your request can therefore fail before the global retry attempts have been exhausted.
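To make the interaction concrete, here is a toy model of two independent retry loops. It is only a sketch of the behavior described above, not the SDK's code: every class and function name is hypothetical, and the retry counts are made up.

```python
class StreamInterrupted(Exception):
    """Hypothetical error raised while reading a response body."""


class TransientRequestError(Exception):
    """Hypothetical error type that the outer (global) policy retries."""


def read_chunk_with_inner_retry(read_body, inner_retries=3):
    # Inner loop: retries only the body read, using its own fixed count.
    for _ in range(inner_retries):
        try:
            return read_body()
        except StreamInterrupted:
            continue
    # Inner budget exhausted: the failure leaves this function as an error.
    raise StreamInterrupted("inner retries exhausted")


def run_with_global_policy(operation, global_retries=10):
    # Outer loop: retries only the error type it recognises.
    for _ in range(global_retries):
        try:
            return operation()
        except TransientRequestError:
            continue
    raise TransientRequestError("global retries exhausted")
```

In this model, a body read that keeps failing is attempted three times by the inner loop. The resulting `StreamInterrupted` then passes straight through `run_with_global_policy`, which never uses its remaining attempts because it does not catch that error type.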
We realize this is not an ideal implementation, especially since the internal retry count cannot be adjusted, and we would like to improve this in the future. For now, though, this is the likely cause of what you are seeing. In real-world applications, we have found this policy to be effective in mitigating random network interruptions, so you may not run into any issues in a production environment. Or for now, you can also consider implementing your own retry over the download call.
As a data point, this is what we have done as we do experience issues in production code.
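A retry over the download call, as suggested above, could look roughly like the following. This is a minimal sketch and not the code used by anyone in this thread: `download_with_retry` and its parameters are hypothetical names, and the `download` argument is assumed to be a zero-argument callable that performs the whole download.

```python
import time


def download_with_retry(download, attempts=5, base_delay=1.0,
                        retry_on=(Exception,), sleep=time.sleep):
    """Call download() until it succeeds or the attempts run out.

    All names here are hypothetical. `download` is any zero-argument
    callable that performs the whole download and returns its result.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return download()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))


# Possible use with azure-storage-blob, assuming an existing BlobClient
# named blob_client:
#     data = download_with_retry(lambda: blob_client.download_blob().readall())
```

Retrying the whole call restarts the download from the beginning on each attempt, which is simple but can be wasteful for very large blobs. Narrowing `retry_on` to the exception types actually observed in the logs avoids retrying on errors that will never succeed, such as authentication failures.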
Describe the bug
When experiencing timeouts while downloading a blob with the Azure Python SDK, the whole download fails once the network is brought back to normal, even if there are retries left for the retry_policy. I have not been able to reproduce this with uploads.
To Reproduce
See https://github.com/mattiasb/blob-issue for more details on how to reproduce (including an example script).
Steps to reproduce the behavior:
Expected behavior
I would expect the retry policy to also trigger on timeouts, so that once I returned the network to normal, a later set of retries would let the script finish downloading the file.
Additional context
Again, the best way to understand this is to see what I do at https://github.com/mattiasb/blob-issue. I also recorded some logs, which might be worth looking at.