apache / libcloud

Apache Libcloud is a Python library which hides differences between different cloud provider APIs and allows you to manage different cloud resources through a unified and easy to use API.
https://libcloud.apache.org
Apache License 2.0

Upload/Download bug with Azure Blobs #1475

Open romaintha opened 4 years ago

romaintha commented 4 years ago

Summary

We are facing issues interacting with Azure Blobs. Whether uploading or downloading, we quite often get an error like the one below. The frequency of the problem increases dramatically with the size of the file we are working with. For a large file (~100 GB), it is almost impossible to run anything without encountering it. Is this a known issue?

Detailed Information

Libcloud version: apache-libcloud==3.1.0
Python version: 3.7.4
OS: Ubuntu (within the dask Docker image)

Stacktrace:

`[2020-07-17 12:25:46] ERROR - prefect.TaskRunner | Unexpected error: ChunkedEncodingError(ProtocolError('Connection broken: OSError("(104, \'ECONNRESET\')")', OSError("(104, 'ECONNRESET')")))
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/urllib3/contrib/pyopenssl.py", line 297, in recv_into
    return self.connection.recv_into(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/OpenSSL/SSL.py", line 1822, in recv_into
    self._raise_ssl_error(self._ssl, result)
  File "/opt/conda/lib/python3.7/site-packages/OpenSSL/SSL.py", line 1639, in _raise_ssl_error
    raise SysCallError(errno, errorcode.get(errno))
OpenSSL.SSL.SysCallError: (104, 'ECONNRESET')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/urllib3/response.py", line 360, in _error_catcher
    yield
  File "/opt/conda/lib/python3.7/site-packages/urllib3/response.py", line 442, in read
    data = self._fp.read(amt)
  File "/opt/conda/lib/python3.7/http/client.py", line 457, in read
    n = self.readinto(b)
  File "/opt/conda/lib/python3.7/http/client.py", line 501, in readinto
    n = self.fp.readinto(b)
  File "/opt/conda/lib/python3.7/socket.py", line 589, in readinto
    return self._sock.recv_into(b)
  File "/opt/conda/lib/python3.7/site-packages/urllib3/contrib/pyopenssl.py", line 302, in recv_into
    raise SocketError(str(e))
OSError: (104, 'ECONNRESET')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/requests/models.py", line 750, in generate
    for chunk in self.raw.stream(chunk_size, decode_content=True):
  File "/opt/conda/lib/python3.7/site-packages/urllib3/response.py", line 494, in stream
    data = self.read(amt=amt, decode_content=decode_content)
  File "/opt/conda/lib/python3.7/site-packages/urllib3/response.py", line 459, in read
    raise IncompleteRead(self._fp_bytes_read, self.length_remaining)
  File "/opt/conda/lib/python3.7/contextlib.py", line 130, in exit
    self.gen.throw(type, value, traceback)
  File "/opt/conda/lib/python3.7/site-packages/urllib3/response.py", line 378, in _error_catcher
    raise ProtocolError('Connection broken: %r' % e, e)
urllib3.exceptions.ProtocolError: ('Connection broken: OSError("(104, \'ECONNRESET\')")', OSError("(104, 'ECONNRESET')"))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/prefect/engine/runner.py", line 48, in inner
    new_state = method(self, state, args, kwargs)
  File "/opt/conda/lib/python3.7/site-packages/prefect/engine/task_runner.py", line 943, in get_task_run_state
    self.task.run, timeout=self.task.timeout, raw_inputs
  File "/opt/conda/lib/python3.7/site-packages/prefect/utilities/executors.py", line 182, in timeout_handler
    return fn(args, *kwargs)
  File "xxxxx, line 127, in run
    video_file.name
  File "xxxxx", line 116, in xxxx
    for k, line in enumerate(xxxx):
  File "/opt/conda/lib/python3.7/site-packages/libcloud/utils/files.py", line 69, in read_in_chunks
    chunk = b(get_data(args))
  File "/opt/conda/lib/python3.7/site-packages/requests/models.py", line 753, in generate
    raise ChunkedEncodingError(e)
requests.exceptions.ChunkedEncodingError: ('Connection broken: OSError("(104, \'ECONNRESET\')")', OSError("(104, 'ECONNRESET')"))`

c-w commented 4 years ago

Thanks for the report. This looks like an upstream issue in urllib3 (see https://github.com/urllib3/urllib3/issues/367).

@Kami Given that this issue likely affects all drivers and not just the Azure Blobs one, do we want to add global retry handlers to the storage drivers that support chunked transfer?
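To make the idea of a retry handler for chunked transfers more concrete, here is a minimal sketch of a download loop that reopens the stream at the last received byte offset after a connection reset. It is not Libcloud code: `read_with_resume` and the `open_stream(offset)` callable are hypothetical names, and the sketch assumes the backend can serve a stream starting at an arbitrary byte offset.

```python
from typing import Callable, Iterator, Tuple, Type


def read_with_resume(
    open_stream: Callable[[int], Iterator[bytes]],
    max_retries: int = 3,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
) -> Iterator[bytes]:
    """Yield chunks from open_stream(offset), resuming after transient errors.

    open_stream is a hypothetical callable that returns an iterator of byte
    chunks starting at the given byte offset.
    """
    offset = 0    # number of bytes already handed to the caller
    failures = 0  # consecutive failures without any progress
    while True:
        try:
            for chunk in open_stream(offset):
                offset += len(chunk)
                failures = 0  # progress was made, so reset the failure count
                yield chunk
            return  # stream ended normally
        except retry_on:
            failures += 1
            if failures > max_retries:
                raise  # give up and surface the original error
            # Otherwise loop around and reopen the stream at the current offset.
```

With the error from the report, a caller would presumably pass `retry_on=(requests.exceptions.ChunkedEncodingError,)`, since that is the exception that reaches the application in the stack trace.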

Kami commented 4 years ago

I don't know too much about this specific error, but it does look like it's network related.

@c-w There were some discussions about a common retry mechanism in Libcloud, but it was never implemented.

I'm fine with adding support for retrying on non-fatal errors (e.g. intermittent network or API issues) to download and upload methods which support chunked transfers (aka where we upload multiple small chunks).

We just need to be careful we only retry on actual intermittent and non-fatal errors.
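As an illustration of retrying only on intermittent errors, the following sketch wraps a single operation (for example, uploading one chunk) and retries it with exponential backoff when the exception is in an explicit allow-list. The helper name `retry_transient`, the default allow-list, and the delay values are assumptions made for this example, not an existing Libcloud API.

```python
import time

# Assumed allow-list of transient errors. ConnectionResetError (ECONNRESET)
# is a subclass of the built-in ConnectionError.
TRANSIENT_ERRORS = (ConnectionError, TimeoutError)


def retry_transient(func, attempts=3, base_delay=0.5,
                    retry_on=TRANSIENT_ERRORS, sleep=time.sleep):
    """Call func(), retrying with exponential backoff on transient errors.

    Exceptions outside retry_on are treated as fatal and propagate
    immediately, without any retry.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts, surface the last error
            # Wait base_delay, then 2 * base_delay, then 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

Keeping the allow-list explicit means that authentication failures, missing containers, and similar permanent errors fail fast instead of being retried.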

stale[bot] commented 3 years ago

Thanks for contributing to this issue. As it has been 90 days since the last activity, we are automatically marking it as stale. If this issue is not relevant or applicable anymore (problem has been fixed in a new version or similar), please close the issue or let us know so we can close it. On the contrary, if the issue is still relevant, there is nothing you need to do, but if you have any additional details or context which would help us when working on this issue, please include it as a comment to this issue.