ImperialCollegeLondon / Faraday-liionsden

BSD 3-Clause "New" or "Revised" License

Azure truncating data stream #161

Closed dandavies99 closed 1 year ago

dandavies99 commented 2 years ago

Problem

An intermittent error: while parsing a file, the exception below is raised. The number of bytes read is always 8388608, which is exactly 8 MiB, so the download from Azure seems to be cut off at 8 MiB some of the time. I found one useful-looking example online of a similar complaint, but without a very satisfactory answer. Any ideas @cc-a? I'm going to start marking the Azure-related issues with a specific label.
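As a quick sanity check on those numbers, here is a small sketch that uses only the values printed in the exception message. The variable names are illustrative and not taken from the project.

```python
import re

# Message text copied from the traceback in this issue.
message = "IncompleteRead(8388608 bytes read, 23233539 more expected)"

match = re.search(r"IncompleteRead\((\d+) bytes read, (\d+) more expected\)", message)
bytes_read, bytes_remaining = (int(group) for group in match.groups())

MIB = 1024 * 1024
total = bytes_read + bytes_remaining

print(bytes_read == 8 * MIB)   # True: the read stops at exactly 8 MiB
print(total)                   # 31622147 bytes in the full response
print(round(total / MIB, 2))   # 30.16
```

So the response was expected to be roughly 30 MiB, and the connection broke after the first 8 MiB of it.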

Steps to reproduce

Upload and parse a biologic file that is larger than 8 MB. Attached file: partial_NDK01-26 - take 2 - 0,01C cycle on fresh cell_02_BCD_CB1 (1).csv

Expected outcome

The file is parsed without any error. This does happen some of the time.

Actual outcome

web_1  | ERROR 2022-09-13 13:28:40,632 Internal Server Error: /battDB/exps/add_data/1/
web_1  | Traceback (most recent call last):
web_1  |   File "/usr/local/lib/python3.8/site-packages/urllib3/response.py", line 443, in _error_catcher
web_1  |     yield
web_1  |   File "/usr/local/lib/python3.8/site-packages/urllib3/response.py", line 592, in read
web_1  |     raise IncompleteRead(self._fp_bytes_read, self.length_remaining)
web_1  | urllib3.exceptions.IncompleteRead: IncompleteRead(8388608 bytes read, 23233539 more expected)
web_1  | 
web_1  | During handling of the above exception, another exception occurred:
web_1  | 
web_1  | Traceback (most recent call last):
web_1  |   File "/usr/local/lib/python3.8/site-packages/requests/models.py", line 816, in generate
web_1  |     yield from self.raw.stream(chunk_size, decode_content=True)
web_1  |   File "/usr/local/lib/python3.8/site-packages/urllib3/response.py", line 627, in stream
web_1  |     data = self.read(amt=amt, decode_content=decode_content)
web_1  |   File "/usr/local/lib/python3.8/site-packages/urllib3/response.py", line 592, in read
web_1  |     raise IncompleteRead(self._fp_bytes_read, self.length_remaining)
web_1  |   File "/usr/local/lib/python3.8/contextlib.py", line 131, in __exit__
web_1  |     self.gen.throw(type, value, traceback)
web_1  |   File "/usr/local/lib/python3.8/site-packages/urllib3/response.py", line 460, in _error_catcher
web_1  |     raise ProtocolError("Connection broken: %r" % e, e)
web_1  | urllib3.exceptions.ProtocolError: ('Connection broken: IncompleteRead(8388608 bytes read, 23233539 more expected)', IncompleteRead(8388608 bytes read, 23233539 more expected))
web_1  | 
web_1  | During handling of the above exception, another exception occurred:
web_1  | 
web_1  | Traceback (most recent call last):
web_1  |   File "/usr/local/lib/python3.8/site-packages/azure/core/pipeline/transport/_requests_basic.py", line 169, in __next__
web_1  |     chunk = next(self.iter_content_func)
web_1  |   File "/usr/local/lib/python3.8/site-packages/requests/models.py", line 818, in generate
web_1  |     raise ChunkedEncodingError(e)
web_1  | requests.exceptions.ChunkedEncodingError: ('Connection broken: IncompleteRead(8388608 bytes read, 23233539 more expected)', IncompleteRead(8388608 bytes read, 23233539 more expected))
web_1  | 
web_1  | During handling of the above exception, another exception occurred:
web_1  | 
web_1  | Traceback (most recent call last):
web_1  |   File "/usr/local/lib/python3.8/site-packages/django/core/handlers/exception.py", line 55, in inner
web_1  |     response = get_response(request)
web_1  |   File "/usr/local/lib/python3.8/site-packages/django/core/handlers/base.py", line 197, in _get_response
web_1  |     response = wrapped_callback(request, *callback_args, **callback_kwargs)
web_1  |   File "/usr/local/lib/python3.8/site-packages/django/views/generic/base.py", line 103, in view
web_1  |     return self.dispatch(request, *args, **kwargs)
web_1  |   File "/usr/local/lib/python3.8/site-packages/guardian/mixins.py", line 213, in dispatch
web_1  |     return super().dispatch(request, *args, **kwargs)
web_1  |   File "/usr/local/lib/python3.8/site-packages/django/views/generic/base.py", line 142, in dispatch
web_1  |     return handler(request, *args, **kwargs)
web_1  |   File "/usr/src/app/battDB/views.py", line 162, in post
web_1  |     form.instance.full_clean()
web_1  |   File "/usr/local/lib/python3.8/site-packages/django/db/models/base.py", line 1471, in full_clean
web_1  |     self.clean()
web_1  |   File "/usr/src/app/battDB/models.py", line 578, in clean
web_1  |     parsed_file = parse_data_file(file_obj, file_format, columns=cols)
web_1  |   File "/usr/src/app/parsing_engines/parsing_engines_base.py", line 241, in parse_data_file
web_1  |     engine = get_parsing_engine(file_format).factory(file_obj)
web_1  |   File "/usr/src/app/parsing_engines/biologic_engine.py", line 47, in factory
web_1  |     skip_rows = get_header_size(file_obj, cls.encoding)
web_1  |   File "/usr/src/app/parsing_engines/biologic_engine.py", line 91, in get_header_size
web_1  |     file_obj.seek(0)
web_1  |   File "/usr/local/lib/python3.8/site-packages/django/core/files/utils.py", line 46, in <lambda>
web_1  |     seek = property(lambda self: self.file.seek)
web_1  |   File "/usr/local/lib/python3.8/site-packages/django/core/files/utils.py", line 46, in <lambda>
web_1  |     seek = property(lambda self: self.file.seek)
web_1  |   File "/usr/local/lib/python3.8/site-packages/storages/backends/azure_storage.py", line 46, in _get_file
web_1  |     download_stream = self._storage.client.download_blob(
web_1  |   File "/usr/local/lib/python3.8/site-packages/azure/core/tracing/decorator.py", line 78, in wrapper_use_tracer
web_1  |     return func(*args, **kwargs)
web_1  |   File "/usr/local/lib/python3.8/site-packages/azure/storage/blob/_container_client.py", line 1146, in download_blob
web_1  |     return blob_client.download_blob(offset=offset, length=length, **kwargs)
web_1  |   File "/usr/local/lib/python3.8/site-packages/azure/core/tracing/decorator.py", line 78, in wrapper_use_tracer
web_1  |     return func(*args, **kwargs)
web_1  |   File "/usr/local/lib/python3.8/site-packages/azure/storage/blob/_blob_client.py", line 871, in download_blob
web_1  |     return StorageStreamDownloader(**options)
web_1  |   File "/usr/local/lib/python3.8/site-packages/azure/storage/blob/_download.py", line 359, in __init__
web_1  |     self._response = self._initial_request()
web_1  |   File "/usr/local/lib/python3.8/site-packages/azure/storage/blob/_download.py", line 463, in _initial_request
web_1  |     self._current_content = process_content(
web_1  |   File "/usr/local/lib/python3.8/site-packages/azure/storage/blob/_download.py", line 46, in process_content
web_1  |     content = b"".join(list(data))
web_1  |   File "/usr/local/lib/python3.8/site-packages/azure/core/pipeline/transport/_requests_basic.py", line 185, in __next__
web_1  |     raise IncompleteReadError(err, error=err)
web_1  | azure.core.exceptions.IncompleteReadError: ('Connection broken: IncompleteRead(8388608 bytes read, 23233539 more expected)', IncompleteRead(8388608 bytes read, 23233539 more expected))
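
One possible workaround, not something confirmed as the fix in this thread, would be to fetch the blob in smaller ranges and retry any range that comes back short. The sketch below is deliberately library-agnostic. `read_range` is a hypothetical callable standing in for whatever performs a ranged read (for instance a thin wrapper around the `download_blob(offset=..., length=...)` call that appears in the traceback). `download_in_ranges`, the chunk size, the retry count and the `retry_on` exception tuple are all illustrative assumptions; in practice `retry_on` would probably need to include the Azure `IncompleteReadError`.

```python
from typing import Callable, Tuple, Type


def download_in_ranges(
    read_range: Callable[[int, int], bytes],
    total_size: int,
    chunk_size: int = 4 * 1024 * 1024,
    max_retries: int = 3,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
) -> bytes:
    """Fetch total_size bytes in chunk_size pieces, retrying short or failed reads.

    read_range(offset, length) is a hypothetical callable returning the
    requested bytes; it is not part of any real library.
    """
    parts = []
    offset = 0
    while offset < total_size:
        length = min(chunk_size, total_size - offset)
        for _ in range(max_retries):
            try:
                data = read_range(offset, length)
            except retry_on:
                data = b""  # treat a failed read like a short read and retry
            if len(data) == length:
                break  # this range arrived complete
        else:
            # The loop never hit `break`: every attempt came back short.
            raise IOError(
                f"range at offset {offset} still incomplete after {max_retries} attempts"
            )
        parts.append(data)
        offset += length
    return b"".join(parts)


# Usage with a fake source that truncates some reads, to mimic the issue.
payload = bytes(range(256)) * 40  # 10240 bytes of test data
calls = {"n": 0}


def flaky_read(offset: int, length: int) -> bytes:
    calls["n"] += 1
    data = payload[offset:offset + length]
    if calls["n"] % 3 == 1:
        return data[: length // 2]  # simulate a truncated stream
    return data


result = download_in_ranges(flaky_read, len(payload), chunk_size=4096)
print(result == payload)  # True
```

Keeping each request well below 8 MiB is a guess at what would avoid the cut-off; whether it does depends on where the truncation actually happens, which this issue does not establish.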

dandavies99 commented 1 year ago

@CWestICL - have you seen this specific IncompleteReadError at all while testing different files? If not, I'll close this issue for now as I think the problem has gone away.