Open-EO / openeo-python-client

Python client API for OpenEO
https://open-eo.github.io/openeo-python-client/
Apache License 2.0

IncompleteRead exception crashes the JobManager #601

Open VictorVerhaert opened 3 months ago

VictorVerhaert commented 3 months ago

While running long jobs using the JobManager, it crashes while trying to download results.

Traceback (most recent call last):
  File "/home/victor.verhaert/LCFM/lcfm-production/.conda/lib/python3.11/site-packages/urllib3/response.py", line 748, in _error_catcher
    yield
  File "/home/victor.verhaert/LCFM/lcfm-production/.conda/lib/python3.11/site-packages/urllib3/response.py", line 894, in _raw_read
    raise IncompleteRead(self._fp_bytes_read, self.length_remaining)
urllib3.exceptions.IncompleteRead: IncompleteRead(1293825744 bytes read, 627790347 more expected)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/victor.verhaert/LCFM/lcfm-production/.conda/lib/python3.11/site-packages/requests/models.py", line 820, in generate
    yield from self.raw.stream(chunk_size, decode_content=True)
  File "/home/victor.verhaert/LCFM/lcfm-production/.conda/lib/python3.11/site-packages/urllib3/response.py", line 1060, in stream
    data = self.read(amt=amt, decode_content=decode_content)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/victor.verhaert/LCFM/lcfm-production/.conda/lib/python3.11/site-packages/urllib3/response.py", line 977, in read
    data = self._raw_read(amt)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/victor.verhaert/LCFM/lcfm-production/.conda/lib/python3.11/site-packages/urllib3/response.py", line 872, in _raw_read
    with self._error_catcher():
  File "/home/victor.verhaert/LCFM/lcfm-production/.conda/lib/python3.11/contextlib.py", line 158, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/home/victor.verhaert/LCFM/lcfm-production/.conda/lib/python3.11/site-packages/urllib3/response.py", line 772, in _error_catcher
    raise ProtocolError(arg, e) from e
urllib3.exceptions.ProtocolError: ('Connection broken: IncompleteRead(1293825744 bytes read, 627790347 more expected)', IncompleteRead(1293825744 bytes read, 627790347 more expected))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/victor.verhaert/LCFM/lcfm-production/notebooks/JM-LCFM.py", line 137, in <module>
    job_manager.run_jobs(
  File "/home/victor.verhaert/LCFM/lcfm-production/.conda/lib/python3.11/site-packages/openeo/extra/job_management.py", line 273, in run_jobs
    self._update_statuses(df)
  File "/home/victor.verhaert/LCFM/lcfm-production/.conda/lib/python3.11/site-packages/openeo/extra/job_management.py", line 433, in _update_statuses
    self.on_job_done(the_job, df.loc[i])
  File "/home/victor.verhaert/LCFM/lcfm-production/.conda/lib/python3.11/site-packages/openeo/extra/job_management.py", line 373, in on_job_done
    job.get_results().download_files(target=job_dir)
  File "/home/victor.verhaert/LCFM/lcfm-production/.conda/lib/python3.11/site-packages/openeo/rest/job.py", line 502, in download_files
    downloaded = [a.download(target) for a in self.get_assets()]
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/victor.verhaert/LCFM/lcfm-production/.conda/lib/python3.11/site-packages/openeo/rest/job.py", line 502, in <listcomp>
    downloaded = [a.download(target) for a in self.get_assets()]
                  ^^^^^^^^^^^^^^^^^^
  File "/home/victor.verhaert/LCFM/lcfm-production/.conda/lib/python3.11/site-packages/openeo/rest/job.py", line 378, in download
    for block in response.iter_content(chunk_size=chunk_size):
  File "/home/victor.verhaert/LCFM/lcfm-production/.conda/lib/python3.11/site-packages/requests/models.py", line 822, in generate
    raise ChunkedEncodingError(e)
requests.exceptions.ChunkedEncodingError: ('Connection broken: IncompleteRead(1293825744 bytes read, 627790347 more expected)', IncompleteRead(1293825744 bytes read, 627790347 more expected))

We need to make the job manager more robust to these types of exceptions.

soxofaan commented 3 months ago

Do you know whether that ChunkedEncodingError is just a temporary glitch, or can you reproduce the failure each time you (manually) try to download the result assets?
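One way to tell a transient glitch from a persistent failure is to retry the download a few times before giving up. Below is a minimal sketch, assuming a hypothetical helper `retry_call` (not part of the openeo client) and a stand-in `flaky_download` function in place of the real download. With the real client, the exception tuple would presumably contain `requests.exceptions.ChunkedEncodingError`, as seen in the traceback.

```python
import time


def retry_call(func, exceptions, attempts=3, delay=0.0):
    """Call func(); retry on the given exception types.

    Hypothetical helper for illustration, not part of the openeo client.
    The exception from the final attempt is re-raised, so a persistent
    failure is never silently swallowed.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise  # Persistent failure: surface it to the caller.
            time.sleep(delay)  # Assumed fixed pause between attempts.


# Stand-in for a flaky download: fails twice, then succeeds.
calls = {"n": 0}


def flaky_download():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("Connection broken: IncompleteRead")
    return "result.tif"


downloaded = retry_call(flaky_download, exceptions=(ConnectionError,), attempts=3)
```

If the call succeeds after one or two retries, the error was likely temporary. If it fails on every attempt, the exception still propagates and the problem is probably reproducible. Note that each retry in this sketch starts the download again from scratch rather than resuming it.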

soxofaan commented 3 months ago

> We need to make the job manager more robust to these types of exceptions

The question is what can be improved purely at the level of the Python client implementation.

Skipping the failure with a warning is tempting, but that might not be better as a default behavior: the end user could easily overlook the warning and get the wrong impression that everything went fine.

A simple alternative that could help here is to add an option to not automatically download the results of finished jobs.
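A minimal sketch of what such an option could look like, using stand-in classes rather than the real job manager. The `download_results` flag, `SketchJobManager`, `FakeJob`, `FakeResults`, and the job directory naming are all hypothetical. Only the `on_job_done(job, row)` hook and the `job.get_results().download_files(target=...)` call are taken from the traceback above.

```python
from pathlib import Path


class SketchJobManager:
    """Stand-in for the job manager, only to illustrate the option."""

    def __init__(self, root_dir, download_results=True):
        self.root_dir = Path(root_dir)
        # Hypothetical flag: when False, finished jobs are not downloaded.
        self.download_results = download_results

    def on_job_done(self, job, row):
        if not self.download_results:
            # Leave the results on the backend for a later, manual download.
            return None
        job_dir = self.root_dir / f"job_{job.job_id}"  # Assumed naming scheme.
        return job.get_results().download_files(target=job_dir)


class FakeResults:
    """Stand-in for job results that only counts download calls."""

    def __init__(self):
        self.download_calls = 0

    def download_files(self, target):
        self.download_calls += 1
        return [Path(target) / "result.tif"]


class FakeJob:
    """Stand-in for a batch job."""

    def __init__(self, job_id):
        self.job_id = job_id
        self.results = FakeResults()

    def get_results(self):
        return self.results


job = FakeJob("j-123")

# With the option switched off, nothing is downloaded.
SketchJobManager("out", download_results=False).on_job_done(job, row={})
skipped_calls = job.results.download_calls

# With the default, the download is triggered as before.
SketchJobManager("out", download_results=True).on_job_done(job, row={})
default_calls = job.results.download_calls
```

Keeping the default at `True` would preserve the current behavior, so the option only affects users who explicitly opt out and fetch results themselves afterwards.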