Open-EO / openeo-geopyspark-driver

OpenEO driver for GeoPySpark (Geotrellis)
Apache License 2.0

load_stac(unsigned_job_results_url) in a batch job fails #644

Closed: bossie closed this issue 8 months ago

bossie commented 8 months ago

A batch job that calls load_stac on an unsigned job results URL fails when the job is started:

OpenEoApiError: [500] Internal: Server error: HTTPError('401 Client Error: Unauthorized for url: https://openeo.vito.be/openeo/1.2/jobs/j-240118d8a24b405bb6d63409f0378ad7/results') (ref: r-240119f757cf4970a380ec43e009659f)

Looks like partial job results (#489) introduced this bug: the unsigned URL is also used to fetch the openeo:status of the job results and, unlike the load_stac process itself, this code path does not handle unsigned URLs.
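
One way a code path could avoid the unauthenticated HTTP request is to recognize that the URL points at one of the backend's own jobs and look up the status internally. The following is only a minimal sketch of that recognition step, not the actual fix: the function name extract_own_job_id, the own_hosts parameter and the path pattern (modelled on the URL in the error message above) are all assumptions made for illustration.

```python
import re
from typing import Optional, Sequence
from urllib.parse import urlparse

# Hypothetical pattern, modelled on the URL in the error message:
# /openeo/<version>/jobs/<job_id>/results with nothing after "results".
_UNSIGNED_RESULTS_PATH = re.compile(
    r"^/openeo/(?P<version>[^/]+)/jobs/(?P<job_id>[^/]+)/results/?$"
)


def extract_own_job_id(
    url: str, own_hosts: Sequence[str] = ("openeo.vito.be",)
) -> Optional[str]:
    """Return the job ID if `url` is an unsigned job results URL on one of
    `own_hosts`, otherwise None.

    Both the host list and the path pattern are illustrative assumptions.
    """
    parts = urlparse(url)
    if parts.hostname not in own_hosts:
        return None
    match = _UNSIGNED_RESULTS_PATH.match(parts.path)
    return match.group("job_id") if match else None
```

With a job ID in hand, the status could be read from the backend's own job registry on behalf of the authenticated user, and URLs that do not match would still go through the regular HTTP request.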

From the driver logs:

HTTPError('401 Client Error: Unauthorized for url: https://openeo.vito.be/openeo/1.2/jobs/j-240118d8a24b405bb6d63409f0378ad7/results')

Traceback (most recent call last):
  File "/opt/venv/lib64/python3.8/site-packages/flask/app.py", line 1516, in full_dispatch_request
    rv = self.dispatch_request()
  File "/opt/venv/lib64/python3.8/site-packages/flask/app.py", line 1502, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
  File "/opt/venv/lib64/python3.8/site-packages/openeo_driver/users/auth.py", line 88, in decorated
    return f(*args, **kwargs)
  File "/opt/venv/lib64/python3.8/site-packages/openeo_driver/views.py", line 902, in queue_job
    backend_implementation.batch_jobs.start_job(job_id=job_id, user=user)
  File "/opt/venv/lib64/python3.8/site-packages/openeogeotrellis/backend.py", line 1966, in start_job
    self._start_job(job_id, user, _get_vault_token)
  File "/opt/venv/lib64/python3.8/site-packages/openeogeotrellis/backend.py", line 2018, in _start_job
    job_dependencies = self._schedule_and_get_dependencies(
  File "/opt/venv/lib64/python3.8/site-packages/openeogeotrellis/backend.py", line 2858, in _schedule_and_get_dependencies
    resp.raise_for_status()
  File "/opt/venv/lib64/python3.8/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://openeo.vito.be/openeo/1.2/jobs/j-240118d8a24b405bb6d63409f0378ad7/results

load_stac(https://openeo.vito.be/openeo/1.2/jobs/j-240118d8a24b405bb6d63409f0378ad7/results): extract "openeo:status": fail 2024-01-19 10:10:00.333632, elapsed 0:00:00.051592

The integration tests did not catch this case because they do the load_stac in a synchronous request, whereas querying openeo:status only happens for batch jobs.
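
For illustration, the status lookup that only batch jobs perform amounts to reading a single top-level field from the job results document. The sketch below assumes that document is already available as a parsed dict. The helper name extract_openeo_status and the sample documents are hypothetical, and the status value "running" is an assumed example rather than something taken from this issue.

```python
from typing import Optional


def extract_openeo_status(job_results_document: dict) -> Optional[str]:
    """Return the top-level "openeo:status" value of a job results
    document, or None if the field is absent or not a string."""
    status = job_results_document.get("openeo:status")
    return status if isinstance(status, str) else None


# Hypothetical documents, standing in for real job results metadata.
partial_results = {"type": "Collection", "openeo:status": "running"}
plain_stac = {"type": "Collection"}

print(extract_openeo_status(partial_results))
print(extract_openeo_status(plain_stac))
```

A test covering this path would have to submit the load_stac as a batch job, since a synchronous request never reaches the status lookup.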

bossie commented 8 months ago

The fix is available on https://openeo-dev.vito.be.

bossie commented 8 months ago

Verified that loading partial job results (#489) still works.