Closed bossie closed 8 months ago
FYI: related to (and maybe even duplicate of) #498
async_task
is only aware of ZkJobRegistry
.
In the case of SHub batch processes that's just fine because those weren't/aren't available on CDSE SHub endpoint https://sh.dataspace.copernicus.eu and won't be considered:
load_stac
of partial job results only works on Terrascope anyway so this changes nothing.
FYI: I've also been tagging some implementation parts with todo notes referencing #498
$ grep '#498' openeogeotrellis/ -r -n
openeogeotrellis/job_registry.py:797: # TODO #236/#498 For now: compare job metadata between Zk and EJR
openeogeotrellis/job_registry.py:910: # TODO #236/#498: error if both sources failed?
openeogeotrellis/job_registry.py:923: # TODO #236/#498 Need to have EJR implementation for this? This is only necessary for ZK cleaner script anyway.
openeogeotrellis/job_tracker_v2.py:395: # TODO: #236/#498 also/instead get jobs_to_track from EJR?
openeogeotrellis/backend.py:334: # TODO #236/#498 avoid this fallback and just make sure it is always set when necessary
openeogeotrellis/backend.py:1640: zk_job_registry_factory=ZkJobRegistry, # TODO #236/#498 allow to disable this with config?
openeogeotrellis/backend.py:1984: # TODO #498 eliminate ZK code path, or at least encapsulate this logic better
Heh. Persisting usage in ES (#488) doesn't work. From the YARN job_tracker logs:
In context "job_tracker job_metadata.status='finished' from YarnStatusGetter": caught EjrHttpError('EJR API error: 400 \'Bad Request\' on
PATCH \'https://jobregistry.openeo.vito.be/jobs/j-240105907a9640a79375dbcdabe66505\'
: {"statusCode":400,"message":["property costs should not exist"],"error":"Bad Request"}')
Traceback (most recent call last):
File "/opt/venv/lib64/python3.8/site-packages/openeo_driver/util/logging.py", line 371, in just_log_exceptions
yield
File "/opt/venv/lib64/python3.8/site-packages/openeogeotrellis/job_tracker_v2.py", line 512, in _sync_job_status
self._elastic_job_registry.set_usage(job_id, job_costs, dict(usage))
File "/opt/venv/lib64/python3.8/site-packages/openeo_driver/jobregistry.py", line 465, in set_usage
return self._update(
File "/opt/venv/lib64/python3.8/site-packages/openeo_driver/jobregistry.py", line 442, in _update
return self._do_request("PATCH", f"/jobs/{job_id}", json=data)
File "/opt/venv/lib64/python3.8/site-packages/openeo_driver/jobregistry.py", line 288, in _do_request
raise EjrHttpError.from_response(response=response)
openeo_driver.jobregistry.EjrHttpError: EJR API error: 400 'Bad Request' on `PATCH 'https://jobregistry.openeo.vito.be/jobs/j-240105907a9640a79375dbcdabe66505'`: {"statusCode":400,"message":["property costs should not exist"],"error":"Bad Request"}
Was a bit hard to find because there's no job_id
in the log entry.
so the "costs" field must be supported in EJR (https://github.com/Open-EO/openeo-job-registry-elastic-api) this is something @JanssenBrm or hist team has to add
@soxofaan was the property result_metadata
(actually: results_metadata) mentioned in https://github.com/Open-EO/openeo-job-registry-elastic-api/issues/2#issuecomment-1351354363 ever used?
Because this seems to assume that all properties are top level:
no indeed, I don't think results_metadata
is already being used (writing nor reading)
Is it possible to create an issue on https://github.com/Open-EO/openeo-job-registry-elastic-api with clear instructions on what should be done?
Ugh Github, stop closing issues upon merging one of several PRs.
Still need to get the jobs to be tracked from EJR instead of ZK (context: https://github.com/Open-EO/openeo-geopyspark-driver/pull/638).
Potential quickfix https://github.com/Open-EO/openeo-geopyspark-driver/pull/639 was introduced in the meanwhile.
Disabled ZkJobRegistry
on CDSE dev/staging; seems to work after a force_redeploy
.
seems to work
*sometimes.
Not production ready because of https://github.com/Open-EO/openeo-job-registry-elastic-api/issues/32.
ES mapping of dependencies
and results_metadata
should be changed (to flattened
?) to avoid a mapping explosion.
This should also fix integration tests test_advanced_cloud_masking_diy
and test_load_collection_references_correct_batch_process_id
.
Rolled back on CDSE dev/staging because of https://github.com/Open-EO/openeo-job-registry-elastic-api/issues/33 (#660).
Context: ZK on CDSE crashes due to OOM.