Make GFMAP Job Manager more robust and persistent

GriffinBabe commented 2 months ago

It can happen that the GFMAPJobManager crashes, not necessarily due to errors on the GFMAP side, but also because of bad user code in post-job actions.

[x] Implement the possibility of re-running jobs that previously failed. This could be a parameter of the job manager when running.
[x] Re-run failed post-job actions. This could be done by setting the job status to an intermediate value, "post-processing", and only setting it to finished at the end of the post-job action. However, this may conflict with the MultiBackendJobManager behavior.
[x] There is also the issue that when running an extraction on the same destination folder, the STAC catalogue is overwritten instead of extended (#94).
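
To make the first two items concrete, here is a minimal sketch of how a resume step could decide what to do with each row of the tracking file. It is not the GFMAP or MultiBackendJobManager implementation: the function name `plan_resume`, the `restart_failed` parameter and the status strings ("not_started", "error", "postprocessing", "finished") are assumptions chosen for illustration.

```python
def plan_resume(rows, restart_failed=False):
    """Decide, per tracked job, what a restarted manager should do.

    `rows` is an iterable of dicts with at least "id" and "status" keys,
    e.g. the rows of a job tracking CSV. All status names are hypothetical.
    """
    plan = {}
    for row in rows:
        status = row["status"]
        if status == "not_started":
            # Never submitted: submit as usual.
            plan[row["id"]] = "submit"
        elif status == "error" and restart_failed:
            # Failed earlier: resubmit only when the user opted in.
            plan[row["id"]] = "submit"
        elif status == "postprocessing":
            # The backend job is done, but the post-job action never
            # completed: redo only the post-job action.
            plan[row["id"]] = "postprocess"
        else:
            # Finished jobs, jobs still running, or failed jobs without
            # opt-in need no new action from this step.
            plan[row["id"]] = "skip"
    return plan
```

With this scheme a job is only marked finished once its post-job action has returned, so a crash in user code leaves the row in the intermediate state and the work is picked up on the next run.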

At the moment, persistence is done through the job_tracking.csv file and the base logic in the MultiBackendJobManager https://github.com/Open-EO/openeo-python-client/blob/master/openeo/extra/job_management.py#L32

GriffinBabe commented 2 months ago

Whenever a crash happens in the user code, the GFMAP manager loses its STAC collection progress, as the collection is only written once the manager has finished its jobs.

One temporary way of tackling that would be to simply add a try/except clause as such:

try:
    manager.run_jobs(job_df, create_datacube_optical, tracking_df_path)
except Exception as e:
    _pipeline_log.error("Error during the job execution: %s", e)
finally:
    manager.create_stac(constellation='sentinel2', item_assets={"auxiliary": AUXILIARY})

In theory, this should save only fully initialized STAC items (the possible crash points are the output_path_gen, post_job_action and create_job user functions, all of which are called before any item is added to the collection):

self._root_collection.add_items(job_items)

@VincentVerelst However, I was thinking that it might be better to call the create_stac function automatically within the manager, so that the STAC collection is handled automatically when a crash happens. The usage of the job manager could look like this:

manager = GFMAPJobManager(...)
manager.setup_stac(constellation='sentinel2', item_assets={'auxiliary': AUXILIARY})

manager.run_jobs(...)  # Will call _create_stac internally
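
As a rough illustration of that usage, the sketch below moves the try/finally into the manager itself. `SketchJobManager` is a toy stand-in, not the real GFMAPJobManager: its attributes, the simplified `run_jobs` signature and the in-memory "writing" of the collection are all assumptions made to keep the example self-contained.

```python
class SketchJobManager:
    """Toy stand-in for a job manager that persists STAC on crash."""

    def __init__(self):
        self._stac_kwargs = None   # set by setup_stac()
        self.items = []            # items collected so far
        self.written_items = None  # what _create_stac() last "wrote"

    def setup_stac(self, **kwargs):
        # Only store the metadata; nothing is written yet.
        self._stac_kwargs = kwargs

    def _create_stac(self):
        # A real manager would write the collection to disk here.
        self.written_items = list(self.items)

    def run_jobs(self, jobs, post_job_action):
        try:
            for job in jobs:
                # User code runs first and may raise ...
                item = post_job_action(job)
                # ... so only fully built items are ever collected.
                self.items.append(item)
        finally:
            # Runs on success and on crash; the exception still propagates.
            if self._stac_kwargs is not None:
                self._create_stac()
```

Because the write happens in `finally`, the caller still sees the original exception, but the items built before the crash are kept.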

Tell me what you think 😄

VincentVerelst commented 2 months ago

@GriffinBabe, sounds like a good idea! I don't see any benefit in the user having to call create_stac themselves. I also like the idea of having a setup_stac. Maybe we can also make this one optional? I.e. users only need to call it if they are interested in changing the STAC metadata; otherwise GFMap will generate a default STAC collection based on which constellation is selected.
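
One possible shape for the optional setup is a lookup of per-constellation defaults that user-supplied values override. This is only a sketch: `DEFAULT_STAC`, `resolve_stac_config` and every value in the table are placeholders invented for illustration, not metadata that GFMap actually ships.

```python
# Placeholder defaults; the real values would come from GFMap.
DEFAULT_STAC = {
    "sentinel1": {"title": "Sentinel-1 extractions", "item_assets": {}},
    "sentinel2": {"title": "Sentinel-2 extractions", "item_assets": {}},
}


def resolve_stac_config(constellation, user_config=None):
    """Return the STAC settings to use for a run.

    Starts from the defaults of the selected constellation and applies
    any overrides the user passed through a setup_stac-like call.
    """
    if constellation not in DEFAULT_STAC:
        raise ValueError(f"No default STAC metadata for {constellation!r}")
    config = dict(DEFAULT_STAC[constellation])  # copy, keep defaults intact
    if user_config:
        config.update(user_config)
    return config
```

A user who never calls the setup function would get `resolve_stac_config("sentinel2")`, while a user who wants extra assets would pass something like `{"item_assets": {"auxiliary": AUXILIARY}}` as the override.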

Open-EO / openeo-gfmap

Make GFMAP Job Manager more robust and persistent #96