Open-EO / openeo-python-client

Python client API for OpenEO
https://open-eo.github.io/openeo-python-client/
Apache License 2.0
156 stars 42 forks source link

initial PoC implementation of UDPJobFactory #644

Closed soxofaan closed 1 month ago

soxofaan commented 1 month ago

for issue #604

self-review PR

soxofaan commented 1 month ago

Ok I think this PR is ready for final review and/or merging.

the diff is quite large, but that is mainly because of testing and testing utility improvements.

The core of this PR is the new UDPJobFactory class at https://github.com/Open-EO/openeo-python-client/blob/2667733563e9eeac00879b0d316d858b4c2dcb5a/openeo/extra/job_management.py#L941-L1126

user-facing usage (as documented):

        from openeo.extra.job_management import (
            MultiBackendJobManager,
            create_job_db,
            UDPJobFactory,
        )

        # Job creator, based on a parameterized openEO process definition
        job_starter = UDPJobFactory(
            namespace="https://example.com/my_process.json",
        )

        # Initialize job database from dataframe, with parameters to use.
        df = pd.DataFrame(...)
        job_db = create_job_db("jobs.csv").initialize_from_df(df)

        # Create and run job manager
        job_manager = MultiBackendJobManager(...)
        job_manager.run_jobs(job_db=job_db, start_job=job_starter)
soxofaan commented 1 month ago

Also note that this is inspired by PR https://github.com/ESA-APEx/esa-apex-toolbox-python/pull/1 as suggested in https://github.com/Open-EO/openeo-python-client/issues/604

That PR https://github.com/ESA-APEx/esa-apex-toolbox-python/pull/1 was however outdated due to recent job manager changes and not really mergeable as-is. I also explicitly choose an approach that avoids subclassing MultiBackendJobManager and found a way to implement it as decoupled component (a callable that can be passed as start_job to run_jobs), which should make it more reusable and more robust against future job manager changes

soxofaan commented 1 month ago

Just found out there was actually another PR about this:

interesting idea from that PR to add here:

HansVRP commented 1 month ago

Will review this sprint. Any deadline I need to be aware of?

soxofaan commented 1 month ago

Thanks a lot for the review notes! I worked some more on the documentation part to clarify some things.

Regardless, I'd propose to already merge this, because it is a large PR in terms of lines of code, so it's easy to get conflicts with other PRs. Functionality-wise it just adds something new, which I made sure to flag clearly as experimental, to give us some wiggle room to clean up later