Open bgoesswe opened 5 years ago
This is an old issue, but it's becoming more important as time passes, since back-ends have more functionality now. It also seems to me that non-custom processes are missing in the client, e.g. linear_scale_range was not there (#96). Does the current client need a one-to-one mapping of the processes defined in openEO?
I think there are 2 separate aspects to this:
Concerning 1.: I'm not a big fan of dynamic generation of functions/methods, as this breaks some features that are important for the end user: normal discovery and documentation of methods (by looking at the source code of ImageCollectionClient, or using the code inspection features of their IDE), straightforward exceptions and backtraces when something goes wrong, a lower barrier to entry to contribute/fix things, ...
An alternative solution for 1. is to still use traditional hardcoded methods, combined with unit tests that compare the openEO API description with the available methods of ImageCollectionClient and fail when something is missing. We use this approach in the Python driver to support dedicated Python exceptions for each openEO error code (as defined in https://open-eo.github.io/openeo-api/errors/).
That being said, there are probably some ways to reduce the necessary boilerplate code and overhead to implement a process as a method in the client.
About 2.: this should be relatively straightforward to implement. However, it should be optional for now, because probably not all backends properly declare which processes they support in the capabilities endpoint (the VITO backend doesn't, for example).
There exists a way to add custom/unsupported processes: https://open-eo.github.io/openeo-python-client/#openeo.rest.imagecollectionclient.ImageCollectionClient.graph_add_process
Perhaps we need to improve documentation so that people find it more easily?
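Under the hood, a helper like that just appends a node to the JSON process graph. A rough, self-contained sketch of the mechanism, assuming a simplified dict-based graph format (this is illustrative, not the actual client code):

```python
import copy

def graph_add_process(graph, process_id, arguments):
    """Append a node for `process_id` to a dict-based process graph,
    wiring its 'data' argument to the previously added node.
    Simplified sketch of the mechanism, not the real client helper."""
    graph = copy.deepcopy(graph)
    node_id = f"{process_id}{len(graph) + 1}"
    args = dict(arguments)
    if graph:
        # reference the most recently added node as input
        last = list(graph)[-1]
        args.setdefault("data", {"from_node": last})
    graph[node_id] = {"process_id": process_id, "arguments": args}
    return graph

graph = graph_add_process({}, "load_collection", {"id": "SENTINEL2"})
graph = graph_add_process(graph, "my_custom_process", {"factor": 42})
```

Because the custom process is just another node, it composes freely with nodes produced by the client's predefined methods.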
About dynamically generated processes, I agree with Stefaan. I would however not object to someone showing how this can be done in the python client (as a separate way of building process graphs, separate from the ImageCollection class).
I'll investigate possible dynamic generation strategies in the "process_generation" branch.
So I have worked on that issue for a while now, and haven't found a suitable working solution for dynamically generating processes in the Python client other than doing it myself. Therefore, in the branch "process_generation", I created a Python tool that generates a Python file for a given backend URL (e.g. see here for EURAC). The "ProcessParser" can be used either in a Python client script or as a command-line call with arguments. How to use the generated processes can be seen in this example. The advantage is that you can use the statically defined processes of the Python client together with the generated processes available from the backend within the same program, so you do not have to choose between the two strategies. A disadvantage is that you are relying on the documentation provided by the backend. I was also thinking about doing it in a more object-oriented way by generating a new class that inherits from "ImageCollectionClient" with additional generated methods, but there I ran (at the moment) into the issue of methods with the same name, and I am not sure if this is a good way to go anyway.
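To illustrate the code-generation approach (not the actual ProcessParser, whose details live in the "process_generation" branch), one can render plain Python wrapper functions from a GET /processes style listing; the process entries and the `add_process` call in the generated code are hypothetical:

```python
# Sketch of generating Python source from a /processes-style listing.
# Illustrative only; process entries and add_process are assumptions.

PROCESSES = [
    {"id": "ndvi", "parameters": [{"name": "data"}, {"name": "red"}, {"name": "nir"}]},
    {"id": "linear_scale_range",
     "parameters": [{"name": "x"}, {"name": "inputMin"}, {"name": "inputMax"}]},
]

def render_module(processes):
    """Render a Python module with one wrapper function per process."""
    lines = []
    for proc in processes:
        params = ", ".join(p["name"] for p in proc["parameters"])
        lines.append(f"def {proc['id']}(graph, {params}):")
        lines.append(f"    return graph.add_process({proc['id']!r}, locals())")
        lines.append("")
    return "\n".join(lines)

source = render_module(PROCESSES)
# The generated text can be written to a .py file and imported alongside
# the statically defined client methods.
```

Writing real source files keeps the advantages of static code (inspectable, documentable, debuggable) while still tracking whatever the backend declares.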
Interesting work, can you create a pull request? That might help with further fine-tuning and discussion.
Some points to consider from today's discussion:
Issues with the static approach:
Issues with the dynamic approach:
In my opinion, the basic functionality (e.g. load_collection, filters) should be static for convenience reasons. Other things should be dynamically generated (e.g. custom processes). The main issue is deciding where to draw the line between what should be static and what not.
Inspired by yesterday's discussion I also played a bit with the following idea: add a property `.dynamic` to ImageCollection objects that delegates all function calls to the corresponding dynamically detected process. The full pull request (WIP) is at #118.

The basic unit test shows how it is intended to work. Assume the backend declares a process `make_larger` that takes a raster cube and a float as parameters:

```json
{
    "id": "make_larger",
    "description": "multiply a raster cube with a factor",
    "parameters": [
        {"name": "data", "schema": {"type": "object", "subtype": "raster-cube"}},
        {"name": "factor", "schema": {"type": "float"}}
    ]
}
```

You can then call it through the `dynamic` property as follows:

```python
cube = session040.load_collection("SENTINEL2")
cube = cube.dynamic.make_larger(factor=42)
```

Some notes:

- the name `.dynamic` is the best I could come up with for now; if someone has a better idea, please let me know
- by using a property `.dynamic` you can clearly separate "static" predefined methods and dynamically detected processes. Obviously, it allows having a predefined convenience function hardcoded in the client and a custom process in a backend with the same name
- this works for processes that take a raster cube as input (to which `self` of the ImageCollection instance will be bound). However, to support all kinds of processes we could also define a comparable `.dynamic` property on the Connection object

@soxofaan

> by using a property `.dynamic` you can clearly separate "static" predefined methods and dynamically detected processes. Obviously, it allows having a predefined convenience function hardcoded in the client and a custom process in a backend with the same name

In general I like this, but I don't think a user cares about whether something is dynamic or hard-coded. At best that should be completely hidden, and the call should simply be `cube.make_larger(factor=42)`.

> the name `.dynamic` is the best I could come up with for now; if someone has a better idea, please let me know

If we need such a "prefix": `custom`? `proprietary`?
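The delegation behind such a `.dynamic` property can be sketched in a few lines of plain Python. The classes below are hypothetical stand-ins (the real ImageCollection builds a process graph rather than recording calls):

```python
class _DynamicDelegator:
    """Resolves attribute access to backend-declared processes."""
    def __init__(self, cube):
        self._cube = cube

    def __getattr__(self, process_id):
        if process_id not in self._cube.backend_processes:
            raise AttributeError(f"backend has no process {process_id!r}")
        def call(**kwargs):
            # bind the cube itself as the raster-cube argument
            return self._cube.apply_process(process_id, data=self._cube, **kwargs)
        return call

class ImageCollection:
    # hypothetical stand-in: records applied processes instead of a graph
    def __init__(self, backend_processes, calls=()):
        self.backend_processes = backend_processes
        self.calls = list(calls)

    def apply_process(self, process_id, **kwargs):
        return ImageCollection(self.backend_processes,
                               self.calls + [(process_id, kwargs)])

    @property
    def dynamic(self):
        return _DynamicDelegator(self)

cube = ImageCollection(backend_processes={"make_larger"})
cube = cube.dynamic.make_larger(factor=42)
```

Because `__getattr__` is only consulted for names the delegator does not define, static methods and dynamically resolved processes can coexist without clashing.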
From the API perspective there are in general two kinds of processes:
If required, you could split predefined into two categories
Since back-ends may support a different number of processes, and these can be retrieved via the GET /processes endpoint, it would be a major improvement to generate the process functions dynamically when a back-end provider is chosen.
e.g.: https://stackoverflow.com/questions/23812760/dynamic-functions-creation-from-json-python
It is at least something I want to look into.