Open clausmichele opened 5 months ago
+1 we will probably start implementation work on this still in 2024 (I hope) For cwl_params, I'm wondering if we can find a solution that makes it look more like how other openEO processes specify parameters? One idea could be that we simply interpret all extra process arguments as cwl parameters.
The other difficult thing is how data goes in and out. STAC is for sure the solution, but it needs constraints to be usable. Also thinking if it is possible to avoid constructions where process graphs have to be very explicit about converting datacube to stac, running AP, and then reading back from STAC, or if we can have (a variant?) of run_ogc_application_package that simply works for rastercube input/output.
That sounds pretty reasonable. The return value should probably be a data cube (or the new stac subtype, see #485).
Here's a reference to an old PR, which had similar aims and has some discussion already: https://github.com/Open-EO/openeo-processes/pull/332
One idea could be that we simply interpret all extra process arguments as cwl parameters.
That's not a thing in openEO, primarily because not all programming languages have a construct such as kwargs in Python.
I was just wondering whether CWL could just be another UDF runtime and whether we could use run_udf? @clausmichele
Maybe @jzvolensky can help, he's our OGC AP expert. I guess in this case we can't pass a single code block which contains everything, definition and input parameters to run an AP?
@clausmichele I am not sure how that would work with the ADES. Since the CWL processes are stored in the ADES I suppose they could be read in a UDF and then you provide the input parameters in the UDF and then send the processing request to ADES? Maybe this is something we can look at/think about.
What is ADES in our context here?
I assumed that you'd specify a CWL file and that no prior interaction took place to store the CWL.
Sorry, ADES is the Application Deployment and Execution Service from the EOEPCA project. Basically a CWL execution engine which also supports managing CWLs (deploy, undeploy etc.). Our idea is to plug this into openEO so that, with a process or possibly a UDF, we can then execute Application Packages. In this way we can have a set of predefined processes available to the user, or possibly allow the user to provide their own.
The specification should be independent of the implementation. So ADES might be a data point, but we should probably focus on the underlying specification (i.e. OGC API - Processes - Part 2/3). Plugging that in makes sense, but in the end a CWL could also be just a specific "language" to express UDFs in, similar to Python or R.
I was just wondering whether CWL could just be another UDF runtime and whether we could use run_udf? @clausmichele
Hello, so I looked at the run_udf process spec, I guess this could work. Just to understand it correctly, you would for example call run_udf and pass the CWL (file, url, whatever), as well as inputs (yaml or json) for the CWL, with a runtime set to cwl 1.2, and then the runtime would do whatever it needs to do in the backend to execute and return the result?
Yeah. If we are reusing run_udf instead of a new process, it could look as follows in a process graph:
{
  process_id: "run_udf",
  arguments: {
    udf: "... CWL as YAML or URL or string ...",
    runtime: "cwl",
    version: "1.2", // could be omitted as it's the default version, see below
    context: {
      cwl_param1: true,
      cwl_param2: 99
    }
  }
}
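For reference, the same node can also be produced as strict JSON (quoted keys, no comments). The following is a minimal Python sketch, not an agreed design: the helper name `cwl_udf_node`, the node key `runcwl` and the example URL are made up for illustration, and `runtime: "cwl"` is only the name floated in this thread. As in the snippet above, the `data` argument is left out.

```python
import json


def cwl_udf_node(cwl, params, version=None):
    """Build a run_udf node that carries a CWL document (hypothetical helper)."""
    arguments = {
        "udf": cwl,               # CWL as YAML text or as a URL
        "runtime": "cwl",         # assumed runtime name, not standardised
        "context": dict(params),  # CWL inputs travel in `context`
    }
    if version is not None:
        # Leaving the version out lets the back-end use its default.
        arguments["version"] = version
    return {"process_id": "run_udf", "arguments": arguments, "result": True}


if __name__ == "__main__":
    node = cwl_udf_node(
        "https://example.com/workflow.cwl",  # placeholder URL
        {"cwl_param1": True, "cwl_param2": 99},
        version="1.2",
    )
    # "runcwl" is an arbitrary node identifier chosen for this sketch.
    print(json.dumps({"process_graph": {"runcwl": node}}, indent=2))
```

Keeping the CWL inputs inside `context` avoids relying on free-form extra arguments, which is the concern raised earlier about languages without a kwargs-like construct.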
While GET /udf_runtimes lists:
{
  title: "EO Application Packages (CWL)",
  type: "language",
  default: "1.2",
  versions: {
    "1.2": {
      libraries: { ... } // not sure about this entry. I guess it could be pre-loaded docker images or so?
    }
  }
}
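A client could use such an entry to check up front whether a back-end accepts CWL. The sketch below assumes the `GET /udf_runtimes` response is an object keyed by runtime name and that the entry above would be published under the key `cwl`; both the key and the helper `supports_runtime` are assumptions for illustration, and no HTTP request is made here.

```python
def supports_runtime(runtimes, name, version=None):
    """Return True if `name` (and optionally `version`) is advertised."""
    # Comparing names case-insensitively is a choice made for this sketch.
    by_name = {key.lower(): value for key, value in runtimes.items()}
    runtime = by_name.get(name.lower())
    if runtime is None:
        return False
    if version is None:
        return True
    return version in runtime.get("versions", {})


# Example response body; the "cwl" key is an assumption.
EXAMPLE_RUNTIMES = {
    "cwl": {
        "title": "EO Application Packages (CWL)",
        "type": "language",
        "default": "1.2",
        "versions": {"1.2": {"libraries": {}}},
    }
}

if __name__ == "__main__":
    print(supports_runtime(EXAMPLE_RUNTIMES, "CWL", "1.2"))
```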
It's just an idea that doesn't need an explicit process. If people think it would make sense to have a separate process, we can also discuss that. But right now I don't see an explicit reason why that might be better. Please let me know if you have any reasons in mind.
Somewhat related issue: #515
Also, run_udf is usually meant to be executed in datacube processes such as reduce_dimension. This would not be the case for EO Application packages I guess, which is somewhat against the best practice of UDFs. It's somewhat unclear how a mapping from the EO Application Packages and the openEO data types can be achieved and communicated to users.
Related process: run_udf_externally
Okay, the first part looks really neat with defining the workflow and inputs.
In the second part, GET /udf_runtimes, do you mean just to list the available Docker images? Unless we extract them from the CWLs, this is not information that we or the user need to define. It is already defined in the CWL, and I don't think storing it provides any added value.
The last paragraph is interesting. I mean, the Application Packages are fully standalone applications, right? From this point of view a new process makes sense, because the application and its execution are outside the traditional process graph scope. All we do is bind it to the rest of the openEO process chain using a process graph (although in theory no other process is needed to use it, so it really can be a standalone process).
I do like the UDFs idea and if the UDF can support this with some minor best practices update or a general UDF use case extension then that is good, I suppose.
Notes from the meeting today:
Input/Output in CWL:
Ways of interacting with CWL in openEO:
1. Pre-deployment
   - GET / to link to ADES
   - Processes could work as follows:
     {
       process_id: "run_ogcapi",
       arguments: {
         data: ...,
         id: "my-ap",
         inputs: { cwl_param1: true, cwl_param2: 99 }
       }
     }
     or
     {
       process_id: "run_ogcapi_externally",
       arguments: {
         data: ...,
         url: "https://processes.otherprovider.com",
         id: "my-ap", // -> https://processes.otherprovider.com/processes/my-ap
         inputs: { cwl_param1: true, cwl_param2: 99 }
       }
     }
2. User-provided CWL in a process graph (or via URL/file path)
   - GET /udf_runtimes as a language (we should recommend a name, e.g. CWL or EOAP?, tbc with OGC)
   - process for execution: run_udf
     {
       process_id: "run_udf",
       arguments: {
         data: ...,
         udf: "... CWL as YAML/... or URL or string ...",
         runtime: "cwl",
         version: "1.2",
         context: { cwl_param1: true, cwl_param2: 99 }
       }
     }
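For the pre-deployment variant, a back-end would presumably turn the `url`, `id` and `inputs` arguments into an OGC API - Processes execution request. A minimal sketch, assuming the Part 1 endpoint `POST /processes/{processID}/execution` with an `inputs` object in the body; the helper name `build_execute_request` is hypothetical, the request is only assembled and not sent, and how `data` would be handed over is left out because it is still an open question.

```python
import json
from urllib.parse import quote


def build_execute_request(base_url, process_id, inputs):
    """Return (url, body) for executing a deployed process (nothing is sent)."""
    # Percent-encode the identifier so it is safe as a single path segment.
    segment = quote(process_id, safe="")
    url = f"{base_url.rstrip('/')}/processes/{segment}/execution"
    body = json.dumps({"inputs": inputs})
    return url, body


if __name__ == "__main__":
    url, body = build_execute_request(
        "https://processes.otherprovider.com",
        "my-ap",
        {"cwl_param1": True, "cwl_param2": 99},
    )
    print(url)
    print(body)
```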
The open questions to OGC have been posted here: https://github.com/opengeospatial/ogcapi-processes/issues/428
See PR #520 for a proposal, please discuss further issues in the PR.
run_ogc_application_package
Context
For the InterTwin project (and soon others), we would like to run an OGC Application Package inside an openEO process graph. The documentation for the OGC Application Package is here: https://docs.ogc.org/bp/20-089r1.html
We see it as a process similar to run_udf.
Summary
Description
Parameters
data
Optional: yes
Description
The data to be passed to the OGC Application Package execution engine. Optional since the input data could be already defined in the CWL file and therefore it wouldn't need any other inputs.
Data Type
Datacube
cwl
Optional: no
Description
Currently it's a YAML file. Either we pass it as plain text (a string), as for UDFs, or we pass a URL to it and the back-end loads it. The schema could be the same as for the udf parameter of run_udf, with some changes.
Data Type
string
cwl_params
Optional: no
Description
It's either a YAML or JSON file. Again, it could be passed in the same ways described for the previous one.
Data Type
string
Return Value
Description
The result should be made available as a STAC object, i.e. a JSON string. This way, the back-end can continue the process graph using load_stac.
Data Type
string
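To make the proposal more concrete, here is a rough Python sketch of a process graph that runs the proposed process and continues with load_stac. It is only an illustration: `run_ogc_application_package` does not exist yet, the node names `ap` and `load` and the helper `build_graph` are made up, and feeding the STAC result into the `url` argument of load_stac (which expects a URL) is an assumption that would still need to be specified.

```python
import json


def build_graph(cwl, cwl_params):
    """Chain the proposed process into load_stac (sketch only)."""
    return {
        "process_graph": {
            "ap": {
                # Proposed process, not part of openEO today.
                "process_id": "run_ogc_application_package",
                "arguments": {"cwl": cwl, "cwl_params": cwl_params},
            },
            "load": {
                "process_id": "load_stac",
                # Assumption: the STAC result of "ap" can be passed to `url`.
                "arguments": {"url": {"from_node": "ap"}},
                "result": True,
            },
        }
    }


if __name__ == "__main__":
    graph = build_graph(
        "https://example.com/workflow.cwl",    # placeholder CWL location
        "https://example.com/parameters.yml",  # placeholder parameter file
    )
    print(json.dumps(graph, indent=2))
```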
Links to additional resources (optional)
Examples
Currently in development: interTwin-eu/HyDroForM: Hydrological Drought Forecasting Model with HydroMT and Wflow (github.com)
OR something like this:
(very experimental, uses sapporo service: sapporo-wes/sapporo-service: A standard implementation conforming to the Global Alliance for Genomics and Health (GA4GH) Workflow Execution Service (WES) API specification. (github.com) )
I put in cc the people from Eurac working on this @jzvolensky @iacopoff @aljacob
And I am aware VITO is also interested: @jdries @soxofaan EODC @christophreimer