run_udf: Unclear why two data types are defined

Open-EO / openeo-processes

Interoperable processes for openEO's big Earth observation cloud processing.

https://processes.openeo.org

Apache License 2.0

48 stars 15 forks source link

run_udf: Unclear why two data types are defined #515

Open m-mohr opened 2 months ago

m-mohr commented 2 months ago

Process ID: run_udf

Describe the issue: Run UDF has the following definition for data:

        {
            "name": "data",
            "description": "The data to be passed to the UDF.",
            "schema": [
                {
                    "title": "Array",
                    "type": "array",
                    "minItems": 1,
                    "items": {
                        "description": "Any data type."
                    }
                },
                {
                    "title": "Single Value",
                    "description": "A single value of any data type."
                }
            ]
        }

Why does it explicitly define an array type? The way the second schema is defined, both apply for an array. So it's equivalent to:

        {
            "name": "data",
            "description": "The data to be passed to the UDF.",
            "schema": {
                "title": "Single Value",
                "description": "A single value of any data type."
            }
        }

Can someone remember why this is the way it is? @soxofaan maybe?

Proposed solution: Simplify the schema?

m-mohr commented 2 months ago

Git blame shows that this was changed 2021 (6ec1848b1acf37fdacb8467de86a3e63c1325657) from two schemas (raster-cube and array) to two schemas (array and any). I believe this was done to encourage usage in datacube operations such as reduce_dimension and apply_dimension. It's not made overly clear in the description of the parameter. Either we need to describe the two schemas better or we should remove it and explain better what users are expected to do.

soxofaan commented 2 months ago

changed 2021 (https://github.com/Open-EO/openeo-processes/commit/6ec1848b1acf37fdacb8467de86a3e63c1325657) from two schemas (raster-cube and array) to two schemas (array and any).

you mean from three schemas (raster-cube, array and any) to two schemas (array and any)

regardless, I have no idea why it's like this or if there are important reasons not to simplify as proposed. The only reason I can think of is to make things a bit more self-documenting, but I guess it's then just better to improve descriptions/dcos than keeping confusing schema constructs

m-mohr commented 1 month ago

Yes, I agree... I guess it was meant as a way to somehow recommend that UDFs are used in callbacks. PR: #519