Open-EO / openeo-processes

Interoperable processes for openEO's big Earth observation cloud processing.
https://processes.openeo.org
Apache License 2.0

apply_neighborhood: Data type (datacube/array) in the callback #387

Closed m-mohr closed 1 year ago

m-mohr commented 1 year ago

Process ID: apply_neighborhood

Describe the issue: apply_neighborhood is incompatible with the array processes that we typically use in callbacks. Currently you can only use datacube-based processes such as run_udf, apply and apply_dimension in it, but not, for example, mean, first or array_element.
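To make the mismatch concrete, here is a minimal sketch of a callback that a user might naturally write but that is invalid today: `mean` expects an array, while the callback's `data` parameter is declared as a raster-cube. Node names, dimension names and sizes are illustrative only.

```json
{
  "process_id": "apply_neighborhood",
  "arguments": {
    "data": {"from_parameter": "data"},
    "size": [
      {"dimension": "x", "value": 1, "unit": "px"},
      {"dimension": "y", "value": 1, "unit": "px"},
      {"dimension": "t", "value": 5}
    ],
    "overlap": [
      {"dimension": "t", "value": 2}
    ],
    "process": {
      "process_graph": {
        "mean1": {
          "process_id": "mean",
          "arguments": {
            "data": {"from_parameter": "data"}
          },
          "result": true
        }
      }
    }
  }
}
```

The neighborhood here is effectively one-dimensional (a window along `t` per pixel), yet the schema still hands the callback a datacube, so `mean` cannot be applied directly.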

Proposed solution: There are several potential solutions:

  1. Add an explicit datacube-to-1D-array process (e.g. datacube_to_array(data: datacube) -> labeled-array)
  2. Add labeled-array as a data type to the callback schema (see process description below)
  3. Use apply_dimension on the remaining dimension (a bit cumbersome)
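For comparison, solution 1 would look roughly like the following callback. Note that `datacube_to_array` does not exist yet; it is the hypothetical process proposed in option 1, and the node names are illustrative.

```json
{
  "process_graph": {
    "to_array": {
      "process_id": "datacube_to_array",
      "arguments": {
        "data": {"from_parameter": "data"}
      }
    },
    "mean1": {
      "process_id": "mean",
      "arguments": {
        "data": {"from_node": "to_array"}
      },
      "result": true
    }
  }
}
```

This keeps the callback schema unchanged but requires every such callback to start with an explicit conversion step.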

It doesn't really work if backends implicitly flatten a datacube into a 1D array when needed, because that doesn't fit into the strictly typed process descriptions. Users would not expect that. Instead, solution 2 would be required.

I have no strong preference yet.

Note: apply_neighborhood is unfortunately not in the draft state, so a breaking change requires waiting for v2 (which is planned anyway for vector data cubes).

Additional context: Origin: https://discuss.eodc.eu/t/moving-average-with-low-pass-filter/428/12

Proposal for 2 (only the process parameter):

{
  "name": "process",
  "description": "Process to be applied on all neighborhoods.",
  "schema": {
    "type": "object",
    "subtype": "process-graph",
    "parameters": [
      {
        "name": "data",
        "description": "The input data, which is a subset of the data cube as specified in `size` and `overlap`. If the given size and overlap result in a one-dimensional data cube it is converted to an array.",
        "schema": [
          {
            "title": "Multi-dimensional data",
            "type": "object",
            "subtype": "raster-cube"
          },
          {
            "title": "One-dimensional data",
            "type": "array",
            "subtype": "labeled-array"
          }
        ]
      },
      {
        "name": "context",
        "description": "Additional data passed by the user.",
        "schema": {
          "description": "Any data type."
        },
        "optional": true,
        "default": null
      }
    ],
    "returns": {
      "description": "An array or data cube with the newly computed values. The data type and dimensionality must correspond to the input data. For data cubes this means it must have the same dimensions and the dimension properties (name, type, labels, reference system and resolution) must remain unchanged. Otherwise, a `DataCubePropertiesImmutable` exception will be thrown.",
      "schema": [
        {
          "title": "Multi-dimensional data",
          "type": "object",
          "subtype": "raster-cube"
        },
        {
          "title": "One-dimensional data",
          "type": "array",
          "subtype": "labeled-array"
        }
      ]
    }
  }
}
m-mohr commented 1 year ago

I did not hear a clear tendency in the call, but for Huriel's use case we may start with UDFs for now...
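The UDF workaround mentioned above would use run_udf as the callback, since it is one of the datacube-based processes that is already compatible with apply_neighborhood's callback schema. A rough sketch (the UDF source itself is omitted, and the runtime name depends on what the backend offers):

```json
{
  "process_graph": {
    "udf": {
      "process_id": "run_udf",
      "arguments": {
        "data": {"from_parameter": "data"},
        "udf": "# UDF source code computing e.g. a moving average over the neighborhood",
        "runtime": "Python"
      },
      "result": true
    }
  }
}
```

This sidesteps the typing issue entirely because run_udf accepts a datacube, at the cost of moving the actual computation out of the process-graph layer.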