Open-EO / openeo-processes-dask

Python implementations of many OpenEO processes, dask-friendly by default.
Apache License 2.0
19 stars 14 forks source link

Can't add pi in apply #154

Closed clausmichele closed 1 year ago

clausmichele commented 1 year ago

Sample code to reproduce the error:

import numpy as np
from datetime import datetime 
from openeo.local import LocalConnection
from openeo.processes import pi
local_conn = LocalConnection("./")

url = "https://earth-search.aws.element84.com/v1/collections/sentinel-2-l2a"
spatial_extent =  {"east": 11.40, "north": 46.52, "south": 46.46, "west": 11.25}
temporal_extent = ["2022-06-01", "2022-06-30"]
bands = ["red", "nir"]
properties = {"eo:cloud_cover": dict(lt=80)}
s2_datacube = local_conn.load_stac(
    url=url,
    spatial_extent=spatial_extent,
    temporal_extent=temporal_extent,
    bands=bands,
    properties=properties,
)

def add_pi(x):
    return x.add(pi)
data = s2_datacube.apply(add_pi)
data.execute()

Error: TypeError: unsupported operand type(s) for +: 'Array' and 'functools.partial'

GeraldIr commented 1 year ago

pi is a function not a standalone value, thus it needs to be called, not referenced.

As opposed to apply which expects a function, add just expects a value.

def add_pi(x):
    return x.add(pi)

try adding parenthesis to call pi like:

def add_pi(x):
    return x.add(pi())

That should fix the problem!

clausmichele commented 1 year ago

Hi @GeraldIr, the process graph generated in both cases is the same and therefore the issue is somewhere else!

GeraldIr commented 1 year ago

So what exactly is the concrete issue, the initial error is a python runtime error which is indeed raised by your provided code.

If I call data.to_json() instead of data.execute() I can produce a process graph that is indeed the same for both cases. Are you getting that error somewhere else when using the process graph?

Edit: I see now where you are getting the error, my apologies! I'll get back to you when I have resolved the problem!

GeraldIr commented 1 year ago

Okay, i think this is an issue with the client translating the code to a process graph, here is the process_graph that gets generated by calling .to_json()

{
  "process_graph": {
    "loadstac1": {
      "process_id": "load_stac",
      "arguments": {
        "bands": [
          "red",
          "nir"
        ],
        "properties": {
          "eo:cloud_cover": {
            "lt": 80
          }
        },
        "spatial_extent": {
          "east": 11.4,
          "north": 46.52,
          "south": 46.46,
          "west": 11.25
        },
        "temporal_extent": [
          "2022-06-01",
          "2022-06-30"
        ],
        "url": "https://earth-search.aws.element84.com/v1/collections/sentinel-2-l2a"
      }
    },
    "apply1": {
      "process_id": "apply",
      "arguments": {
        "data": {
          "from_node": "loadstac1"
        },
        "process": {
          "process_graph": {
            "add1": {
              "process_id": "add",
              "arguments": {
                "x": {
                  "from_parameter": "x"
                },
                "y": {
                  "process_graph": {
                    "pi1": {
                      "process_id": "pi",
                      "arguments": {},
                      "result": true
                    }
                  }
                }
              },
              "result": true
            }
          }
        }
      },
      "result": true
    }
  }
}

as you can see, the y-parameter of the add inside of the apply is a standalone process graph, add shouldn't have a process graph as an input, this can only be a number or null (either in the form of a ResultReference (from_node), a ParameterReference (from_parameter) or as an actual number value).

If we recreate the intended behavior in the openeo-webeditor we get the following process graph:

{
  "process_graph": {
    "1": {
      "process_id": "apply",
      "arguments": {
        "data": {
          "from_node": "loadstac1"
        },
        "process": {
          "process_graph": {
            "add1": {
              "process_id": "add",
              "arguments": {
                "x": {
                  "from_parameter": "x"
                },
                "y": {
                  "from_node": "pi2"
                }
              },
              "result": true
            },
            "pi2": {
              "process_id": "pi",
              "arguments": {}
            }
          }
        }
      },
      "result": true
    },
    "loadstac1": {
      "process_id": "load_stac",
      "arguments": {
        "bands": [
          "red",
          "nir"
        ],
        "properties": {
          "eo:cloud_cover": {
            "lt": 80
          }
        },
        "spatial_extent": {
          "east": 11.4,
          "north": 46.52,
          "south": 46.46,
          "west": 11.25
        },
        "temporal_extent": [
          "2022-06-01",
          "2022-06-30"
        ],
        "url": "https://earth-search.aws.element84.com/v1/collections/sentinel-2-l2a"
      }
    }
  },
  "parameters": []
}

As you can see, the pi2 node is on the same level of the hierarchy as the add in this case and instead of the process_graph for pi being the y-parameter itself there is a from_node reference to it.

clausmichele commented 1 year ago

Thanks @GeraldIr! @soxofaan is the python client generating an invalid graph in this case? Or am I trying to add pi() in the wrong way?

soxofaan commented 1 year ago

cube.apply(lambda x: x.add(pi)) without the parenthesis, indeed handles pi as a callable process, which the client translates to a sub-process graph as mentioned above. It's like when you would do cube.apply_dimension(lambda x: x.array_apply(sum)): the sum there is will be handled as a sub-process graph.

cube.apply(lambda x: x.add(pi())) generates the correct process graph indeed:

...
    "apply1": {
      "process_id": "apply",
      "arguments": {
        "data": {"from_node": "loadcollection1"},
        "process": {
          "process_graph": {
            "pi1": {"process_id": "pi", "arguments": {}},
            "add1": {
              "process_id": "add",
              "arguments": {
                "x": {"from_parameter": "x"},
                "y": {"from_node": "pi1"}
              },
              "result": true
            }
          }
        }
...
clausmichele commented 1 year ago

Thanks @soxofaan, this indeed solves the issue!