Open-EO / openeo-python-client

Python client API for OpenEO
https://open-eo.github.io/openeo-python-client/
Apache License 2.0

Defining predecessor node of a process #107

Closed bgoesswe closed 4 years ago

bgoesswe commented 4 years ago

This came up when I tried to make the examples work again for GEE: currently, when a process is added to the ImageCollection/datacube, it automatically sets the last node as the input for the new process, which is fairly intuitive. The problem is that there are process graphs that cannot be created this way, e.g. the NDVI example process graph from the GEE web editor:

  1. load_collection
  2. filter red band using input node load_collection
  3. filter nir band using input node load_collection
  4. apply NDVI process using the node of filter_red and filter_nir

I know that in this particular example there is the ImageCollection.band method, but there might be use cases where there is no predefined function to solve this (and with the band method I could not get it to run on GEE).

Just an idea:

datacube = con.imagecollection("COPERNICUS/S2")
datacube = datacube.filter_bbox(...)
datacube_filter = datacube.filter_daterange(...)

datacube_red = datacube_filter.filter_bands("B4")
datacube_nir = datacube_filter.filter_bands("B8A")
datacube_ndvi = datacube_filter.normalized_difference(datacube_red, datacube_nir)

The last call merges the process graphs of datacube_red and datacube_nir so that there are no duplicate nodes in the process graph of datacube_ndvi. I am also not sure yet how to do that in a way that is transparent to users, so this is open for discussion.
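
To make that idea concrete, here is a minimal sketch of such a merge, assuming the process graph is kept as a plain dict of nodes keyed by node id (merge_graphs and the example node dicts are hypothetical, not part of the client API):

def merge_graphs(graph_a: dict, graph_b: dict) -> dict:
    """Merge two process-graph dicts, keeping nodes with the same id only once."""
    merged = dict(graph_a)
    for node_id, node in graph_b.items():
        if node_id in merged and merged[node_id] != node:
            raise ValueError("Conflicting definitions for node {!r}".format(node_id))
        merged[node_id] = node
    return merged

red_graph = {"load_collection1": {}, "filter_bands1": {"from_node": "load_collection1"}}
nir_graph = {"load_collection1": {}, "filter_bands2": {"from_node": "load_collection1"}}
merged = merge_graphs(red_graph, nir_graph)
# "load_collection1" appears only once in merged; a normalized_difference node
# could then be appended with both filter_bands nodes as its inputs.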

soxofaan commented 4 years ago

I'm not sure I understand the issue here, because what you describe is already the approach in the Python client.

when a process is added to the Imagecollection/datacube it automatically sets the last node as input for the new process

If you use the methods on ImageCollectionClient (e.g. filter_bbox, band, ndvi, ...), you create new Python objects each time (process graphs might be partially shared, but that is an implementation detail), and each ImageCollectionClient object has an individual property .node_id pointing to the "last" node of the graph. However, you can still "access" the nodes of previously created objects.

s2 = con.load_collection("S2")
a = s2.filter_bbox(...)
b = a.filter_temporal(...)
c = b.apply_kernel(...)

At this point you can still build different process "paths" from the a and b objects; it's not that c is the only thing that can be built upon.
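
For example (continuing the snippet above, with the same kind of placeholder arguments; these extra branches are purely illustrative):

# a second, independent branch starting from the intermediate object a
d = a.apply_kernel(...)
# and yet another branch starting from b
e = b.filter_bbox(...)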

bgoesswe commented 4 years ago

So yes, this is possible, but the issue is that it is not possible to use one node as the input of two nodes in the same process graph. For example:

s2 = con.load_collection("S2")
a = s2.filter_bands(...)
b = s2.filter_bands(...)
c = s2.normalized_difference(a, b)

This should result in a process graph similar to:

{ "load_collection1": {}, 
  "filter_bands1": {... "from_node": "load_collection1" },
  "filter_bands2": {... "from_node": "load_collection1" },
  "normalized_difference1": {... "band1": { "from_node": "filter_bands1"}, "band2": { "from_node": "filter_bands2"}}
   ...
}

At the moment, the EVI in the phenology example is created with a new process graph and a reducer (using the band() method), which currently fails on GEE.
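
For reference, the band()-based approach mentioned here looks roughly like this in the old-style client (a sketch only: con is an existing connection, and whether this runs on a particular backend such as GEE is exactly the problem described above):

s2 = con.imagecollection("COPERNICUS/S2")
red = s2.band("B4")
nir = s2.band("B8")
ndvi = (nir - red) / (nir + red)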

Complete NDVI example for GEE, which cannot currently be reconstructed with the Python client:

{
  "load_collection_IQCXY5622P": {
    "process_id": "load_collection",
    "arguments": {
      "id": "COPERNICUS/S2",
      "spatial_extent": {
        "west": 11.2792,
        "south": 46.4643,
        "east": 11.4072,
        "north": 46.5182
      },
      "temporal_extent": [
        "2018-06-04",
        "2018-06-23"
      ],
      "bands": [
        "B4",
        "B8"
      ]
    }
  },
  "filter_bands_TMGHB5086J": {
    "process_id": "filter_bands",
    "arguments": {
      "data": {
        "from_node": "load_collection_IQCXY5622P"
      },
      "bands": [
        "B4"
      ]
    }
  },
  "normalized_difference_SGZTT5944D": {
    "process_id": "normalized_difference",
    "arguments": {
      "band1": {
        "from_node": "filter_bands_SNUEL8205U"
      },
      "band2": {
        "from_node": "filter_bands_TMGHB5086J"
      }
    }
  },
  "reduce_VVIRY8830F": {
    "process_id": "reduce",
    "arguments": {
      "data": {
        "from_node": "normalized_difference_SGZTT5944D"
      },
      "reducer": {
        "callback": {
          "max_JPPTE5228K": {
            "process_id": "max",
            "arguments": {
              "data": {
                "from_argument": "data"
              }
            },
            "result": true
          }
        }
      },
      "dimension": "temporal"
    }
  },
  "apply_DUEGA7716B": {
    "process_id": "apply",
    "arguments": {
      "data": {
        "from_node": "reduce_VVIRY8830F"
      },
      "process": {
        "callback": {
          "linear_scale_range_NROJC6099D": {
            "process_id": "linear_scale_range",
            "arguments": {
              "x": {
                "from_argument": "x"
              },
              "inputMin": -1,
              "inputMax": 1,
              "outputMin": 0,
              "outputMax": 255
            },
            "result": true
          }
        }
      }
    }
  },
  "save_result_ELJKH7975C": {
    "process_id": "save_result",
    "arguments": {
      "data": {
        "from_node": "apply_DUEGA7716B"
      },
      "format": "png"
    },
    "result": true
  },
  "filter_bands_SNUEL8205U": {
    "process_id": "filter_bands",
    "arguments": {
      "data": {
        "from_node": "load_collection_IQCXY5622P"
      },
      "bands": [
        "B8"
      ]
    }
  }
}
soxofaan commented 4 years ago

Ok, I understand better now. I made a notebook gist showing the current behavior of the client: https://gist.github.com/soxofaan/91e31e72a4f1446fbf952e8617158867 The load_collection node is indeed needlessly duplicated when combining multiple bands.

soxofaan commented 4 years ago

This is somewhat related to #102, which is also a result of the current way the process graph is built and managed in the client. @jdries and I already discussed better approaches, and this merging issue should certainly be part of that exercise.

soxofaan commented 4 years ago

Closing this ticket (for now); the main ticket to address the underlying issue is #117.

soxofaan commented 4 years ago

#126 was merged; the issue is fixed in the new 1.0-API-style DataCube.

e.g. see unit test https://github.com/Open-EO/openeo-python-client/blob/dab3000f2d1d8a7b151b6247cd026774a7e7332a/tests/rest/datacube/test_bandmath.py#L133-L143
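
For completeness, a rough sketch of the same band math with the 1.0-style DataCube API (the backend URL and band names are illustrative); both band() calls now reference a single shared load_collection node in the generated process graph:

import openeo

con = openeo.connect("https://earthengine.openeo.org")
cube = con.load_collection("COPERNICUS/S2", bands=["B4", "B8"])
red = cube.band("B4")
nir = cube.band("B8")
ndvi = (nir - red) / (nir + red)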