bgoesswe closed this issue 4 years ago.
I'm not sure I understand the issue here, because what you describe is already the approach in the Python client: when a process is added to the ImageCollection/datacube, the last node is automatically set as input for the new process.
If you use the methods on ImageCollectionClient (e.g. filter_bbox, band, ndvi, ...), you create new Python objects each time (process graphs might be partially shared, but that is an implementation detail), and each ImageCollectionClient object has an individual property .node_id
pointing to the "last" node of the graph. However, you can still "access" the nodes of previously created objects:
s2 = con.load_collection("S2")
a = s2.filter_bbox(...)
b = a.filter_temporal(...)
c = b.apply_kernel(...)
At this point you can still build different process "paths" from the a and b objects; it's not that c is the only thing that can be built upon.
So yes, this is possible. But the issue is that one node cannot be used as the input of two nodes in the same process graph, e.g.:
s2 = con.load_collection("S2")
a = s2.filter_band(...)
b = s2.filter_band(...)
c = s2.normalized_difference(a, b)
This should result in a process graph similar to:
{
  "load_collection1": {...},
  "filter_bands1": {..., "from_node": "load_collection1"},
  "filter_bands2": {..., "from_node": "load_collection1"},
  "normalized_difference1": {..., "band1": {"from_node": "filter_bands1"}, "band2": {"from_node": "filter_bands2"}},
  ...
}
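To make the deduplication point concrete, here is a minimal Python sketch (my own illustration, not client code): since flat process graphs are just dicts keyed by node id, two branches that share the same load_collection node can be combined with a merge that keeps each node id only once.

```python
def merge_graphs(*graphs):
    """Merge flat process-graph dicts; a node id occurring in several
    graphs must map to the identical node, and is kept only once."""
    merged = {}
    for graph in graphs:
        for node_id, node in graph.items():
            if node_id in merged and merged[node_id] != node:
                raise ValueError("conflicting definitions for node %r" % node_id)
            merged[node_id] = node
    return merged

# Two band-filter branches sharing one load_collection node.
load = {"load_collection1": {"process_id": "load_collection",
                             "arguments": {"id": "COPERNICUS/S2"}}}
red = dict(load, filter_bands1={
    "process_id": "filter_bands",
    "arguments": {"data": {"from_node": "load_collection1"}, "bands": ["B4"]}})
nir = dict(load, filter_bands2={
    "process_id": "filter_bands",
    "arguments": {"data": {"from_node": "load_collection1"}, "bands": ["B8"]}})

ndvi = merge_graphs(red, nir)
ndvi["normalized_difference1"] = {
    "process_id": "normalized_difference",
    "arguments": {"band1": {"from_node": "filter_bands1"},
                  "band2": {"from_node": "filter_bands2"}},
    "result": True,
}
# The shared load_collection1 node appears only once in the merged graph.
```

This only works when both branches agree on the id of the shared node, which is exactly what the client would have to guarantee.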
At the moment, the EVI in the phenology example is created with a new process graph and a reducer (using the band() method), which currently fails on GEE.
Complete NDVI example for GEE, which cannot currently be reconstructed with the Python client:
{
"load_collection_IQCXY5622P": {
"process_id": "load_collection",
"arguments": {
"id": "COPERNICUS/S2",
"spatial_extent": {
"west": 11.2792,
"south": 46.4643,
"east": 11.4072,
"north": 46.5182
},
"temporal_extent": [
"2018-06-04",
"2018-06-23"
],
"bands": [
"B4",
"B8"
]
}
},
"filter_bands_TMGHB5086J": {
"process_id": "filter_bands",
"arguments": {
"data": {
"from_node": "load_collection_IQCXY5622P"
},
"bands": [
"B4"
]
}
},
"normalized_difference_SGZTT5944D": {
"process_id": "normalized_difference",
"arguments": {
"band1": {
"from_node": "filter_bands_SNUEL8205U"
},
"band2": {
"from_node": "filter_bands_TMGHB5086J"
}
}
},
"reduce_VVIRY8830F": {
"process_id": "reduce",
"arguments": {
"data": {
"from_node": "normalized_difference_SGZTT5944D"
},
"reducer": {
"callback": {
"max_JPPTE5228K": {
"process_id": "max",
"arguments": {
"data": {
"from_argument": "data"
}
},
"result": true
}
}
},
"dimension": "temporal"
}
},
"apply_DUEGA7716B": {
"process_id": "apply",
"arguments": {
"data": {
"from_node": "reduce_VVIRY8830F"
},
"process": {
"callback": {
"linear_scale_range_NROJC6099D": {
"process_id": "linear_scale_range",
"arguments": {
"x": {
"from_argument": "x"
},
"inputMin": -1,
"inputMax": 1,
"outputMin": 0,
"outputMax": 255
},
"result": true
}
}
}
}
},
"save_result_ELJKH7975C": {
"process_id": "save_result",
"arguments": {
"data": {
"from_node": "apply_DUEGA7716B"
},
"format": "png"
},
"result": true
},
"filter_bands_SNUEL8205U": {
"process_id": "filter_bands",
"arguments": {
"data": {
"from_node": "load_collection_IQCXY5622P"
},
"bands": [
"B8"
]
}
}
}
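As a side note, hand-written graphs like the one above are easy to get subtly wrong (a from_node pointing at a removed or renamed node, or two top-level result nodes). A small validator sketch for the flat graph format shown above (my own code, not part of the client; callbacks are treated as a nested scope and skipped):

```python
def iter_from_nodes(obj):
    """Yield every "from_node" reference in a node's arguments, skipping
    "callback" sub-graphs, which have their own node-id scope."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            if key == "from_node":
                yield value
            elif key == "callback":
                continue
            else:
                yield from iter_from_nodes(value)
    elif isinstance(obj, list):
        for item in obj:
            yield from iter_from_nodes(item)

def validate_graph(graph):
    """Check that every from_node reference resolves to a node in the graph
    and that exactly one top-level node is flagged as the result."""
    for node_id, node in graph.items():
        for ref in iter_from_nodes(node["arguments"]):
            if ref not in graph:
                raise ValueError("%s: unknown from_node %r" % (node_id, ref))
    results = [nid for nid, node in graph.items() if node.get("result")]
    if len(results) != 1:
        raise ValueError("expected exactly one result node, got %r" % results)
    return True

# A reduced version of the graph above, with a reducer callback.
graph = {
    "load_collection1": {"process_id": "load_collection",
                         "arguments": {"id": "COPERNICUS/S2"}},
    "reduce1": {"process_id": "reduce",
                "arguments": {"data": {"from_node": "load_collection1"},
                              "reducer": {"callback": {
                                  "max1": {"process_id": "max",
                                           "arguments": {"data": {"from_argument": "data"}},
                                           "result": True}}},
                              "dimension": "temporal"},
                "result": True},
}
validate_graph(graph)  # passes; a dangling from_node would raise ValueError
```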
OK, I understand better now. I made a notebook gist showing the current behavior of the client: https://gist.github.com/soxofaan/91e31e72a4f1446fbf952e8617158867 The load_collection node is indeed needlessly duplicated when combining multiple bands.
This relates somewhat to #102, which is also a result of the current way the process graph is built and managed in the client. @jdries and I have already discussed better approaches, and this merging issue should certainly be part of that exercise.
Closing this ticket (for now); the main ticket to address the underlying issue is #117.
e.g. see unit test https://github.com/Open-EO/openeo-python-client/blob/dab3000f2d1d8a7b151b6247cd026774a7e7332a/tests/rest/datacube/test_bandmath.py#L133-L143
This came up when I tried to make the examples work again for GEE. Currently, when a process is added to the ImageCollection/datacube, it automatically sets the last node as input for the new process, which is kind of intuitive. The problem is that there are process graphs that cannot be created this way, e.g. the NDVI example process graph from the GEE web editor:
I know that in this particular example there is the ImageCollection.band method, but there might be use cases where there is no predefined function to solve this (and with this band method I could not make it run on GEE).
Just an idea:
datacube = con.imagecollection("COPERNICUS/S2")
datacube = datacube.filter_bbox(...)
datacube_filter = datacube.filter_daterange(...)
datacube_red = datacube_filter.filter_bands("B4")
datacube_nir = datacube_filter.filter_bands("B8A")
datacube_ndvi = datacube_filter.normalized_difference(datacube_red, datacube_nir)
The last call merges the process graphs of datacube_red and datacube_nir so that there are no duplicate nodes in the process graph of datacube_ndvi. I am also not sure how to make this transparent for users, so this is open for discussion.
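One way the merging could be made transparent is to derive node ids from node content, so that identical ancestors automatically collapse to a single node when two graphs are combined. A rough sketch of that idea (the FlatCube class and its methods are purely hypothetical, not the actual client API):

```python
import hashlib
import json

class FlatCube:
    """Each cube carries a flat {node_id: node} process graph plus the id of
    its "last" node. Node ids are hashes of node content, so a binary
    operation can merge the parents' graphs with plain dict.update() and
    shared ancestors (e.g. load_collection) appear only once."""

    def __init__(self, graph, node_id):
        self.graph = graph
        self.node_id = node_id  # "last" node of this cube

    def _append(self, process_id, arguments, other=None):
        merged = dict(self.graph)
        if other is not None:
            merged.update(other.graph)  # identical ids mean identical nodes
        node = {"process_id": process_id, "arguments": arguments}
        digest = hashlib.sha1(
            json.dumps(node, sort_keys=True).encode()).hexdigest()[:8]
        new_id = "%s_%s" % (process_id, digest)
        merged[new_id] = node
        return FlatCube(merged, new_id)

    def filter_bands(self, bands):
        return self._append("filter_bands",
                            {"data": {"from_node": self.node_id},
                             "bands": bands})

    def normalized_difference(self, other):
        return self._append("normalized_difference",
                            {"band1": {"from_node": self.node_id},
                             "band2": {"from_node": other.node_id}},
                            other=other)

s2 = FlatCube({"load_collection_1": {"process_id": "load_collection",
                                     "arguments": {"id": "COPERNICUS/S2"}}},
              "load_collection_1")
red = s2.filter_bands(["B4"])
nir = s2.filter_bands(["B8A"])
ndvi = red.normalized_difference(nir)
# ndvi.graph contains load_collection only once: 4 nodes in total.
```

Content-based ids also make graph construction deterministic, which would help with testing; a per-graph counter would work too but requires renumbering on merge.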