jdries closed this issue 4 months ago.
What priority / SRR assignment does this have? @jdries
I'm going to try and advance this further. However, I'm not sure I exactly understand the medoid example.
I made an example calculation for a single pixel to understand `reduce_dimension(reducer="max", dimension="t")`:
Normally this would result in the maximum per band (the per-band maxima are marked in bold):

| | band1 | band2 | band3 |
|---|---|---|---|
| time1 | 5 | **9** | **32** |
| time2 | 8 | 0 | 6 |
| time3 | **16** | 3 | 18 |
But what we want is the maximum of combined band values, i.e. the maximum that existed at the same point in time (here time1, whose combined value 5 + 9 + 32 = 46 is the largest):

| | band1 | band2 | band3 |
|---|---|---|---|
| **time1** | **5** | **9** | **32** |
| time2 | 8 | 0 | 6 |
| time3 | 16 | 3 | 18 |
Correct me if I am getting this wrong, please.
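The difference can be illustrated with a small plain-Python sketch (not openEO code; the values are taken from the example above):

```python
# Example pixel: three timesteps, three bands (values from the tables above).
observations = {
    "time1": [5, 9, 32],
    "time2": [8, 0, 6],
    "time3": [16, 3, 18],
}

# Per-band maximum: each band is reduced independently, so the result
# may be a band combination that was never actually observed together.
per_band_max = [max(bands[i] for bands in observations.values()) for i in range(3)]
# -> [16, 9, 32]: band1 from time3, band2 and band3 from time1

# Observed maximum: pick the single timestep whose combined band values
# (here simply their sum) are largest, and keep all bands from it.
best_time = max(observations, key=lambda t: sum(observations[t]))
observed_max = observations[best_time]
# -> time1 is selected (5 + 9 + 32 = 46), so the result is [5, 9, 32]
```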
Now what I don't understand is how the timestep to use is determined in a percentile (e.g. median) computation. A median as it exists now would yield the per-band medians, which again do not come from a single timestep:

| | band1 | band2 | band3 |
|---|---|---|---|
| time1 | 5 | 9 | 32 |
| time2 | **8** | 0 | 6 |
| time3 | 16 | **3** | **18** |
What is the idea of a medoid here?
The given feedback says:

> here we look at the distribution of a rank band (e.g. SWIR2) and then identify the percentile of interest (e.g. p90), then select the respective acquisition and use all band values of that acquisition to fill the percentile (ranked) composite

but how exactly is this acquisition selected?
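For reference, one common definition of the medoid is the actually observed vector that minimizes the total distance to all other observations. A plain-Python sketch on the example pixel (again only an illustration, not openEO code, and assuming Euclidean distance):

```python
import math

# The same example pixel: one band vector per timestep.
observations = {
    "time1": [5, 9, 32],
    "time2": [8, 0, 6],
    "time3": [16, 3, 18],
}

def distance(a, b):
    # Euclidean distance between two band vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# The medoid is the observed vector with the smallest sum of distances to
# all other observations: a "median-like" result that is guaranteed to be
# an observed band combination.
medoid_time = min(
    observations,
    key=lambda t: sum(distance(observations[t], o) for o in observations.values()),
)
# -> "time3": its band vector is closest to the other two on average
```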
@jdries or could you link the author of the feedback you cited?
I think there are two different cases: your example (and also the medoid function) looks at all band values to select a single observation. The other case, referred to as a 'rank band composite', composites all other bands based on the value of a rank band. So if band1 is my rank band and I use the 'max' function to composite, then we get (band1 is maximal at time3, so all values come from time3):

| | band1 | band2 | band3 |
|---|---|---|---|
| time1 | 5 | 9 | 32 |
| time2 | 8 | 0 | 6 |
| **time3** | **16** | **3** | **18** |
If band3 is the rank band, a max composite gives (band3 is maximal at time1):

| | band1 | band2 | band3 |
|---|---|---|---|
| **time1** | **5** | **9** | **32** |
| time2 | 8 | 0 | 6 |
| time3 | 16 | 3 | 18 |
A 'max-NDVI' composite is one well-known example of such an operation.
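The rank-band selection can be sketched in plain Python (illustration only; the selector stands in for the compositing function):

```python
# The same example pixel; band index 0 = band1, 2 = band3.
observations = {
    "time1": [5, 9, 32],
    "time2": [8, 0, 6],
    "time3": [16, 3, 18],
}

def rank_band_composite(obs, rank_band, selector=max):
    # Rank the timesteps by the value of a single band, pick one timestep
    # with the selector, and keep *all* band values from that timestep.
    ranked = selector(obs, key=lambda t: obs[t][rank_band])
    return obs[ranked]

rank_band_composite(observations, rank_band=0)  # band1 ranks: 16 wins -> [16, 3, 18]
rank_band_composite(observations, rank_band=2)  # band3 ranks: 32 wins -> [5, 9, 32]
```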
As a start, @jonathom and I tried to come up with a solution for a criteria-based observed composite process that re-uses existing processes:
{
"process_graph": {
"2": {
"process_id": "reduce_dimension",
"arguments": {
"data": {
"from_parameter": "data"
},
"reducer": {
"from_parameter": "criteria"
},
"dimension": "bands"
}
},
"3": {
"process_id": "apply_dimension",
"arguments": {
"data": {
"from_node": "2"
},
"process": {
"process_graph": {
"2": {
"process_id": "array_find",
"arguments": {
"value": {
"from_node": "3"
},
"data": {
"from_parameter": "data"
}
}
},
"3": {
"process_id": "max",
"arguments": {
"data": {
"from_parameter": "data"
}
}
},
"6": {
"process_id": "array_apply",
"arguments": {
"data": {
"from_parameter": "data"
},
"process": {
"process_graph": {
"1": {
"process_id": "constant",
"arguments": {
"x": null
},
"result": true
}
}
}
}
},
"7": {
"process_id": "array_modify",
"arguments": {
"data": {
"from_node": "6"
},
"values": {
"from_node": "3"
},
"index": {
"from_node": "2"
}
},
"description": "Create an array with null values",
"result": true
}
}
},
"dimension": "t"
}
},
"4": {
"process_id": "merge_cubes",
"arguments": {
"cube2": {
"from_node": "3"
},
"cube1": {
"from_parameter": "data"
},
"overlap_resolver": {
"process_graph": {
"2": {
"process_id": "if",
"arguments": {
"value": {
"from_node": "4"
},
"accept": {
"from_parameter": "x"
}
},
"result": true
},
"4": {
"process_id": "neq",
"arguments": {
"x": {
"from_parameter": "y"
},
"y": null
}
}
}
}
}
},
"5": {
"process_id": "reduce_dimension",
"arguments": {
"data": {
"from_node": "4"
},
"reducer": {
"process_graph": {
"1": {
"process_id": "first",
"arguments": {
"data": {
"from_parameter": "data"
}
},
"result": true
}
}
},
"dimension": "t"
},
"result": true
}
},
"parameters": [
{
"schema": {
"type": "object",
"subtype": "raster-cube",
"title": "Raster data cube",
"description": "A raster data cube, an image collection stored at the back-end. Different back-ends have different internal representations for this data structure."
},
"name": "data"
},
{
"name": "criteria",
"schema": {
"type": "object",
"subtype": "process-graph",
"title": "User-defined process",
"description": "An process graph that is passed as an argument and is expected to be executed by the process. Parameters passed to the process graph are specified in the `parameters` property of the corresponding schema."
},
"description": "A reducer to select a timestamp, e.g. ``sum`` or ``mean()``."
}
],
"id": "composite_observed"
}
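The core trick in node "3" (the `apply_dimension` over `t`) can be sketched in plain Python: find the index chosen by the reducer, build an all-null array, and write the selected value back at that index, mirroring `max`, `array_find`, `array_apply` and `array_modify` (illustration only, not openEO code):

```python
def select_timestep(series, selector=max):
    # Mirror of the inner process graph: node "3" (max), "2" (array_find),
    # "6" (array_apply with constant null) and "7" (array_modify).
    selected = selector(series)        # e.g. max over the time series
    index = series.index(selected)     # array_find
    nulled = [None] * len(series)      # array_apply with constant(null)
    nulled[index] = selected           # array_modify
    return nulled

# The merge_cubes overlap resolver then keeps the original pixel values
# wherever this masked series is not null, and the final reduce_dimension
# with `first` collapses the now mostly-null time dimension.
select_timestep([46, 14, 37])  # sums of the example pixel -> [46, None, None]
```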
For the medoid-based composite, we'd probably need a separate process that works on data cubes.
... and yet another process for the rank-based composite. We wondered which of these processes would be most important to you, @jdries?
I made up some short use case descriptions that I hope can be helpful here:
**pixel integrity composite**

1. I want to create a cloud-free mosaic of SAR data that only contains observed pixel values. The mosaic should be created by selecting the minimum pixel value. Only one point in time shall be selected to fill the different band values (VV, VH) of a single pixel. To achieve this, the combined band values (their sum) per timestep should be calculated, and the timestep with the overall minimum selected.
2. An optical satellite time series shall be composited into a mosaic by selecting the maximum value per pixel. Only one point in time shall be selected to fill the different band values (B03, B04, B05, ...) of a single pixel. To achieve this, the median of the combined band values (maybe after normalization) should be calculated to then select the maximum timestep overall.
3. I want to create an optical mosaic of an area of interest. The mosaic should be calculated using the median of the time series and only contain a) observed values (no calculated mean between two values) and b) observed combinations of band values (all band values from the same point in time). (This aims at describing the medoid.)

**rank band composite**

1. I want to create a mosaic of a collection (containing a band called 'ndvi') where the chosen pixels correspond to the point in time of maximum NDVI. Other than time being reduced, there are no other changes.
2. I want to create a mosaic of a collection (containing multiple sensors (as a dimension), all with NDVI bands) where the chosen pixels correspond to the point in time of maximum combined NDVI. The combined NDVI shall be computed via a median computation.
Thanks, @jonathom. So the next step would be to move these into process graphs and figure out what is missing and whether we can provide UDPs.
The graph above satisfies UC 1.1 and 1.2 (except normalization); I will look into the medoid (1.3) separately. I would change the id to "observed_pixel_composite" and add a description like "Create a composite over time that contains only observed combinations of band values. A single point in time is selected per pixel to fill in all band values."
Here's a graph for the rank band composite; the previous graph only needed minor changes. With this, we also have a graph for 2.1. I believe that 2.2 is not needed yet, and I would like to know if it is actually needed IRL.
{
"process_graph": {
"3": {
"process_id": "apply_dimension",
"arguments": {
"data": {
"from_node": "8"
},
"process": {
"process_graph": {
"2": {
"process_id": "array_find",
"arguments": {
"value": {
"from_node": "3"
},
"data": {
"from_parameter": "data"
}
}
},
"3": {
"process_id": "max",
"arguments": {
"data": {
"from_parameter": "data"
}
}
},
"6": {
"process_id": "array_apply",
"arguments": {
"data": {
"from_parameter": "data"
},
"process": {
"process_graph": {
"1": {
"process_id": "constant",
"arguments": {
"x": null
},
"result": true
}
}
}
}
},
"7": {
"process_id": "array_modify",
"arguments": {
"data": {
"from_node": "6"
},
"values": {
"from_node": "3"
},
"index": {
"from_node": "2"
}
},
"description": "Create an array with null values",
"result": true
}
}
},
"dimension": "t"
}
},
"4": {
"process_id": "merge_cubes",
"arguments": {
"cube2": {
"from_node": "3"
},
"cube1": {
"from_parameter": "data"
},
"overlap_resolver": {
"process_graph": {
"2": {
"process_id": "if",
"arguments": {
"value": {
"from_node": "4"
},
"accept": {
"from_parameter": "x"
}
},
"result": true
},
"4": {
"process_id": "neq",
"arguments": {
"x": {
"from_parameter": "y"
},
"y": null
}
}
}
}
}
},
"5": {
"process_id": "reduce_dimension",
"arguments": {
"data": {
"from_node": "4"
},
"reducer": {
"process_graph": {
"1": {
"process_id": "first",
"arguments": {
"data": {
"from_parameter": "data"
}
},
"result": true
}
}
},
"dimension": "t"
},
"result": true
},
"6": {
"process_id": "filter_bands",
"arguments": {
"data": {
"from_parameter": "data"
},
"bands": {
"from_parameter": "rank_band_name"
},
"wavelengths": []
}
},
"8": {
"process_id": "drop_dimension",
"arguments": {
"data": {
"from_node": "6"
},
"name": "bands"
}
}
},
"parameters": [
{
"schema": {
"type": "object",
"subtype": "raster-cube",
"title": "Raster data cube",
"description": "A raster data cube, an image collection stored at the back-end. Different back-ends have different internal representations for this data structure."
},
"name": "data"
},
{
"schema": {
"type": "string"
},
"name": "rank_band_name",
"description": "Name of the rank band."
},
{
"schema": {
"type": "object",
"subtype": "process-graph",
"title": "User-defined process",
"description": "An process graph that is passed as an argument and is expected to be executed by the process. Parameters passed to the process graph are specified in the `parameters` property of the corresponding schema.",
"required": [
"process_graph"
],
"properties": {
"process_graph": {
"type": "object",
"additionalProperties": {
"type": "object",
"required": [
"process_id",
"arguments"
],
"properties": {
"process_id": {
"type": "string"
},
"arguments": {}
}
}
}
}
},
"name": "timestep_selector",
"description": "A process to select a timestep (e.g. `min`, `max`)."
}
],
"id": "rank_band_composite",
"description": "Create a composite according to a rank band and a process, e.g. NDVI and maximum."
}
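The only structural difference to the observed pixel composite is how the ranking series is built: instead of reducing the bands dimension with a reducer, the rank band is extracted directly (nodes "6" and "8", `filter_bands` plus `drop_dimension`). A plain-Python sketch of that idea (illustration only; the band names are made up):

```python
# Hypothetical example pixel with named bands, one dict per timestep.
observations = {
    "time1": {"B04": 5, "B08": 9, "ndvi": 32},
    "time2": {"B04": 8, "B08": 0, "ndvi": 6},
    "time3": {"B04": 16, "B08": 3, "ndvi": 18},
}

def rank_series(obs, rank_band_name):
    # filter_bands + drop_dimension: keep only the rank band per timestep.
    return {t: bands[rank_band_name] for t, bands in obs.items()}

# The timestep_selector (here max) is applied to this series, and all bands
# are then filled from the selected timestep, as in the observed composite.
series = rank_series(observations, "ndvi")  # {'time1': 32, 'time2': 6, 'time3': 18}
best = max(series, key=series.get)          # -> 'time1'
composite = observations[best]
```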
With the previously mentioned changes, the process graph for an observed pixel composite is this:
{
"process_graph": {
"2": {
"process_id": "reduce_dimension",
"arguments": {
"data": {
"from_parameter": "data"
},
"reducer": {
"from_parameter": "criteria"
},
"dimension": "bands"
},
"position": [
0,
0
]
},
"3": {
"process_id": "apply_dimension",
"arguments": {
"data": {
"from_node": "2"
},
"process": {
"process_graph": {
"2": {
"process_id": "array_find",
"arguments": {
"value": {
"from_node": "3"
},
"data": {
"from_parameter": "data"
}
}
},
"3": {
"process_id": "max",
"arguments": {
"data": {
"from_parameter": "data"
}
}
},
"6": {
"process_id": "array_apply",
"arguments": {
"data": {
"from_parameter": "data"
},
"process": {
"process_graph": {
"1": {
"process_id": "constant",
"arguments": {
"x": null
},
"result": true
}
}
}
}
},
"7": {
"process_id": "array_modify",
"arguments": {
"data": {
"from_node": "6"
},
"values": {
"from_node": "3"
},
"index": {
"from_node": "2"
}
},
"description": "Create an array with null values",
"result": true
}
}
},
"dimension": "t",
"context": {
"from_parameter": "timestep_selector"
}
},
"position": [
240,
0
]
},
"4": {
"process_id": "merge_cubes",
"arguments": {
"cube2": {
"from_node": "3"
},
"cube1": {
"from_parameter": "data"
},
"overlap_resolver": {
"process_graph": {
"2": {
"process_id": "if",
"arguments": {
"value": {
"from_node": "4"
},
"accept": {
"from_parameter": "x"
}
},
"result": true
},
"4": {
"process_id": "neq",
"arguments": {
"x": {
"from_parameter": "y"
},
"y": null
}
}
}
}
},
"position": [
480,
0
]
},
"5": {
"process_id": "reduce_dimension",
"arguments": {
"data": {
"from_node": "4"
},
"reducer": {
"process_graph": {
"1": {
"process_id": "first",
"arguments": {
"data": {
"from_parameter": "data"
}
},
"result": true
}
}
},
"dimension": "t"
},
"result": true,
"position": [
720,
0
]
}
},
"parameters": [
{
"schema": {
"type": "object",
"subtype": "raster-cube",
"title": "Raster data cube",
"description": "A raster data cube, an image collection stored at the back-end. Different back-ends have different internal representations for this data structure."
},
"name": "data"
},
{
"name": "criteria",
"description": "A reducer to summarize the bands dimension, e.g. ``sum`` or ``mean()``.",
"schema": {
"type": "object",
"subtype": "process-graph",
"title": "User-defined process",
"description": "An process graph that is passed as an argument and is expected to be executed by the process. Parameters passed to the process graph are specified in the `parameters` property of the corresponding schema."
}
},
{
"schema": {
"type": "object",
"subtype": "process-graph",
"title": "User-defined process",
"description": "An process graph that is passed as an argument and is expected to be executed by the process. Parameters passed to the process graph are specified in the `parameters` property of the corresponding schema.",
"required": [
"process_graph"
],
"properties": {
"process_graph": {
"type": "object",
"additionalProperties": {
"type": "object",
"required": [
"process_id",
"arguments"
],
"properties": {
"process_id": {
"type": "string"
},
"arguments": {}
}
}
}
}
},
"name": "timestep_selector",
"description": "A process to select a timestep, e.g. `max`, `median`. Can't be a process that returns unobserved values, e.g. `mean`."
}
],
"id": "observed_pixel_composite",
"description": "Create a composite over time that contains only observed combinations of band values. A single point in time is selected per pixel to fill in all band values."
}
A remaining problem is the inability to execute a process graph inside a process graph.
I've been looking at the process graphs a bit, and used the editor to get something more visual.
I don't entirely understand every aspect yet, but it looks promising. Having the same in Python code would also be nice. Relying on merge_cubes for data that comes from the same source can always be a performance issue, but perhaps we could try to measure that. In any case, we would need helper libraries to construct graphs like this, as they don't look trivial to come up with.
Nice work!
> Having the same in Python code would also be nice.
I threw something together. It throws a TypeError (`TypeError: Object of type 'function' is not JSON serializable`) that I can't figure out, but it should help understand what's going on in the observed_pixel_composite.
```python
from openeo.processes import array_find, array_modify, max, array_apply, constant, if_, neq, nan, first

# The reducer here is the parameter "summarize", used to summarize the bands
# dimension in a certain way, e.g. "max", "median", "mean".
cube_reduced_bands = cube.reduce_dimension(reducer="sum", dimension="bands")

def write_nan(x):
    # NOTE: `nan` is itself a process (a Python function); passing it uncalled
    # is probably what triggers the "not JSON serializable" TypeError above.
    return constant(nan)

def select_timesteps(data):
    # This should be the parameter "timestep_selector"; could be "max", "min"
    # or "median", but cannot be "mean".
    selected = max(data)
    selected_index = array_find(data, value=selected)
    # Create an array full of nans ...
    empty_array = array_apply(data, process=write_nan)
    # ... and write a value at the index where the "timestep_selector"
    # identified it, e.g. the max.
    selected_timesteps = array_modify(data=empty_array, values=selected, index=selected_index)
    return selected_timesteps

# Create a cube that has nans at all timesteps except the ones selected by
# the "timestep_selector" process.
cube_selected = cube_reduced_bands.apply_dimension(process=select_timesteps, dimension="t")

def write_original_data(x, y):
    # If cube_selected is != nan, the original values should be written, null otherwise.
    return if_(neq(y, nan), x)

# Merge cubes and replace the placeholders with the actual data for the selected dates.
cube_selected_original = cube.merge_cubes(cube_selected, overlap_resolver=write_original_data)

# Reduce the time dimension full of nans by selecting the only value present in it.
cube_composite = cube_selected_original.reduce_dimension(reducer=first, dimension="t")

res = cube_composite.save_result(format="GTiff")
```
When I tried to assign the functions to parameters and then call them,

```python
summarize_bands = sum
timestep_selector = max
```

another TypeError occurred (`TypeError: 'ProcessBuilder' object is not iterable`), so I added descriptions where the parameters should be.
My (working) proposal for this one is now available, basically using the ideas introduced here: https://github.com/Open-EO/openeo-community-examples/blob/rankComposite/python/RankComposites/rank_composites.ipynb
As this is working solely on existing processes, can we close this issue then? :-)
yes, closing!
Currently, in openEO, when creating composites, we apply a reducer function over the time series for a band, e.g. `reduce_dimension(reducer='max', dimension='t')`. This can result in unobserved values in the sense that a particular combination of band values may not occur in the original dataset, because bands are composited independently.
A variation on this topic is computing the 'medoid' over all bands. It is more complex in the sense that you need all bands available, and the medoid process would take a 2D (or higher dimensional?) matrix as input.
GEE seems to have explicit support for this:
https://developers.google.com/earth-engine/apidocs/ee-imagecollection-qualitymosaic
https://developers.google.com/earth-engine/apidocs/ee-algorithms-landsat-simplecomposite?hl=en
Original feedback: