Open-EO / openeo-processes

Interoperable processes for openEO's big Earth observation cloud processing.
https://processes.openeo.org
Apache License 2.0

Math Operation on multi-temporal and mono-temporal datacube #96

Closed: przell closed this issue 4 years ago

przell commented 4 years ago

Today @flahn and I went through the eurac snow monitoring use case. We encountered a situation where every time step of a multi-temporal datacube [x, y, time] is to be subtracted from a mono-temporal data cube [x, y]. We couldn't figure out how to achieve this elegantly. What would be the way to go to achieve this?
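To make the desired broadcasting concrete, here is a minimal pure-Python sketch; the function name and the list-of-lists cube representation are made up for illustration and are not openEO processes:

```python
def subtract_per_timestep(multi, mono):
    """Subtract a mono-temporal cube [x, y] from every time step of a
    multi-temporal cube [time, x, y], element-wise per pixel."""
    return [
        [[v - m for v, m in zip(row, mono_row)]
         for row, mono_row in zip(step, mono)]
        for step in multi
    ]

mono = [[1, 1], [1, 1]]                       # [x, y]
multi = [[[5, 6], [7, 8]], [[2, 3], [4, 5]]]  # [time, x, y]
print(subtract_per_timestep(multi, mono))
# [[[4, 5], [6, 7]], [[1, 2], [3, 4]]]
```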

przell commented 4 years ago

Here are some first thoughts provided by @flahn:

- Idea: apply_dimension over space (2 dimensions)? Is that possible?
- Convert the data cube to a multidimensional array?
- Model a callback for apply_dimension: from "data" (a 2-dimensional or maybe 1-dimensional array), use some as_array function on the mono-temporal datacube -> input arrays of the same length -> then call subtract on the arrays.

jdries commented 4 years ago

Could also be seen as an extension of merge_cubes, with subtract as overlaps_resolver, and additionally defining how merging is done when dimensions do not match. (The cube without time dimension would basically take over the time dimension of the other one.)
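Read that way, the time-less cube first adopts the time labels of the other cube and the resolver is then applied per label. A toy sketch of that semantics (the function name and the dict-per-pixel representation are hypothetical, not the actual process implementation):

```python
import operator

def merge_cubes(cube_with_time, cube_without_time, overlap_resolver):
    """cube_with_time: {time_label: value}; cube_without_time: a single
    value lacking the time dimension. The time-less cube 'takes over'
    the other cube's time labels, then the resolver combines the
    overlapping values per label."""
    broadcast = {t: cube_without_time for t in cube_with_time}  # adopt time dimension
    return {t: overlap_resolver(cube_with_time[t], broadcast[t])
            for t in cube_with_time}

snow = {"2019-01-01": 10, "2019-02-01": 7}  # multi-temporal pixel series
baseline = 3                                # mono-temporal pixel value
print(merge_cubes(snow, baseline, operator.sub))
# {'2019-01-01': 7, '2019-02-01': 4}
```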

przell commented 4 years ago

Addendum: In this specific case all dimensions but time are identical.

jdries commented 4 years ago

Telco conclusion: indeed use merge_cubes; clients should try to hide the complexity (for instance by allowing subtraction of two different data cubes and converting that into merge_cubes). The documentation will have to be extended to specify the case where the dimensions of the two cubes do not entirely overlap. Something like: cubes are 'joined' on overlapping dimensions, and the dimensions of the output would be the union of the input dimensions?
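As a sketch of how a client could hide that complexity, here is a toy cube wrapper whose `-` operator is rewritten into a merge_cubes node; the class and graph layout are invented for illustration, not any client's real API:

```python
class ToyDataCube:
    """Minimal stand-in for a client-side data cube object."""
    def __init__(self, graph):
        self.graph = graph

    def merge_cubes(self, other, overlap_resolver):
        return ToyDataCube({
            "process_id": "merge_cubes",
            "arguments": {"cube1": self.graph,
                          "cube2": other.graph,
                          "overlap_resolver": overlap_resolver},
        })

    def __sub__(self, other):
        # 'a - b' is silently rewritten into merge_cubes with subtract
        return self.merge_cubes(other, "subtract")

a = ToyDataCube({"process_id": "load_collection", "arguments": {"id": "A"}})
b = ToyDataCube({"process_id": "load_collection", "arguments": {"id": "B"}})
print((a - b).graph["process_id"])  # merge_cubes
```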

przell commented 4 years ago

Thanks for the clarification @jdries. Concerning the documentation; I would add what "joined" means, maybe so:

cubes are 'joined' using the function defined in the overlaps_resolver on overlapping dimensions; the dimensions of the output would be the union of the input dimensions.

Some more thoughts on this function:

Maybe this is irrelevant due to the phrase in the documentation "The data cubes have to be compatible." Does that mean the cubes have to be made "comparable" concerning dimensionality and resolutions before they are used in merge_cubes? Sorry, it is not quite clear to me what this means.

lforesta commented 4 years ago

I agree that we need more documentation for this process. Imo, the two cubes must not necessarily have the same dimensions; I'm assuming that joining a 3D and a 4D cube should result in a 4D cube (assuming the dimensions of the 4D cube include the ones of the 3D cube).

Should there be an option to only keep the (spatial) intersection of the cubes?

hmm, we can use filter_bbox on the two cubes before calling merge_cubes to make sure they cover the same area, no?

How are differing spatial resolutions handled? Is there one master and one slave cube for spatial resampling? What happens to the resolution in the non-overlapping regions?

Must a datacube have a specific spatial resolution (and CRS, see https://github.com/Open-EO/openeo-processes/issues/98#issuecomment-564985331)? @m-mohr is this defined anywhere? I would leave this flexible. e.g. at the moment when we create a spatio-temporal cube from this collection (https://openeo.eodc.eu/collections/s2a_prd_msil1c), it will have different spatial resolutions (it contains all bands) and possibly multiple CRSs depending on the query's bbox. The user must apply the process resample_spatial before doing band math between bands with different spatial resolution.

How are differing temporal resolutions handled? How will the dates be matched?

If dates don't match (i.e. don't overlap), then the new cube has data on both dates (from both cubes). The same applies to any dimension which is not uniform (and time is usually one of them). If they do match, the overlap_resolver should indicate what to do with pixels that don't have a single unambiguous value.
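The rule described above (labels present in only one cube are copied through, overlapping labels go through the resolver) can be sketched like this; the helper name and dict-of-labels representation are assumptions for illustration:

```python
def merge_along_dimension(a, b, overlap_resolver=None):
    """a, b: {label: value} along one dimension (e.g. time).
    Labels present in only one cube are kept as-is; labels present in
    both require a resolver to produce a single value."""
    merged = {}
    for label in sorted(set(a) | set(b)):
        if label in a and label in b:
            if overlap_resolver is None:
                raise ValueError(f"overlap at {label!r} but no overlap_resolver given")
            merged[label] = overlap_resolver(a[label], b[label])
        else:
            merged[label] = a[label] if label in a else b[label]
    return merged

left = {"2018-01-01": 1, "2018-02-01": 2}
right = {"2018-02-01": 5, "2018-03-01": 3}
print(merge_along_dimension(left, right, max))
# {'2018-01-01': 1, '2018-02-01': 5, '2018-03-01': 3}
```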

m-mohr commented 4 years ago

It seems the process will be made much more general. We need to do this carefully, see all the questions arising here already! Would you be up for a PR, @jdries?

How are differing spatial resolutions handled. Is there one master and one slave cube for spatial resampling? What happens to the resolution in the non-overlapping regions?

Currently not allowed as they would not be "compatible". We would need to define this now.

Must a datacube have a specific spatial resolution (and CRS, see #98 (comment))? @m-mohr is this defined anywhere?

I thought it would be better defined in the glossary, but it's not. How we originally envisioned data cubes was that they have a single CRS per dimension (assuming the CRS for x and y are the same, of course) and (I think) resolutions can differ per band. It's implicitly defined by the metadata, though. We'll need to clarify.

I would leave this flexible

Not sure whether this is very intuitive for a user. I would find it more intuitive to have #102 available.

jdries commented 4 years ago

I believe merging indeed requires doing a resample_cube_spatial in case the spatial dimensions do not match. Google Earth Engine does this automatically (I think); it would be interesting to see if they have a clear definition. I'm willing to do a pull request, but it probably won't happen this year.

lforesta commented 4 years ago

Currently on our back-end, when loading bands from the S2A L1C collection, no resampling is done on any band. This is because the "datacube" is a very loose one at this point. We can change this and enforce that the user resamples some bands, but then there would be a counter-intuitive case in which a user loads two different bands from the same collection with two separate calls and uses the resample_spatial process on one of them (and then merge_cubes):

{
  "B5": {
    "process_id": "load_collection",
    "description": "Loading the data; The order of the specified bands is important for the following reduce operation.",
    "arguments": {
      "id": "s2a_prd_msil1c",
      "spatial_extent": {
        "west": 16.1,
        "east": 16.6,
        "north": 48.6,
        "south": 47.2
      },
      "temporal_extent": ["2018-01-01", "2018-02-01"],
      "bands": ["5"]
    }
  },
  "B4": {
    "process_id": "load_collection",
    "description": "Loading the data; The order of the specified bands is important for the following reduce operation.",
    "arguments": {
      "id": "s2a_prd_msil1c",
      "spatial_extent": {
        "west": 16.1,
        "east": 16.6,
        "north": 48.6,
        "south": 47.2
      },
      "temporal_extent": ["2018-01-01", "2018-02-01"],
      "bands": ["4"]
    }
  },
  "B4_resampled": {
    "process_id": "resample_spatial",
    "arguments": {
"data": {"from_node": "B4"},
      "resolution": [20, 20]
    }
  }
}

@m-mohr how does it work on the WWU-GEE back-end when using this collection: https://earthengine.openeo.org/v0.4/collections/COPERNICUS/S2? Does the back-end return an error if the user tries to load e.g. band 4 and 5 in the same datacube?

m-mohr commented 4 years ago

@lforesta No, it doesn't throw an error and I don't know what happens internally in GEE. The GEE driver is probably not following the spec very closely, but I don't think I can influence the behavior.

m-mohr commented 4 years ago

@jdries Will you be able to do the PR before the last processes telco? Otherwise it will be hard to get this into the release, although it's critical for a use case...

jdries commented 4 years ago

Work on pull request has started.

m-mohr commented 4 years ago

@jdries Great! Make sure to work on the latest draft. From what I see in https://github.com/Open-EO/openeo-processes/commit/9b15ea688a7596d082701fe4d2ba19bfccb89789 you are working on an outdated version...

The description at the moment is a bit longer already:


The data cubes have to be compatible. A merge operation without overlap should be reversible with (a set of) filter operations for each of the two cubes. The process doesn't add dimensions.

This means that the data cubes must have the same dimensions. Each dimension must be available in both data cubes and have the same name, type, reference system and resolution. One of the dimensions can have different labels, for all other dimensions the labels must be equal. If data overlaps, the parameter overlap_resolver must be specified to resolve the overlap.

Examples for merging two data cubes:

  1. Data cubes with the dimensions x, y, t and bands have the same dimension labels in x, y and t, but the labels for the dimension bands are B1 and B2 for the first cube and B3 and B4 for the second. An overlap resolver is not needed. The merged data cube has the dimensions x, y, t and bands and the dimension bands has four dimension labels: B1, B2, B3, B4.
  2. Data cubes with the dimensions x, y, t and bands have the same dimension labels in x,y and t, but the labels for the dimension bands are B1 and B2 for the first data cube and B2 and B3 for the second. An overlap resolver is required to resolve overlap in band B2. The merged data cube has the dimensions x, y, t and bands and the dimension bands has three dimension labels: B1, B2, B3.
  3. Data cubes with the dimensions x, y and t have the same dimension labels in x,y and t. There are two options:
    1. Keep the overlapping values separately in the merged data cube: An overlap resolver is not needed, but for each data cube you need to add a new dimension using add_dimension(). The new dimensions must be equal, except that the labels for the new dimensions must differ by name. The merged data cube has the same dimensions and labels as the original data cubes, plus the dimension added with add_dimension(), which has the two dimension labels after the merge.
    2. Combine the overlapping values into a single value: An overlap resolver is required to resolve the overlap for all pixels. The merged data cube has the same dimensions and labels as the original data cubes, but all pixel values have been processed by the overlap resolver.
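A compact sketch of the compatibility rule stated above (same dimensions; at most one dimension may differ in its labels); the function and the dict-of-labels representation are assumptions for illustration, not spec text:

```python
def check_merge_compatible(dims_a, dims_b):
    """dims_a, dims_b: {dimension_name: [labels]}. Returns the (at most
    one) dimension whose labels differ, or None if the cubes are fully
    equal; raises if the cubes are not compatible per the draft text."""
    if set(dims_a) != set(dims_b):
        raise ValueError("cubes must have the same dimensions")
    differing = [d for d in dims_a if dims_a[d] != dims_b[d]]
    if len(differing) > 1:
        raise ValueError(f"labels differ in more than one dimension: {differing}")
    return differing[0] if differing else None

cube1 = {"x": [0, 1], "y": [0, 1], "t": ["2018-01"], "bands": ["B1", "B2"]}
cube2 = {"x": [0, 1], "y": [0, 1], "t": ["2018-01"], "bands": ["B3", "B4"]}
print(check_merge_compatible(cube1, cube2))  # bands
```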

jdries commented 4 years ago

Indeed, I'll see if I still have to add something!

m-mohr commented 4 years ago

If not, simply close this issue (or let me do it if you can't).

m-mohr commented 4 years ago

Also, be aware of this proposal (but I guess it doesn't conflict): https://github.com/Open-EO/openeo-processes/commit/88378c65f95e980c91d512686b87ed154f39853a

przell commented 4 years ago

Great advancements! As already mentioned, this will also be needed in the Eurac use case. I can't tell from the documentation mentioned by @m-mohr what happens in this case:

We encountered a situation where every time step of a multi-temporal datacube [x, y, time] is to be subtracted from a mono-temporal data cube [x, y]. We couldn't figure out how to achieve this elegantly. What would be the way to go to achieve this?

@jdries had answered it like this:

Could also be seen as an extension of merge_cubes, with subtract as overlaps_resolver, and additionally defining how merging is done when dimensions do not match. (The cube without time dimension would basically take over the time dimension of the other one.)

Is it possible to solve this with the current implementation, or do we need a workaround?

m-mohr commented 4 years ago

@przell In PR #127 @jdries is working on some clarifications, please let him know what is missing.