Open-EO / openeo-api

The openEO API specification
http://api.openeo.org
Apache License 2.0

Make labels available in reduce, apply_dimension etc. #245

Closed m-mohr closed 4 years ago

m-mohr commented 4 years ago

We pass only the data to the callbacks in these functions: aggregate_polygon, aggregate_temporal, apply_dimension, merge_cubes, reduce, resample_cube_temporal. It would be useful to also have the labels available, e.g. for the client band math "magic" or more advanced time series analysis. We should make the label available for each value. This could be achieved either with an additional parameter or with something like a labeled array data type.

m-mohr commented 4 years ago

Seems to be useful and needs to be explored:

  1. Whether back-ends can actually provide the data (rasdaman may not be able to do it)
  2. How to pass the data to the reducer

m-mohr commented 4 years ago

Telco: It seems useful, let's explore it.

m-mohr commented 4 years ago

Idea

  1. Define a data type "assoc-array" (ordered associative array based on the JSON data type array, e.g. an OrderedDict in Python, an associative array in PHP, a Map in JS, not sure about Java). Keys (strings or numbers) are dimension labels, values are pixel values. There's no JSON equivalent for this, but I don't think this is an issue. You could have { array: [{a:1}, {b: 2}] } or { array: [ ["a", "b"], [1, 2] ] } or {array: { labels: ["a", "b"], values: [1, 2] } } ... Example (PHP): $data = ["a" => 123, "b" => 567]
  2. Allow easy access to it by extending the from_argument object with an index. This avoids heavy use of array_element or a similar process. Example: {from_argument: "data", index: "a"} to access a in data.
  3. Additionally, either allow array_element to be used on this data type (and objects?) or define separate processes.

This would be backward compatible, I think. Could be supported by from_node, too.

By default, index would be set to false so that an array without keys is returned (as it is now, for backward compatibility). Setting index to true returns the full dict. Setting index to a string or number returns the requested element of the array.
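
The three index modes described above could be sketched in Python roughly as follows. This is only an illustration under assumptions: resolve_from_argument is a hypothetical helper, not part of any openEO client or back-end, and the "assoc-array" is modelled as an OrderedDict mapping labels to values.

```python
from collections import OrderedDict


def resolve_from_argument(ref, arguments):
    """Resolve a {"from_argument": ..., "index": ...} reference.

    Hypothetical helper: `arguments` maps callback parameter names to
    OrderedDicts of label -> value (the proposed "assoc-array").
    """
    data = arguments[ref["from_argument"]]
    index = ref.get("index", False)
    if index is False:
        # Default: plain array without keys, as callbacks receive it today.
        return list(data.values())
    if index is True:
        # Full associative array including the labels.
        return data
    # A string or number selects a single element by its label.
    return data[index]


args = {"data": OrderedDict([("a", 123), ("b", 567)])}
plain = resolve_from_argument({"from_argument": "data"}, args)
full = resolve_from_argument({"from_argument": "data", "index": True}, args)
single = resolve_from_argument({"from_argument": "data", "index": "a"}, args)
```

The identity checks (is False, is True) keep the numeric labels 0 and 1 usable as ordinary labels instead of being mistaken for the boolean modes.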

cc @jdries

Example process graph

Changes: https://gist.github.com/m-mohr/ec69ca2fc27a003aa3bd78a8e4b512da/revisions

Before

{
  "dc": {
    "process_id": "load_collection",
    "description": "Loading the data; The order of the specified bands is important for the following reduce operation.",
    "arguments": {
      "id": "Sentinel-2",
      "spatial_extent": {
        "west": 16.1,
        "east": 16.6,
        "north": 48.6,
        "south": 47.2
      },
      "temporal_extent": ["2018-01-01", "2018-02-01"],
      "bands": ["B08", "B04", "B02"]
    }
  },
  "evi": {
    "process_id": "reduce",
    "description": "Compute the EVI. Formula: 2.5 * (NIR - RED) / (1 + NIR + 6*RED + -7.5*BLUE)",
    "arguments": {
      "data": {"from_node": "dc"},
      "dimension": "spectral",
      "reducer": {
        "callback": {
          "nir": {
            "process_id": "array_element",
            "arguments": {
              "data": {"from_argument": "data"},
              "index": 0
            }
          },
          "red": {
            "process_id": "array_element",
            "arguments": {
              "data": {"from_argument": "data"},
              "index": 1
            }
          },
          "blue": {
            "process_id": "array_element",
            "arguments": {
              "data": {"from_argument": "data"},
              "index": 2
            }
          },
          "sub": {
            "process_id": "subtract",
            "arguments": {
              "data": [{"from_node": "nir"}, {"from_node": "red"}]
            }
          },
          "p1": {
            "process_id": "product",
            "arguments": {
              "data": [6, {"from_node": "red"}]
            }
          },
          "p2": {
            "process_id": "product",
            "arguments": {
              "data": [-7.5, {"from_node": "blue"}]
            }
          },
          "sum": {
            "process_id": "sum",
            "arguments": {
              "data": [1, {"from_node": "nir"}, {"from_node": "p1"}, {"from_node": "p2"}]
            }
          },
          "div": {
            "process_id": "divide",
            "arguments": {
              "data": [{"from_node": "sub"}, {"from_node": "sum"}]
            }
          },
          "p3": {
            "process_id": "product",
            "arguments": {
              "data": [2.5, {"from_node": "div"}]
            },
            "result": true
          }
        }
      }
    }
  },
  "mintime": {
    "process_id": "reduce",
    "description": "Compute a minimum time composite by reducing the temporal dimension",
    "arguments": {
      "data": {"from_node": "evi"},
      "dimension": "temporal",
      "reducer": {
        "callback": {
          "min": {
            "process_id": "min",
            "arguments": {
              "data": {"from_argument": "data"}
            },
            "result": true
          }
        }
      }
    }
  },
  "save": {
    "process_id": "save_result",
    "arguments": {
      "data": {"from_node": "mintime"},
      "format": "GTiff"
    },
    "result": true
  }
}

After

{
  "dc": {
    "process_id": "load_collection",
    "description": "Loading the data; The order of the specified bands is important for the following reduce operation.",
    "arguments": {
      "id": "Sentinel-2",
      "spatial_extent": {
        "west": 16.1,
        "east": 16.6,
        "north": 48.6,
        "south": 47.2
      },
      "temporal_extent": ["2018-01-01", "2018-02-01"],
      "bands": ["B08", "B04", "B02"]
    }
  },
  "evi": {
    "process_id": "reduce",
    "description": "Compute the EVI. Formula: 2.5 * (NIR - RED) / (1 + NIR + 6*RED + -7.5*BLUE)",
    "arguments": {
      "data": {"from_node": "dc"},
      "dimension": "spectral",
      "reducer": {
        "callback": {
          "sub": {
            "process_id": "subtract",
            "arguments": {
              "data": [{"from_argument": "data", "index": "B08"}, {"from_argument": "data", "index": "B04"}]
            }
          },
          "p1": {
            "process_id": "product",
            "arguments": {
              "data": [6, {"from_argument": "data", "index": "B04"}]
            }
          },
          "p2": {
            "process_id": "product",
            "arguments": {
              "data": [-7.5, {"from_argument": "data", "index": "B02"}]
            }
          },
          "sum": {
            "process_id": "sum",
            "arguments": {
              "data": [1, {"from_argument": "data", "index": "B08"}, {"from_node": "p1"}, {"from_node": "p2"}]
            }
          },
          "div": {
            "process_id": "divide",
            "arguments": {
              "data": [{"from_node": "sub"}, {"from_node": "sum"}]
            }
          },
          "p3": {
            "process_id": "product",
            "arguments": {
              "data": [2.5, {"from_node": "div"}]
            },
            "result": true
          }
        }
      }
    }
  },
  "mintime": {
    "process_id": "reduce",
    "description": "Compute a minimum time composite by reducing the temporal dimension",
    "arguments": {
      "data": {"from_node": "evi"},
      "dimension": "temporal",
      "reducer": {
        "callback": {
          "min": {
            "process_id": "min",
            "arguments": {
              "data": {"from_argument": "data"}
            },
            "result": true
          }
        }
      }
    }
  },
  "save": {
    "process_id": "save_result",
    "arguments": {
      "data": {"from_node": "mintime"},
      "format": "GTiff"
    },
    "result": true
  }
}
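
To make the difference between the two graphs concrete, here is a small Python sketch of the EVI callback in both styles. It assumes the labels of the spectral dimension are the band names requested in load_collection ("B08", "B04", "B02"); the function names are made up for illustration and the pixel values are arbitrary.

```python
def evi_by_position(values):
    # "Before": relies on the band order given in load_collection.
    nir, red, blue = values[0], values[1], values[2]
    return 2.5 * (nir - red) / (1 + nir + 6 * red + -7.5 * blue)


def evi_by_label(pixel):
    # "After": looks bands up by label, so the order no longer matters.
    nir, red, blue = pixel["B08"], pixel["B04"], pixel["B02"]
    return 2.5 * (nir - red) / (1 + nir + 6 * red + -7.5 * blue)


pixel = {"B08": 0.5, "B04": 0.1, "B02": 0.05}
reordered = {"B02": 0.05, "B08": 0.5, "B04": 0.1}
same_result = evi_by_label(pixel) == evi_by_label(reordered)
```

The label-based variant returns the same value for both dictionaries, whereas the positional variant would silently compute something else if the band order changed.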

m-mohr commented 4 years ago

This can also be useful for the object-based schema in rename_labels' parameter labels.

m-mohr commented 4 years ago

The subtype labeled-array is now available, which is an array that has labels stored instead of indices. Labeled arrays can still be used as normal arrays, so you can still pass a labeled array to mean(), for example, without any change to the process graph. The labels can be accessed with the array_* functions, e.g. array_element, array_find and array_labels. Labels take precedence over indices.
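
As a rough illustration of these semantics, a labeled array could be modelled in Python as a list subclass that carries one label per element. Everything here is an assumption for the sake of the example: the LabeledArray class and the two functions only mimic the behaviour described above and are not the openEO process implementations.

```python
import statistics


class LabeledArray(list):
    """A plain list that also carries one label per element (sketch)."""

    def __init__(self, values, labels):
        super().__init__(values)
        if len(labels) != len(self):
            raise ValueError("need exactly one label per value")
        self.labels = list(labels)


def array_labels(data):
    # Return a copy of the labels in element order.
    return list(data.labels)


def array_element(data, index):
    # Labels take precedence over positional indices.
    labels = getattr(data, "labels", None)
    if labels is not None and index in labels:
        return data[labels.index(index)]
    return data[index]


bands = LabeledArray([0.5, 0.1, 0.05], ["B08", "B04", "B02"])
red = array_element(bands, "B04")      # lookup by label
first = array_element(bands, 0)        # falls back to the position
average = statistics.mean(bands)       # still behaves like a normal array
```

With numeric labels the precedence rule matters: for LabeledArray([10, 20], [1, 0]), array_element(data, 1) selects the element labeled 1 (the first one), not the element at position 1.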

We don't need a JSON encoding yet. With the changes in #254 to rename_labels, we have no place yet where we need a JSON encoding for labeled arrays in process graphs. So I didn't invent one yet.

The shortcut to access data without array_element, e.g. {from_argument: "data", index: "a"} is not included yet. I guess I'll combine these changes with #161?!