Open-EO / openeo-api

The openEO API specification
http://api.openeo.org
Apache License 2.0

Specify target collection metadata in UDP (& batch jobs) #514

Open jdries opened 9 months ago

jdries commented 9 months ago

Two new use cases came up, both with a similar solution:

  1. Our UDP users want to treat a UDP basically as a 'virtual collection'. To allow this, they would like to know the STAC (collection) metadata of the data cube that is generated when the UDP is invoked. Of course, there can be some unknowns, depending on the parameters of the UDP. Some UDPs are fairly constrained, while others can output any raster cube. This use case concerns the constrained kind, where a UDP wants to communicate constraints on its output. Some examples:
    • produces only data over Europe
    • output from 2017 onwards
    • output resolution is 300m
    • output has 4 bands, with detailed band metadata

Note that this collection metadata also acts as a definition of constraints: if the output AOI is Europe, then the UDP will probably not accept an input AOI in North America. UDP tools can therefore use it for input field validation, which is very useful for generic wizards such as the one in the openEO editor.

  2. The second case is perhaps easier to understand: batch jobs try to fill in as much STAC metadata as possible when generating output, but cannot know everything. For instance, a job that generates categorical data cannot really know which colors would be suited for visualization. As a user, I would like to submit a kind of metadata template in STAC format, so that I can immediately generate output with more complete STAC metadata.
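To make the metadata template idea from the second use case concrete, here is a minimal sketch of how a backend might overlay a user-supplied template on the metadata it generates itself. The function name `merge_metadata`, the sample values, and the rule that template values win on conflict are all assumptions for illustration; the class entries are loosely modelled on the STAC classification extension.

```python
from copy import deepcopy


def merge_metadata(generated: dict, template: dict) -> dict:
    """Overlay a user template on backend-generated STAC metadata.

    Assumption: values from the template take precedence, and nested
    objects are merged key by key instead of being replaced wholesale.
    """
    merged = deepcopy(generated)
    for key, value in template.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_metadata(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Metadata the batch job can derive on its own (illustrative values).
generated = {
    "id": "j-123",
    "extent": {"spatial": {"bbox": [[2.0, 49.0, 6.0, 52.0]]}},
    "summaries": {"gsd": [10]},
}

# Template supplied by the user: things the backend cannot know.
template = {
    "title": "Land cover map",
    "summaries": {
        "classification:classes": [
            {"value": 1, "name": "forest", "color_hint": "00A000"},
            {"value": 2, "name": "water", "color_hint": "0000FF"},
        ]
    },
}

result = merge_metadata(generated, template)
print(sorted(result["summaries"]))
```

Whether the template or the backend should win on conflicting keys (for example, an extent the job can compute exactly) is an open design choice; the sketch only shows one option.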

My proposed solution is to simply add a property with the target STAC collection metadata to the UDP and batch job schema:

https://api.openeo.org/#tag/User-Defined-Processes/operation/store-custom-process
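As a rough sketch of what this could look like, the snippet below attaches STAC-style extent metadata to a UDP definition and uses it to validate user input, as a wizard might. The property name `target_collection`, the UDP id, the Europe bounding box, and the helper names are hypothetical and not part of the current openEO API.

```python
from datetime import datetime

# Hypothetical UDP definition. "target_collection" is an assumed property name;
# its value follows the layout of a STAC Collection.
udp = {
    "id": "europe_composite",  # illustrative id
    "process_graph": {},       # omitted for brevity
    "target_collection": {
        "extent": {
            "spatial": {"bbox": [[-25.0, 34.0, 45.0, 72.0]]},  # rough Europe box
            "temporal": {"interval": [["2017-01-01T00:00:00Z", None]]},
        },
        "summaries": {"gsd": [300]},
    },
}


def parse_utc(value: str) -> datetime:
    """Parse an RFC 3339 timestamp; 'Z' is rewritten for older Python versions."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def validate_request(udp: dict, bbox: list, start: str) -> list:
    """Return a list of problems; an empty list means the input fits."""
    extent = udp["target_collection"]["extent"]
    problems = []

    # STAC bbox order is [west, south, east, north]; antimeridian crossing is ignored.
    west, south, east, north = extent["spatial"]["bbox"][0]
    overlaps = not (
        bbox[2] < west or bbox[0] > east or bbox[3] < south or bbox[1] > north
    )
    if not overlaps:
        problems.append("requested area lies outside the declared output extent")

    lower, upper = extent["temporal"]["interval"][0]
    requested = parse_utc(start)
    if lower is not None and requested < parse_utc(lower):
        problems.append("requested start precedes the declared temporal extent")
    if upper is not None and requested > parse_utc(upper):
        problems.append("requested start follows the declared temporal extent")
    return problems


# An AOI in North America before 2017, then an AOI in the Benelux in 2020.
print(validate_request(udp, [-125.0, 25.0, -65.0, 50.0], "2015-06-01T00:00:00Z"))
print(validate_request(udp, [2.0, 49.0, 6.0, 52.0], "2020-06-01T00:00:00Z"))
```

Reusing the STAC extent structure means a client needs no UDP-specific constraint language: the same check works for any collection-like description.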

I'll probably experiment with this myself, but also wanted to share the idea. These cases are triggered by user projects.

jdries commented 1 month ago

Update for myself: example of metadata that users are asking for.


{
  "geometry" : "τ1{tend=946684800000,tstart=0,ttype=logical}S2(43199,21599){bbox=[-180.0 180.0 -90.0 90.0],proj=EPSG:4326}",

  "metadata" : {
    "im:keywords" : "global, climate, weather, Average temperature",
    "dc:comment" : "This is WorldClim version 2.1 climate data for 1970-2000. This version was released in January 2020.\r\nThere are monthly climate data for average temperature (°C).\r\nThe data is available at 30 seconds (~1 km2).\r\nFor \"time\", the month scope is inside the semantics data annotation",
    "im:notes" : "",
    "dc:title" : "WorldClim Historical climate data version 2.1 data 30s for 1970-2000 average temperature January",
    "dc:url" : "https://worldclim.org/data/worldclim21.html",
    "dc:creator" : "",
    "im:thematic-area" : "Earth",
    "dc:originator" : "Worldclim",
    "im:geographic-area" : "Global",
    "dc:source" : "Fick, S.E. and R.J. Hijmans, 2017. Worldclim 2: New 1-km spatial resolution climate surfaces for global land areas. International Journal of Climatology 37(12):4302-4315."
  }
}
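For comparison, a record like this could be mapped onto STAC collection fields roughly as sketched below. The field mapping, the reading of `tstart`/`tend` as milliseconds since the Unix epoch, and the reading of `bbox` as west, east, south, north are assumptions inferred from the values in the example, not a documented format. The result is a partial description, not a complete valid STAC Collection (it has no license, for instance).

```python
import re
from datetime import datetime, timezone


def ms_to_rfc3339(ms: int) -> str:
    """Convert epoch milliseconds (assumed unit) to an RFC 3339 UTC timestamp."""
    moment = datetime.fromtimestamp(ms / 1000, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


def to_stac_like(record: dict, collection_id: str) -> dict:
    """Map the record onto STAC collection fields (hypothetical mapping)."""
    meta = record.get("metadata", {})
    geometry = record.get("geometry", "")
    keywords = [k.strip() for k in meta.get("im:keywords", "").split(",") if k.strip()]
    result = {
        "type": "Collection",
        "id": collection_id,
        "title": meta.get("dc:title", ""),
        "description": meta.get("dc:comment", ""),
        "keywords": keywords,
        "links": [{"rel": "about", "href": meta["dc:url"]}] if meta.get("dc:url") else [],
        "extent": {},
    }

    # Assumed order in the source string: west, east, south, north.
    bbox = re.search(r"bbox=\[([^\]]+)\]", geometry)
    if bbox:
        west, east, south, north = (float(v) for v in bbox.group(1).split())
        result["extent"]["spatial"] = {"bbox": [[west, south, east, north]]}

    times = {k: int(v) for k, v in re.findall(r"\b(tstart|tend)=(\d+)", geometry)}
    if "tstart" in times and "tend" in times:
        interval = [ms_to_rfc3339(times["tstart"]), ms_to_rfc3339(times["tend"])]
        result["extent"]["temporal"] = {"interval": [interval]}
    return result


# Subset of the record above; "dc:comment" is shortened to its first two sentences.
record = {
    "geometry": "τ1{tend=946684800000,tstart=0,ttype=logical}S2(43199,21599){bbox=[-180.0 180.0 -90.0 90.0],proj=EPSG:4326}",
    "metadata": {
        "im:keywords": "global, climate, weather, Average temperature",
        "dc:comment": "This is WorldClim version 2.1 climate data for 1970-2000. This version was released in January 2020.",
        "dc:title": "WorldClim Historical climate data version 2.1 data 30s for 1970-2000 average temperature January",
        "dc:url": "https://worldclim.org/data/worldclim21.html",
    },
}

collection = to_stac_like(record, "worldclim-tavg-january")  # illustrative id
print(collection["extent"])
```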