Open-EO / openeo.org

openeo.org landing page
https://openeo.org
Apache License 2.0

Add best practice how to store results #8

Closed m-mohr closed 4 years ago

m-mohr commented 4 years ago

Based on issue https://github.com/Open-EO/openeo-processes/issues/59 and a recent discussion:

Currently, users have no knowledge of how many files they will receive (and what exactly will be in it/them) after using save_result, as the behavior of this process is left to the back-end.

Let's assume that the job output is composed of more than one raster, i.e. it is a datacube with dimensions (x, y, A, B, ...) where dimensions A, B, ... have more than one element (a typical example is a time series with multiple bands). We agreed on the following:

  • the back-end freely decides how to save and return job outputs, but it MUST expose at /output_formats information on how datacubes with multiple dimensions will be offered to the user (e.g. one file per timestamp per band; one file per timestamp with n bands in each, etc.)
  • in general, we define as best practice saving one file per timestamp (each file may contain multiple bands, or any other dimension)

Quote from @lforesta
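
To make the two layouts mentioned in the quote concrete, here is a minimal sketch in Python. It is not part of openEO or of any back-end: the function name `plan_output_files`, the layout labels, and the `result_<timestamp>` file naming are all assumptions made up for illustration. It only shows which files a user would receive for a datacube with a time and a band dimension under each layout.

```python
def plan_output_files(timestamps, bands, layout="per_timestamp", extension="tif"):
    """Map each output filename to the bands stored in that file.

    Hypothetical helper: layout names and the file naming scheme are
    illustrative assumptions, not defined by the openEO API.
    """
    if layout == "per_timestamp":
        # Best practice from the discussion: one file per timestamp,
        # each file holding all bands.
        return {f"result_{t}.{extension}": list(bands) for t in timestamps}
    if layout == "per_timestamp_per_band":
        # Alternative layout: one file per timestamp and band.
        return {
            f"result_{t}_{b}.{extension}": [b]
            for t in timestamps
            for b in bands
        }
    raise ValueError(f"unknown layout: {layout}")


timestamps = ["2020-01-01", "2020-01-02"]
bands = ["B04", "B08"]

per_timestamp = plan_output_files(timestamps, bands)
per_band = plan_output_files(timestamps, bands, layout="per_timestamp_per_band")

for name, contents in per_timestamp.items():
    print(name, contents)
```

With two timestamps and two bands, the first layout yields two files and the second yields four, which is exactly the difference a user cannot currently predict before running save_result.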

m-mohr commented 4 years ago

Is there still a desire to work on this @lforesta?

lforesta commented 4 years ago

Thanks for the reminder :) 'desire' is a big word, but I think it would be useful if back-ends added this information for users. My idea would be to have an additional field at GET /file_formats under output, maybe called output-structure. Depending on how strict we want to be, we can make this field non-nullable.

If the idea sounds fine, I can put together a PR.

m-mohr commented 4 years ago

API changes are a bit harder to do nowadays. They can't be breaking and we won't release a new version within the openEO project runtime any more. So we only have the description field for now.

I think at some point we discussed making this only a best practice (that's likely why it's an issue at openeo.org, not openeo-api). In this case we just need a write-up on how files should be stored/made available by back-ends. Or is this no longer required, as we now have STAC for results, which can transfer a lot of additional information per asset/file?

lforesta commented 4 years ago

Yes, actually this information may now be provided by the back-end in properties/description when sending a request to GET /jobs/{job_id}/results, so there is no need to add that to openeo.org, I think.

m-mohr commented 4 years ago

And we may even extend STAC with more information. For example, it can already describe the bands and gsd per asset, and it is planned to add information such as data type, min/max values, etc. So I think we are fine with that approach. Even better :-)
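
As a rough sketch of how a client could read this per-asset information from a STAC-based result, the Python below summarizes the assets of a result document. The `sample_result` dictionary is a hand-written, hypothetical and incomplete example (file names, URL and values are invented, and it is not a real back-end response); `summarize_assets` is likewise an illustrative helper, not part of any openEO client. The field names `assets`, `href`, `description`, `eo:bands` and `gsd` come from STAC.

```python
def summarize_assets(stac_item):
    """Collect href, description, band names and gsd for each asset.

    Illustrative helper only; missing fields are returned as empty
    values or None rather than raising an error.
    """
    summary = {}
    for key, asset in stac_item.get("assets", {}).items():
        band_names = [band.get("name") for band in asset.get("eo:bands", [])]
        summary[key] = {
            "href": asset.get("href"),
            "description": asset.get("description", ""),
            "bands": band_names,
            "gsd": asset.get("gsd"),
        }
    return summary


# Hypothetical, trimmed-down result document for illustration only.
sample_result = {
    "type": "Feature",
    "id": "example-job",
    "properties": {
        "description": "One GeoTIFF per timestamp, all bands in each file."
    },
    "assets": {
        "result_2020-01-01.tif": {
            "href": "https://example.com/results/result_2020-01-01.tif",
            "type": "image/tiff; application=geotiff",
            "eo:bands": [{"name": "B04"}, {"name": "B08"}],
            "gsd": 10,
        }
    },
}

summary = summarize_assets(sample_result)
print(sample_result["properties"]["description"])
for name, info in summary.items():
    print(name, info["bands"], info["gsd"])
```

The point of the sketch is that the file layout no longer has to be guessed: the description states it in words, and each asset lists what it contains.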

clausmichele commented 3 years ago

After trying the sample process graph with climatological_normal, I noticed that it uses save_result with the palette option. How is this actually applied? I do not understand what the mapping 0 -> #2166AC, 1 -> #4393C3 means. Is the value 0 in the result mapped to the color #2166AC? What happens to values that are not mapped to a color?


m-mohr commented 3 years ago

@clausmichele This is related to the earthengine driver and needs to be discussed there.