Open-EO / openeo-geotrellis-extensions

Java/Scala extensions for Geotrellis, for use with OpenEO GeoPySpark backend.
Apache License 2.0

job-option to save results as one asset per band #309

Open VictorVerhaert opened 1 week ago

VictorVerhaert commented 1 week ago

For large projects where the output is expected to be one file per Sentinel-2 tile, putting all bands in a single file would make the files too large. Hence the need for a job option that saves the results as a separate file per band, as is common for other large products such as the Copernicus HRL VPP products. The different assets should still be included in one STAC item if possible.

Implementation tasks:

Scala implementation

I think the best place to start from is the code that now creates a tiff per date: https://github.com/Open-EO/openeo-geotrellis-extensions/blob/83e9b63e699b6196d60c96257d7212cd89b972dd/openeo-geotrellis/src/main/scala/org/openeo/geotrellis/geotiff/package.scala#L95

In the first 'map', where we compress tiles, the bandIndex will always be 0, but we will have to add the band name to the key, which is used later on to group the tuples in the RDD.

With the band name included in the RDD key, we get compressed bytes per date per band, and can write those into a GeoTIFF in a way that is almost identical to the current approach, apart from generating the right filename of course.
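The effect of adding the band name to the grouping key can be illustrated with a small sketch. This is plain Python standing in for the Spark RDD, not the actual Scala code in package.scala; `group_tiles`, the `per_band` flag and the record layout are hypothetical names chosen for illustration.

```python
from collections import defaultdict


def group_tiles(records, per_band):
    """Group (date, band_name, tile_bytes) records by the output file they belong to.

    Hypothetical helper: with per_band=False the key is the date only (one file
    per date), with per_band=True the key also contains the band name (one file
    per date per band).
    """
    grouped = defaultdict(list)
    for date, band_name, tile_bytes in records:
        key = (date, band_name) if per_band else (date,)
        grouped[key].append(tile_bytes)
    return dict(grouped)


# Made-up sample records: compressed tile bytes tagged with date and band name.
records = [
    ("2023-01-01", "B02", b"a"),
    ("2023-01-01", "B03", b"b"),
    ("2023-01-01", "B02", b"c"),
    ("2023-02-01", "B02", b"d"),
]

per_date = group_tiles(records, per_band=False)
per_date_band = group_tiles(records, per_band=True)
```

Each key in `per_date_band` would then correspond to one output file, so the writing step stays the same and only the key and the filename change.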

JorisCod commented 1 week ago

As an additional requirement, it should be possible to define the filename, e.g. through a prefix and postfix with the band name in between. Alternatively, some kind of f-string is also an option.

EmileSonneveld commented 1 week ago

I like Joris's suggestion. As I understand it:

file_name="openEO_{date}.tiff" would be the behavior we have now.
file_name="openEO_{date}_{band}.tiff" would give a series of simple tiffs with only one band inside.
file_name="openEO_{band}.tiff" would give a tiff for each openeo-band, with every tiff-band representing a timestamp (what Victor wants in this case).
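How such templates could map to output files can be sketched as follows. This is a minimal sketch assuming Python `str.format`-style placeholders; `output_filenames` is a hypothetical helper, not an existing openEO job option or API.

```python
from string import Formatter


def output_filenames(template, dates, bands):
    """Expand a filename template into one name per output file.

    Hypothetical helper: a dimension whose placeholder is absent from the
    template is not split over files, so it would end up inside each file.
    """
    # Collect the placeholder names that occur in the template.
    fields = {name for _, name, _, _ in Formatter().parse(template) if name}
    unknown = fields - {"date", "band"}
    if unknown:
        raise ValueError(f"unsupported placeholders: {sorted(unknown)}")
    date_values = dates if "date" in fields else [None]
    band_values = bands if "band" in fields else [None]
    return [
        template.format(date=d, band=b)
        for d in date_values
        for b in band_values
    ]


# Made-up sample values for illustration.
dates = ["2023-01-01", "2023-02-01"]
bands = ["B02", "B03"]

per_date = output_filenames("openEO_{date}.tiff", dates, bands)
per_date_band = output_filenames("openEO_{date}_{band}.tiff", dates, bands)
per_band = output_filenames("openEO_{band}.tiff", dates, bands)
```

With two dates and two bands, the three templates expand to two, four and two filenames respectively.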

jdries commented 1 week ago

From what I understand, we want this:

file_name="openEO_{date}_{band}.tiff" would give a series of simple tiffs with only one band inside.

Multiple timestamps in a single tiff are not supported by standard geotiff tooling (gdal) and cannot be described easily in STAC, so we prefer to stay away from that, especially for lcfm.

VictorVerhaert commented 1 week ago

The yearly and seasonal features don't require timestamps.

For the monthly composites, we might have to consider whether creating a tiff per band per month per tile would result in too many files. (Hence the idea of having a band per "timestamp", the timestamp being a monthly composite.)