Open soxofaan opened 5 months ago
I'm not really sure if netcdf will be better, especially because writing a single large netcdf file is not so easy, whereas with geotiff multiple files can be written in parallel. The only other format with some potential for this use case is Zarr, again because of the parallel write possibility.
A reason to prefer NetCDF is that it is more standardized for handling multidimensional cases (e.g. encoding the time dimension). With GTiff we encode the time dimension in a more ad-hoc way, so that will not scale well if more backend implementations come into play.
But indeed, this is not an urgent matter at this time.
STAC + geotiff can fully define a datacube with time dimension in a standardized manner. In fact, the STAC metadata becomes more complicated for netcdf with time dimension. I've also seen other backends write netcdf output in rather unexpected ways that we would probably not support on our side.
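To illustrate the "STAC + geotiff" point, here is a rough sketch of one STAC item per time slice, where each GeoTIFF carries its timestamp in the standard `properties.datetime` field. The helper name `stac_item_for_slice`, the asset key `data` and the file names are made up for illustration; this is not what any backend actually emits.

```python
from datetime import datetime, timezone


def stac_item_for_slice(item_id, href, when, bbox):
    """Build a minimal STAC item dict for one GeoTIFF time slice.

    Hypothetical helper: `when` must be a timezone-aware datetime,
    `bbox` is (west, south, east, north).
    """
    west, south, east, north = bbox
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "bbox": list(bbox),
        "geometry": {
            "type": "Polygon",
            "coordinates": [[
                [west, south], [east, south], [east, north],
                [west, north], [west, south],
            ]],
        },
        # The time dimension lives here: one item (and file) per timestamp.
        "properties": {
            "datetime": when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        },
        "assets": {
            "data": {
                "href": href,
                "type": "image/tiff; application=geotiff",
                "roles": ["data"],
            },
        },
        "links": [],
    }


# Two time slices of the same cube, each written as a separate GeoTIFF.
items = [
    stac_item_for_slice(
        f"slice-{day:02d}",
        f"result_2023-03-{day:02d}.tif",  # made-up file name
        datetime(2023, 3, day, tzinfo=timezone.utc),
        (5.0, 51.0, 5.1, 51.1),
    )
    for day in (1, 2)
]
```

Because every file gets its own item, the files can be written independently (and in parallel) and the time axis is reconstructed from the item metadata.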
I noticed this while looking into the related issue https://github.com/Open-EO/openeo-geopyspark-driver/issues/786:
the crossbackend feature in aggregator currently uses GTiff for the `load_stac` bridge: https://github.com/Open-EO/openeo-aggregator/blob/129d4f27ebf762c737d9b5229b88b6b49d1d9610/src/openeo_aggregator/partitionedjobs/crossbackend.py#L133-L141
If I remember correctly we picked that at the time of implementation, because it's a safe choice (widely supported) and there were issues with NetCDF support in `load_stac` in openeo-geopyspark-driver at the time (March 2023). We might want to revisit the situation, e.g. automatically detect a better option, or let the user choose in some way?
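As a minimal sketch of what "detect a better option, or let the user choose" could look like: the function name `pick_bridge_format`, the preference order and the format name strings below are assumptions for illustration only, not the aggregator's actual code or API. The preference order in particular is just a placeholder, since which format is "better" is exactly what is under discussion.

```python
from typing import Iterable, Optional

# Hypothetical preference order, most preferred first; GTiff is the safe fallback.
DEFAULT_PREFERENCE = ("NetCDF", "GTiff")


def pick_bridge_format(
    supported: Iterable[str],
    user_choice: Optional[str] = None,
    preference: Iterable[str] = DEFAULT_PREFERENCE,
) -> str:
    """Pick the file format for the load_stac bridge.

    `supported` is the list of format names both backends can handle
    (how to obtain that list is out of scope here). Matching is
    case-insensitive and the backend's own spelling is returned.
    """
    by_lower = {name.lower(): name for name in supported}

    # An explicit user choice wins, but only if it is actually supported.
    if user_choice is not None:
        if user_choice.lower() not in by_lower:
            raise ValueError(f"Format {user_choice!r} is not supported")
        return by_lower[user_choice.lower()]

    # Otherwise take the first preferred format that is available.
    for candidate in preference:
        if candidate.lower() in by_lower:
            return by_lower[candidate.lower()]

    raise ValueError(f"No usable bridge format among {sorted(by_lower.values())}")
```

With this shape, the current behaviour is simply `preference=("GTiff",)`, so switching the default later would be a one-line change.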