Open-EO / openeo-processes

Interoperable processes for openEO's big Earth observation cloud processing.
https://processes.openeo.org
Apache License 2.0

load_stac: scale_and_offset #503

Open clausmichele opened 2 months ago

clausmichele commented 2 months ago

Proposed Process ID: load_stac
Proposed Parameter Name: scale_and_offset
Optional: yes, default: False

Context

Recently, with the introduction of the new Sentinel-2 processing baseline, an offset was introduced in addition to the scale factor, which was already present. Previously, the conversion from digital numbers (DN, the actual values in the S2 files) to reflectance was performed by applying only the scale factor, so for many applications using DN directly gave the same result (in many indices the scale factor cancels out and is therefore neglected). Now both have to be applied in order to obtain meaningful results.
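To make the effect concrete, here is a minimal Python sketch. It uses made-up DN values for a red and a near-infrared band, the 0.0001 scale factor mentioned in this thread, and an assumed offset of -0.1 in reflectance units (equivalent to -1000 DN); the helper names are illustrative only. It shows that a normalized difference index is unchanged by a pure scale factor but changes once an offset is involved.

```python
# Illustrative values only: the scale is the 0.0001 factor mentioned in this thread;
# the offset of -0.1 (i.e. -1000 DN) is an assumption about the new baseline.
SCALE = 0.0001
OFFSET = -0.1


def to_reflectance(dn, scale=SCALE, offset=OFFSET):
    """Convert a digital number to reflectance: value = dn * scale + offset."""
    return dn * scale + offset


def normalized_difference(a, b):
    """Generic normalized difference index, e.g. NDVI with a=NIR and b=red."""
    return (a - b) / (a + b)


# Made-up DN values for a single pixel.
red_dn, nir_dn = 1500, 4000

# Index computed directly on DN.
ndvi_raw = normalized_difference(nir_dn, red_dn)

# Index computed after applying only the scale factor (old behaviour).
ndvi_scale_only = normalized_difference(
    to_reflectance(nir_dn, offset=0.0), to_reflectance(red_dn, offset=0.0)
)

# Index computed after applying both scale and offset (new baseline).
ndvi_scale_and_offset = normalized_difference(
    to_reflectance(nir_dn), to_reflectance(red_dn)
)

print(ndvi_raw, ndvi_scale_only, ndvi_scale_and_offset)
```

With these example numbers the first two results agree (about 0.45), while the third is noticeably different (about 0.71), which is why the offset can no longer be ignored.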

Some GitHub issues discussing this topic:
https://github.com/Element84/earth-search/issues/9
https://github.com/Element84/earth-search/issues/23
https://github.com/opendatacube/odc-stac/issues/55

@jdries @dthiex How do you manage this for the SENTINEL2_L2A collections?

Description

If scale_and_offset is True, apply the scale and offset automatically. The values should be available in the raster:bands extension metadata.
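As a rough illustration of what "apply them automatically" could mean, the following Python sketch reads `scale`, `offset` and `nodata` from one raster:bands entry and converts raw values to physical values. The function names and the example asset are hypothetical, and this is not how any particular backend implements it; missing fields fall back to an identity transform.

```python
import math


def scale_and_offset_from_band(band):
    """Read scale/offset from one raster:bands entry, defaulting to identity."""
    return float(band.get("scale", 1.0)), float(band.get("offset", 0.0))


def apply_scale_and_offset(values, band):
    """Return physical values for raw `values`, mapping nodata to NaN."""
    scale, offset = scale_and_offset_from_band(band)
    nodata = band.get("nodata")
    return [
        math.nan if nodata is not None and v == nodata else v * scale + offset
        for v in values
    ]


# Hypothetical asset metadata, shaped like the STAC raster extension.
asset = {
    "raster:bands": [
        {"nodata": 0, "data_type": "uint16", "scale": 0.0001, "offset": -0.1}
    ]
}

# Made-up raw values: one nodata pixel and two valid pixels.
reflectance = apply_scale_and_offset([0, 1000, 6000], asset["raster:bands"][0])
print(reflectance)
```

Note that the result is always floating point here, regardless of the `data_type` of the source band.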

Data Type

boolean

Additional changes

dthiex commented 1 month ago

In SH itself we follow this approach (we have a harmonization parameter which if set to true will already compensate the offset).

In openEO SH I believe we don't support setting the parameter, so we apply the default, meaning we request harmonized DN, so DN = 10000 * Reflectance still holds.

My feeling, though, is that this should not be part of the load_stac process, as it's very specific to the case "I load L2A raw DNs but I want to get Reflectance values".

jdries commented 1 month ago

We currently also configure the behaviour per collection, but most collections require the user to apply the scaling explicitly, which is annoying. For Sentinel-2 we make sure to convert to the 'standard' scaling factor of 0.0001, to avoid issues with the new processing baseline. It would be nice to have a generic solution for load_collection as well.

soxofaan commented 1 month ago

yeah, it would be nice if this could be addressed the same way in both load_collection and load_stac.

Side note: I wonder if there isn't a more generic or future-proof parameter name than scale_and_offset to not be limited to just scale and offset transforms. For example in the SH link of Daniel I see clamping of negative values.

m-mohr commented 1 month ago

For load_collection the original idea was to have this information in the metadata and then apply it automatically during data loading. Is this done? I think I'd assume the same in load_stac by default, if the metadata is given.

clausmichele commented 1 month ago

> For load_collection the original idea was to have this information in the metadata and then apply it automatically during data loading. Is this done? I think I'd assume the same in load_stac by default, if the metadata is given.

Currently this is not specifically mentioned in the load_stac description. We would have to specify that, if the raster extension is available and it provides scale and/or offset values, we apply them. However, I would prefer being able to switch this on or off depending on the use case, since applying it automatically will in many cases change the data type (for example from uint8 to float32 or float64) and require much more space and resources.
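The storage cost of that type change can be sketched with Python's standard `array` module. The pixel count and the scale/offset values below are made up for illustration; the point is only the per-value size of each type.

```python
from array import array

# One hundred thousand made-up pixel values stored as unsigned 8-bit integers.
n_pixels = 100_000
raw = array("B", bytes(n_pixels))  # 'B' = unsigned char, 1 byte per value

# Hypothetical scale/offset; a non-integer result forces a floating-point type.
scale, offset = 0.5, -1.0
as_float32 = array("f", (v * scale + offset for v in raw))  # 4 bytes per value
as_float64 = array("d", (v * scale + offset for v in raw))  # 8 bytes per value

# Report the raw buffer size of each representation.
for name, arr in [("uint8", raw), ("float32", as_float32), ("float64", as_float64)]:
    print(name, arr.itemsize * len(arr), "bytes")
```

The same data takes four times the space as float32 and eight times as float64, before any processing has been done.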

For the load_collection process, as @m-mohr mentions, it is enough for me too to document it in the metadata, to keep it as simple and efficient as possible.

soxofaan commented 1 month ago

> For load_collection the original idea was to have this information in the metadata and then apply it automatically during data loading. Is this done?

You mean "apply automatically" by client or backend?

If it is to be done automatically by the backend, what is the point of exposing this as collection metadata? Or worse: you even risk the user/client doing the normalization again because of a misunderstanding about whether they are getting raw DN values or physical values.

In any case, in the VITO backend we don't automatically normalize/harmonize, for memory and performance reasons (e.g. if the raw data is uint8, we want to have the option to keep that type when it's not necessary to convert to more memory-heavy floats/doubles). For example, if you download SENTINEL2_L2A (B02, B03, B04) without processing, you get values roughly in the [0-10k] range instead of reflectances in the [0-1] range. This is obviously a basic behavior we cannot change suddenly. Backend-side auto-normalization should be an opt-in feature, e.g. with the proposed scale_and_offset parameter.

It could also be a client feature (opt-in again) to automatically add an apply node that does the rescaling based on collection metadata.
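A rough sketch of what such a client-generated graph could look like, written as a Python function that builds an openEO process graph as a plain dictionary. The function name, node names and the example URL are placeholders, and the scale/offset values would in practice come from the metadata; the sketch only chains `load_stac` with an `apply` node computing x * scale + offset.

```python
import json


def rescale_process_graph(stac_url, scale, offset):
    """Build a process graph: load_stac followed by apply(x * scale + offset)."""
    return {
        "load": {
            "process_id": "load_stac",
            "arguments": {"url": stac_url},
        },
        "rescale": {
            "process_id": "apply",
            "arguments": {
                "data": {"from_node": "load"},
                # Child process applied to every pixel value x.
                "process": {
                    "process_graph": {
                        "mul": {
                            "process_id": "multiply",
                            "arguments": {"x": {"from_parameter": "x"}, "y": scale},
                        },
                        "add": {
                            "process_id": "add",
                            "arguments": {"x": {"from_node": "mul"}, "y": offset},
                            "result": True,
                        },
                    }
                },
            },
            "result": True,
        },
    }


# Placeholder URL and assumed scale/offset values, for illustration only.
graph = rescale_process_graph("https://example.com/stac/item.json", 0.0001, -0.1)
print(json.dumps(graph, indent=2))
```

Because the rescaling is an explicit node, the user can see it in the graph and drop it when raw DN values are sufficient.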

clausmichele commented 1 month ago

The load_collection discussion is a bit off topic in my opinion, let's focus on load_stac!

@soxofaan doing it client side seems more a workaround to me, since it wouldn't be documented in the openEO processes docs and also not available in the same way in the other clients.

Anyway, if we can't agree on an additional parameter, we should at least document what the default behaviour of load_stac is concerning these values (which could also be embedded in the GeoTIFF metadata, not only in the STAC metadata).