Open clausmichele opened 2 months ago
In SH itself we follow this approach: we have a harmonization parameter which, if set to true, already compensates for the offset.
In openEO SH, I believe we don't support setting this parameter, so we apply the default, meaning we request harmonized DNs, so DN = 10000 * Reflectance still holds.
In my view, though, this should not be part of the load_stac process, as it is very specific to the case "I load L2A raw DNs but I want to get reflectance values".
We currently also configure the behaviour per collection, but most of them require the user to apply it explicitly, which is annoying. For Sentinel-2 we make sure to convert to the 'standard' scaling factor of 0.0001, to avoid issues with the new processing baseline. It would be nice to have a generic solution for load_collection as well.
Yeah, it would be nice if this could be addressed the same way in both load_collection and load_stac.
Side note: I wonder if there isn't a more generic or future-proof parameter name than scale_and_offset, to not be limited to just scale and offset transforms. For example, in the SH link of Daniel I see clamping of negative values.
For load_collection the original idea was to have this information in the metadata and then apply it automatically during data loading. Is this done? I think I'd assume the same in load_stac by default, if the metadata is given.
Currently it is not specifically mentioned in the load_stac description. We would have to specify that if the raster extension is available, along with the scale and/or offset values, we apply them. However, I would prefer being able to switch it on/off depending on the use case, since applying it automatically often changes the data type (for example from uint8 to float32 or float64) and requires much more space and resources.
For the load_collection process, as @m-mohr mentions, documenting it in the metadata is enough for me too, to keep it as simple and efficient as possible.
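To make the data type concern concrete, here is a minimal sketch with NumPy. The array size and the scale/offset values are made up for illustration and are not taken from any real collection.

```python
import numpy as np

# Hypothetical 1000 x 1000 band stored as raw digital numbers (uint8).
dn = np.full((1000, 1000), 200, dtype=np.uint8)

# Illustrative scale/offset values (assumptions, not real metadata).
scale, offset = 0.01, 0.0

# Applying scale and offset forces a conversion to floating point.
scaled32 = dn.astype(np.float32) * np.float32(scale) + np.float32(offset)
scaled64 = dn.astype(np.float64) * scale + offset

# Compare the memory footprint of the three representations.
for name, arr in [("raw", dn), ("float32", scaled32), ("float64", scaled64)]:
    print(name, arr.dtype, arr.nbytes, "bytes")
```

For the same pixels, the float32 result takes four times the memory of the uint8 input and the float64 result eight times, which is the motivation for keeping the conversion optional.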
For load_collection the original idea was to have this information in the metadata and then apply it automatically during data loading. Is this done?
You mean "apply automatically" by the client or by the backend?
If it is to be done automatically by the backend, what is the point of exposing this as collection metadata? Or worse: you even risk the user/client doing the normalization again because of a misunderstanding about whether they are getting raw DN values or physical values.
In any case, in the VITO backend we don't automatically normalize/harmonize for memory/performance reasons (e.g. if the raw data is uint8, we want to have the option to keep that type when it's not necessary to convert to more memory-heavy floats/doubles). For example, if you download SENTINEL2_L2A (B02, B03, B04) without processing, you get values roughly in the [0-10k] range, instead of reflectances in the [0-1] range. This is obviously a basic behavior we cannot change suddenly. Backend-side auto-normalization should be an opt-in feature, e.g. with the proposed scale_and_offset parameter.
It could also be a client feature (opt-in again) to automatically add an apply node to do rescaling based on collection metadata.
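As a rough illustration of what such a client-generated node could look like, the sketch below builds an openEO process graph fragment as a plain Python dict. The helper name `rescale_apply_node`, the node IDs and the scale/offset values are hypothetical; only the `apply`, `multiply` and `add` processes and the process graph layout come from the openEO specification.

```python
import json


def rescale_apply_node(source_node: str, scale: float, offset: float) -> dict:
    """Build an `apply` node computing x * scale + offset for every pixel.

    Hypothetical helper: a client could fill `scale` and `offset`
    from collection metadata before appending this node to the graph.
    """
    return {
        "process_id": "apply",
        "arguments": {
            "data": {"from_node": source_node},
            "process": {
                "process_graph": {
                    "multiply1": {
                        "process_id": "multiply",
                        "arguments": {"x": {"from_parameter": "x"}, "y": scale},
                    },
                    "add1": {
                        "process_id": "add",
                        "arguments": {"x": {"from_node": "multiply1"}, "y": offset},
                        "result": True,
                    },
                }
            },
        },
    }


# Illustrative values only; "load1" is an assumed ID of the loading node.
node = rescale_apply_node("load1", scale=0.0001, offset=-0.1)
print(json.dumps(node, indent=2))
```

Because the rescaling is an ordinary node in the graph, the backend needs no new parameter, but every client would have to implement the same logic separately.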
The load_collection discussion is a bit off topic in my opinion, let's focus on load_stac!
@soxofaan doing it client side seems more of a workaround to me, since it wouldn't be documented in the openEO processes docs and also wouldn't be available in the same way in the other clients.
Anyway, if we can't agree to have an additional parameter, we should at least document what the default behaviour of load_stac is concerning these parameters (which could also be embedded in the GeoTIFF metadata, not only in the STAC metadata).
Proposed Process ID: load_stac
Proposed Parameter Name: scale_and_offset
Optional: yes, default: False
Context
Recently, with the introduction of the new Sentinel-2 processing baseline, an offset has been added (in addition to the scale, which was already present). Previously, since the conversion from digital numbers (DN, the actual values in the S2 files) to reflectances was performed only by applying the scale factor, for many applications the result was the same as when using DN directly (in many indices, the scale factor cancels out and can be neglected). Now, both have to be applied in order to obtain meaningful results.
Some GitHub issues discussing this topic:
https://github.com/Element84/earth-search/issues/9
https://github.com/Element84/earth-search/issues/23
https://github.com/opendatacube/odc-stac/issues/55
@jdries @dthiex How do you manage this for the SENTINEL2_L2A collections?
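The effect described in the context above can be checked with a small worked example on a normalized difference index. The DN values are invented, and the scale of 0.0001 and offset of -0.1 are assumed here purely for illustration.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized difference index: (nir - red) / (nir + red)."""
    return (nir - red) / (nir + red)


# Made-up digital numbers for a single pixel.
nir_dn, red_dn = 4000.0, 1500.0

# Assumed conversion: reflectance = DN * scale + offset.
scale, offset = 0.0001, -0.1

# A pure scale factor cancels out in the ratio.
ndvi_dn = ndvi(nir_dn, red_dn)
ndvi_scale_only = ndvi(nir_dn * scale, red_dn * scale)

# An additive offset does not cancel, so the index changes.
ndvi_scale_offset = ndvi(nir_dn * scale + offset, red_dn * scale + offset)

print(round(ndvi_dn, 4), round(ndvi_scale_only, 4), round(ndvi_scale_offset, 4))
```

With scale only, the index computed from DN and from reflectance agree (about 0.45 here), while including the offset shifts it substantially (about 0.71 here), so skipping the offset silently produces different results.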
Description
If scale_and_offset is True, apply the scale and offset automatically. They should be available in the raster:bands extension metadata.
Data Type
boolean
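A minimal sketch of how a backend might honour the proposed parameter is shown below. The function `apply_scale_and_offset`, the example band dictionary and the choice of float32 as output type are assumptions for illustration; only the `scale` and `offset` field names follow the STAC raster extension.

```python
import numpy as np


def apply_scale_and_offset(data, band_metadata, scale_and_offset=False):
    """Optionally convert raw DN values using raster:bands metadata.

    Hypothetical helper: returns the input unchanged unless the
    opt-in flag is set, so the original data type is preserved by default.
    """
    if not scale_and_offset:
        return data
    # Missing fields fall back to the identity transform.
    scale = band_metadata.get("scale", 1.0)
    offset = band_metadata.get("offset", 0.0)
    return data.astype(np.float32) * np.float32(scale) + np.float32(offset)


# Illustrative raster:bands entry and DN values (not real metadata).
band = {"data_type": "uint16", "scale": 0.0001, "offset": -0.1}
dn = np.array([[1000, 2000], [4000, 11000]], dtype=np.uint16)

raw = apply_scale_and_offset(dn, band)                          # default: untouched
refl = apply_scale_and_offset(dn, band, scale_and_offset=True)  # opt-in conversion

print(raw.dtype, refl.dtype)
print(refl)
```

Keeping the default at False means existing workflows continue to receive raw DN values in their original type, and only users who opt in pay the cost of the float conversion.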
Additional changes