Open-EO / openeo-gfmap

Generic framework for EO mapping applications building on openEO
Apache License 2.0

Run first demo WorldCereal-based extraction use case #43

Closed by kvantricht 3 months ago

kvantricht commented 4 months ago

Using the demo dataset from #40 and writing documentation along the way in #42.

kvantricht commented 4 months ago

@GriffinBabe, my initial review of test extractions:

S2 file “S2_at2021lpis809720_32633_2020-08-30_2022-03-03”

AUX file “AUX_at2021lpis809720_32633_2020-08-30_2022-03-03”

GriffinBabe commented 4 months ago

Here are answers to a couple of your points already. I'm running a new extraction with some changes before answering the others.

• What rule was used (if any) to drop observations that are certainly completely clouded?

No strategy is in place at the moment. I think an easy one would be to drop observations with a cloud percentage higher than, let's say, 95%. This can be done directly through the openEO load_collection process.

• SCL band is also INT16: could we in a post-job action change this to UINT8 to save storage or would that be too complex and not really worth it?

Probably not worth it, because you would save only half the space of one of the 13 bands. It's also nice for later processing to have everything aligned on the same resolution, since SCL is going to be used for pixelwise operations on the optical data anyway.
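To put a rough number on that, here is a back-of-the-envelope sketch. It assumes uncompressed storage and that all 13 bands (SCL included) are currently written as INT16; with compression enabled the real saving would differ.

```python
BYTES_INT16 = 2
BYTES_UINT8 = 1
N_BANDS = 13  # band count quoted above, SCL included (assumption: all stored as INT16)


def scl_saving_fraction(n_bands: int = N_BANDS) -> float:
    """Fraction of per-pixel storage saved by writing one band as UINT8 instead of INT16."""
    before = n_bands * BYTES_INT16
    after = (n_bands - 1) * BYTES_INT16 + BYTES_UINT8
    return (before - after) / before


print(f"{scl_saving_fraction():.1%}")  # 1 byte saved out of 26 per pixel
```

So the conversion would save about 1 byte out of 26 per pixel and time step, before any compression is taken into account.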

• File name pattern should probably contain the resolution (and maybe patch size? Not sure).

I don't think it's necessary as all the tiles should be consistent within an extraction process

• Where does the attribute “landcover_label” come from?

It's something from the input dataset, but I can remove this because it's almost the same value for each sample

• Is it normal that “confidence” attribute is None?

It's also extracted from the input dataset, and it's None already from there for all samples

• Are we sure UINT16 covers the entire possible range of values?

Could you point me to the list of the harmonized labels? The data type is decided by the user, but in the context of WorldCereal it should accommodate the maximum value of the harmonized labels. Alternatively, we can store it as int64, which accepts negative values and covers a very large range. The file would still be small, as it is only a 2D array (1 band, no time).

• There is no no-data value specified in the file. Which one do you use in rasterization process? Should be encoded as such in the netcdf so it’s handled properly downstream.

Yes indeed, I will add this as _FillValue, which seems to be the convention for NetCDF files (according to this source and my own experience).

GriffinBabe commented 4 months ago

Another question: how do you want the rasterization? With the all_touched parameter set to True or to False? https://rasterio.readthedocs.io/en/stable/api/rasterio.features.html#rasterio.features.rasterize

kvantricht commented 4 months ago

> No strategy is in place at the moment. I think an easy one would be to drop observations with a cloud percentage higher than, let's say, 95%. This can be done directly through the openEO load_collection process.

As a minimum, indeed good to have this added. We could make it configurable in GFMAP?
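A minimal sketch of what a configurable threshold could look like. The helper name `cloud_cover_filter` is hypothetical and not part of GFMAP. It builds the `properties` filter of an openEO `load_collection` node by hand, assuming the collection exposes the `eo:cloud_cover` metadata property. The openEO Python client offers a `max_cloud_cover` argument on `load_collection` that should produce an equivalent filter.

```python
from typing import Any, Dict


def cloud_cover_filter(max_cloud_cover: float = 95.0) -> Dict[str, Any]:
    """Build a `properties` filter keeping products with eo:cloud_cover <= threshold.

    Hypothetical helper: the name and default are illustrative only.
    """
    if not 0 <= max_cloud_cover <= 100:
        raise ValueError("max_cloud_cover must be a percentage between 0 and 100")
    return {
        "eo:cloud_cover": {
            "process_graph": {
                "lte1": {
                    "process_id": "lte",
                    "arguments": {
                        "x": {"from_parameter": "value"},
                        "y": max_cloud_cover,
                    },
                    "result": True,
                }
            }
        }
    }


# Hypothetical extraction setting; the real GFMAP configuration may look different.
properties = cloud_cover_filter(max_cloud_cover=95)
```

Note that `eo:cloud_cover` describes the whole product, not the extracted patch, so a patch can still be fully clouded in a product that passes the filter.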

> I don't think it's necessary as all the tiles should be consistent within an extraction process

Not sure about this. I think we would prefer to extract S1 at 20 m and meteo at a much lower resolution still, which saves a lot of storage. Downstream openEO-based processing will eventually merge the cubes.

> It's something from the input dataset, but I can remove this because it's almost the same value for each sample

Should not always be the same. 11 means cropland, but there will be other datasets (not the one I sent) which have other landcovers as well. But the crucial thing here is that this attribute only belongs to the center point/field we used and it doesn't necessarily fit the rest of the pixels in the rasterized patch. So I would omit this attribute as it will be confusing.

> It's also extracted from the input dataset, and it's None already from there for all samples

Interesting. I guess once we use the API, this shouldn't be None. So let's worry about it later.

> Could you point me to the list of the harmonized labels? The data type is decided by the user, but in the context of WorldCereal it should accommodate the maximum value of the harmonized labels. Alternatively, we can store it as int64, which accepts negative values and covers a very large range. The file would still be small, as it is only a 2D array (1 band, no time).

The possible values are in the first column of this file. Note that this is a string for readability and we should strip all - signs to get to the integer. I think int64 is indeed what is normally used for the new legend.
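As a sketch of that conversion and of the data type question: the two legend codes below are made up for illustration and are not taken from the actual legend file, and the helper names are hypothetical.

```python
INT_RANGES = {
    "uint16": (0, 2**16 - 1),
    "int32": (-(2**31), 2**31 - 1),
    "int64": (-(2**63), 2**63 - 1),
}


def legend_code_to_int(code: str) -> int:
    """Strip the '-' separators from a legend code string and parse it as an integer."""
    return int(code.strip().replace("-", ""))


def smallest_dtype(max_value: int) -> str:
    """Return the first type in INT_RANGES (ordered small to large) that holds max_value."""
    for name, (low, high) in INT_RANGES.items():
        if low <= max_value <= high:
            return name
    raise OverflowError(f"{max_value} does not fit in int64")


# Hypothetical codes, only meant to show the string format with '-' separators.
codes = ["11-01-01-000-0", "80-00-00-000-0"]
values = [legend_code_to_int(code) for code in codes]
print(values, smallest_dtype(max(values)))
```

With ten-digit codes like these, UINT16 is far too small and even int32 overflows for the larger one, which is consistent with using int64 for the new legend.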

> Another question: how do you want the rasterization? With the all_touched parameter set to True or to False?

As discussed, all_touched should be False
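To illustrate the difference, here is a toy pure-Python sketch. It is not rasterio's implementation and only handles an axis-aligned rectangle, but it mimics the two rules: with all_touched False a pixel is burned only when its center falls inside the polygon, with True every pixel the polygon overlaps is burned. The `burn_rectangle` helper and the field coordinates are made up for the example.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # xmin, ymin, xmax, ymax in pixel units


def burn_rectangle(rect: Rect, shape: Tuple[int, int], all_touched: bool) -> List[List[int]]:
    """Burn an axis-aligned rectangle into a grid of unit pixels (toy illustration)."""
    xmin, ymin, xmax, ymax = rect
    rows, cols = shape
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if all_touched:
                # Pixel [c, c+1] x [r, r+1] overlaps the rectangle at all.
                hit = c < xmax and c + 1 > xmin and r < ymax and r + 1 > ymin
            else:
                # Only the pixel center has to fall inside the rectangle.
                cx, cy = c + 0.5, r + 0.5
                hit = xmin <= cx <= xmax and ymin <= cy <= ymax
            out[r][c] = int(hit)
    return out


field = (0.6, 0.6, 2.4, 2.4)  # hypothetical field outline on a 3x3 patch
strict = burn_rectangle(field, (3, 3), all_touched=False)
loose = burn_rectangle(field, (3, 3), all_touched=True)
print(sum(map(sum, strict)), sum(map(sum, loose)))
```

In this toy case the strict rule labels 1 pixel and the all-touched rule labels all 9, so with True the label spills onto pixels that lie mostly outside the field.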

kvantricht commented 4 months ago

@GriffinBabe

GriffinBabe commented 3 months ago

Issue reported: https://github.com/Open-EO/openeo-geopyspark-driver/issues/712

GriffinBabe commented 3 months ago

Except for the SCL problem, everything is solved here. I created a ticket in WorldCereal technical and in the openeo-geopyspark-driver repository: https://github.com/Open-EO/openeo-geopyspark-driver/issues/715

#45 closes it