Closed kvantricht closed 1 month ago
@VincentVerelst this is a task that will allow follow-up work such as feature computation to kickstart much faster, because we don't need to wait for the massive extraction work to finish. Let's have a chat on this one whenever you're up for it.
TBD with @GriffinBabe how to get the mask_scl_dilation mask. We can't get this for existing extractions, so we may need to rethink whether we want to produce it in a separate job that can then also run on existing extractions.
The folder structure we want to implement is marked here @VincentVerelst. Tell me if you have access:
A more detailed overview of what needs to be done:
@kvantricht, in the WC1 Sentinel-2 extractions there is no cloud mask layer, nor distance to cloud. Do they also have to be added when converting to the WC2 format? If so, we need to think about a strategy to do this.
Good question. There is an SCL band for sure, but indeed, distance to clouds (and the mask_scl_dilation mask) is not something that is or will be available. I guess we need to think about a way to compute the cloud mask and distance to cloud separately, and do this for these converted extractions with GFMAP jobs. Would that be something we can do? cc @GriffinBabe
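For discussion, here is a minimal sketch of how a cloud mask and a distance-to-cloud layer could be derived from the SCL band alone. It is not the mask_scl_dilation process itself, only a simplified approximation. The function name, the choice of SCL classes (3, 8, 9, 10), the square dilation kernel and the 20 m pixel size are all assumptions for illustration.

```python
import numpy as np
from scipy import ndimage

# Assumption: SCL classes treated as cloud-affected
# (3 = cloud shadow, 8/9 = cloud medium/high probability, 10 = thin cirrus).
CLOUD_CLASSES = (3, 8, 9, 10)


def cloud_mask_and_distance(scl, dilation_px=1, pixel_size_m=20.0):
    """Hypothetical helper: derive a dilated cloud mask and distance to cloud.

    scl          : 2D integer array with Sentinel-2 SCL class values
    dilation_px  : radius (in pixels) of the square dilation kernel
    pixel_size_m : pixel size used to express the distance in metres
    """
    cloud = np.isin(scl, CLOUD_CLASSES)

    if dilation_px > 0:
        size = 2 * dilation_px + 1
        structure = np.ones((size, size), dtype=bool)
        cloud = ndimage.binary_dilation(cloud, structure=structure)

    if not cloud.any():
        # No cloud in the tile: distance is undefined, reported here as inf.
        distance = np.full(scl.shape, np.inf)
    else:
        # distance_transform_edt measures, for every non-zero pixel, the
        # distance to the nearest zero pixel, so clouds must be the zeros.
        distance = ndimage.distance_transform_edt(~cloud) * pixel_size_m

    return cloud, distance
```

Since this only needs the SCL band, something along these lines could in principle run as a separate step on already converted extractions, applied per timestep.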
Won't fix
WorldCereal already did a lot of extractions in its first phase, mostly using Google Earth Engine. These extractions are available as NetCDF files. We don't want to waste time and resources redoing all of them, so they need to be converted into the format that the new GFMAP-based extraction workflow produces. After that they can be ingested into the STAC catalogue and will appear as if they were part of the new extraction workflow.
Phase I extractions location: /vitodata/worldcereal_data/cib/CIB_V1
database.json -> GeoPandas.GeoDataFrame with rows of extracted samples and attributes pointing to NetCDF files on disk
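As a starting point for the conversion job, a rough sketch of walking the database and checking which NetCDF files are actually present on disk. This assumes database.json is a GeoJSON FeatureCollection and that each sample stores its NetCDF location in a property called `path`; both are guesses, the real layout and attribute names need to be checked. The function name is hypothetical too.

```python
import json
from pathlib import Path


def list_sample_files(database_path, path_field="path"):
    """Hypothetical helper: list samples with their NetCDF path and presence.

    Assumes a GeoJSON FeatureCollection whose features carry the NetCDF
    location in the property named by `path_field` (an assumed name).
    Returns a list of (properties, Path, exists) tuples.
    """
    with open(database_path, encoding="utf-8") as f:
        collection = json.load(f)

    records = []
    for feature in collection.get("features", []):
        props = feature.get("properties") or {}
        nc_path = props.get(path_field)
        if nc_path is None:
            # Sample without a file reference: nothing to convert.
            continue
        nc_file = Path(nc_path)
        records.append((props, nc_file, nc_file.is_file()))
    return records
```

Samples whose file is missing can then be reported before any conversion starts, instead of failing halfway through a batch.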