FAIRiCUBE / data-requests

Request data to be made available within FAIRiCUBE HUB

ChangeType.update stac_dist/ADC_arable_land_markers_autumn/ADC_arable_land_markers_autumn.json #287

Open fairicube-data opened 3 months ago

fairicube-data commented 3 months ago

{"filename": "ADC_arable_land_markers_autumn/ADC_arable_land_markers_autumn.json", "item_type": "stac_dist", "change_type": "Update", "user": "FAiRICUBE", "data_owner": true}

misev commented 3 months ago

@robknapen is there any webpage/documentation about the data?

I see it has 5 floating-point bands, but in your data request they are not described. According to the file metadata they correspond to different years?

  BAND_1=2018
  BAND_2=2019
  BAND_3=2020
  BAND_4=2021
  BAND_5=2022

robknapen commented 3 months ago

@misev The bands should represent different years indeed. @vittekm will know, he is pre-processing the data and filling in the data requests. From what I understood from him he is merging yearly data into a big file with bands per year, because it will be easier for the ingestion? Anyway, best to let him answer further questions :)

misev commented 3 months ago

For us it's best to have separate files per year, but it's not a hard requirement as we can separate them before the ingest.

vittekm commented 3 months ago

@misev In total we have 10 files in the "Agrodatacube" (ADC) series to be ingested. Currently 6 files have 5 bands corresponding to the years 2018-2022, and 4 files have 9 bands corresponding to the years 2014-2022. My idea was to submit each file as a separate request, i.e. as a different thematic dataset covering multiple years. Would that work?

misev commented 3 months ago

It sounds like this could be two datacubes: 2018-2022 with 6 bands, and 2014-2022 with 4 bands? So you could make two requests for datacubes ADC_2018_2022 and ADC_2014_2022 or so.

misev commented 3 months ago

If possible it might be better to not hardcode the years in the datacube IDs, so they can be extended in future if needed for further years. Some other suffix for a distinguishing feature would be better in the name.

vittekm commented 3 months ago

In fact it is 6 files with 5 bands each and 4 files with 9 bands each, as follows:

| File | Bands |
| --- | --- |
| ADC_arable_land_markers_autumn | 5 |
| ADC_arable_land_markers_no_ndvi | 5 |
| ADC_arable_land_markers_spring | 5 |
| ADC_crop_rotation_index | 9 |
| ADC_crop_parcels_crop_code | 9 |
| ADC_crop_parcels_field_id | 9 |
| ADC_crop_parcels_land_use | 9 |
| ADC_grassland_markers_ndvi_spring | 5 |
| ADC_grassland_markers_no_mowing | 5 |
| ADC_grassland_markers_no_ndvi | 5 |

Could it be then 10 datacubes?

misev commented 3 months ago

It could be 10 datacubes, it will just be a bit of extra work to fill in 10 data requests in the catalog editor.

Alternatively, if the 6 five-band files have the same resolution and CRS, they could form a single datacube. This would save us from adding so many different catalog entries. So there would be one datacube, e.g. ADC_arrable_land_and_grassland_markers, with bands:

misev commented 3 months ago

Or maybe 3 datacubes: ADC_arable_land, ADC_crop, ADC_grassland_markers. Up to you, all these options are possible.
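
To make the three-datacube option concrete, here is a minimal sketch that assigns the ten files listed earlier in the thread to the proposed datacube IDs by name prefix and checks that each group has a uniform band count. The prefix-matching rule and the helper name `group_into_cubes` are assumptions for illustration, not part of the catalog editor or the ingestion tooling.

```python
from collections import defaultdict

# File name -> number of bands, as listed earlier in the thread.
BAND_COUNTS = {
    "ADC_arable_land_markers_autumn": 5,
    "ADC_arable_land_markers_no_ndvi": 5,
    "ADC_arable_land_markers_spring": 5,
    "ADC_crop_rotation_index": 9,
    "ADC_crop_parcels_crop_code": 9,
    "ADC_crop_parcels_field_id": 9,
    "ADC_crop_parcels_land_use": 9,
    "ADC_grassland_markers_ndvi_spring": 5,
    "ADC_grassland_markers_no_mowing": 5,
    "ADC_grassland_markers_no_ndvi": 5,
}

# Datacube IDs proposed in the comment above.
CUBE_IDS = ("ADC_arable_land", "ADC_crop", "ADC_grassland_markers")


def group_into_cubes(band_counts, cube_ids):
    """Assign each file to the single datacube ID that prefixes its name."""
    cubes = defaultdict(list)
    for name in band_counts:
        matches = [cube_id for cube_id in cube_ids if name.startswith(cube_id)]
        if len(matches) != 1:
            raise ValueError(f"{name!r} matches {len(matches)} datacube IDs")
        cubes[matches[0]].append(name)
    return dict(cubes)


cubes = group_into_cubes(BAND_COUNTS, CUBE_IDS)
for cube_id, names in cubes.items():
    counts = sorted({BAND_COUNTS[name] for name in names})
    print(f"{cube_id}: {len(names)} files, band counts {counts}")
```

With this grouping, each of the three datacubes contains files with a single band count, so the year range is the same within a datacube.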

vittekm commented 3 months ago

In that case we still need a distinction between years:

  1. ADC_arrable_land_and_grassland_markers

Could that still fit within 2 or 3 requests, by simply uploading multiple files (each with the same number of bands representing years)?

misev commented 3 months ago

@vittekm do you create this data? I have some suggestions:

  1. it's uncommon to have bands represent years, it's better to split the bands into separate files per year
  2. it's uncommon to have categorical data with float64 data type; especially with just a few classes, it's better to export the data to byte/int8 data type

Let's start with one datacube ADC_arable_land_markers which would contain data for years 2018-2022, and bands

Then you could zip all files needed to build this datacube and make that available for download.

What do you think?
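
As a rough illustration of the data type suggestion, the sketch below picks the smallest integer type that can hold a set of category codes plus the NoData value. It is plain Python with hypothetical names (`INTEGER_TYPES`, `smallest_integer_dtype`); the type names are just labels, and the actual conversion would be done with whatever raster tool produces the files.

```python
# Candidate integer types as (label, minimum, maximum), smallest first.
INTEGER_TYPES = (
    ("uint8", 0, 255),
    ("int8", -128, 127),
    ("uint16", 0, 65535),
    ("int16", -32768, 32767),
    ("int32", -2**31, 2**31 - 1),
)


def smallest_integer_dtype(values, nodata):
    """Return the label of the smallest integer type holding values and nodata."""
    candidates = list(values) + [nodata]
    if not all(float(v).is_integer() for v in candidates):
        raise ValueError("non-integer value found; data is not purely categorical")
    lowest, highest = min(candidates), max(candidates)
    for label, type_min, type_max in INTEGER_TYPES:
        if type_min <= lowest and highest <= type_max:
            return label
    raise ValueError("values do not fit any candidate integer type")


# Three classes stored as floats, with NoData = 0.
print(smallest_integer_dtype([1.0, 2.0, 3.0], nodata=0.0))
# Five classes stored as floats, with NoData = -1.
print(smallest_integer_dtype([1.0, 2.0, 3.0, 4.0, 5.0], nodata=-1.0))
```

A negative NoData value rules out an unsigned byte, which is why the choice of NoData matters as much as the number of classes.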

vittekm commented 3 months ago

@misev I see. So each multiband file should be split into individual per-year files with a year tag (e.g. _2018.tif). These are then zipped together and submitted as an ingest request to build the cube. Actually, I thought this separation could be done more easily (or quickly) after ingesting the data as they are. Otherwise I can do this preparation locally.

misev commented 3 months ago

> @misev I see. So each multiband file should be split into individual per-year files with a year tag (e.g. _2018.tif).

Yes exactly, this is the usual way as TIFF is a 2D image format and putting multiple years in a single file is just surprising.
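
A minimal sketch of how the per-year split could be planned from the band metadata quoted earlier (`BAND_1=2018` and so on). The source file name `ADC_arable_land_markers_autumn.tif` and the helper names are assumptions; the sketch only prints `gdal_translate -b <band>` commands, which select a single band, and does not run them.

```python
def parse_band_years(tags):
    """Turn metadata tags such as 'BAND_1=2018' into {band_index: year}."""
    band_years = {}
    for tag in tags:
        key, _, value = tag.strip().partition("=")
        if key.startswith("BAND_"):
            band_years[int(key[len("BAND_"):])] = int(value)
    return band_years


def split_plan(source, band_years):
    """Return (band_index, target_file) pairs, one target file per year."""
    stem = source[:-len(".tif")] if source.endswith(".tif") else source
    return [
        (band, f"{stem}_{year}.tif")
        for band, year in sorted(band_years.items())
    ]


TAGS = ["BAND_1=2018", "BAND_2=2019", "BAND_3=2020", "BAND_4=2021", "BAND_5=2022"]
SOURCE = "ADC_arable_land_markers_autumn.tif"  # assumed file name

plan = split_plan(SOURCE, parse_band_years(TAGS))
for band, target in plan:
    # Print the extraction command instead of executing it.
    print(f"gdal_translate -b {band} {SOURCE} {target}")
```

The resulting files can then be zipped together and made available for download as discussed above.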

robknapen commented 3 months ago

@vittekm @misev Maybe better to use slightly more descriptive names for the bands, than just 'autumn' and 'spring'? I think for arable field markers these represent categorical states (bare, green, unknown?). While for grasslands it indicates usage intensity in Spring, on a scale of [0.0 - 1.0]? But please double check.

misev commented 3 months ago

@robknapen absolutely agreed, these details should be captured as completely as possible in the metadata entry in the catalog, describing what the pixel values represent for each band.

vittekm commented 3 months ago

@misev @robknapen A more descriptive name could be e.g. field conditions. But then bare, green, unknown are actual pixel values. These should indeed be filled in in the metadata entry. I wanted to include them in the description, but it's good to have designated fields in the catalog. The following should then be entered (in English), plus the values of continuous variables:

- bouwland_markers_najaar 1.tif: Categories: (1='onbekend' (unknown), 2='winter groen' (winter green) and 3='winter kaal' (winter bare)), NoData=0, TimeExt=2018-2022
- bouwland_markers_no_ndvi_img 1.tif: NoData=-1, TimeExt=2018-2022
- bouwland_markers_voorjaar 1.tif: Categories: (1='onbekend' (unknown), 2='winter groen' (winter green) and 3='winter kaal' (winter bare)), NoData=0, TimeExt=2018-2022
- crop_rotation_index 3.tif: NoData=-1, TimeExt=2014-2022
- gewaspercelen_crop_code.csv
- gewaspercelen_crop_code.tif: Categories: (file above), NoData=-1, TimeExt=2014-2022
- gewaspercelen_fieldid.tif: NoData=-1, TimeExt=2014-2022
- gewaspercelen_grondgebruik.tif: Categories: (1='Bouwland' (arable land), 2='Braakland' (fallow land), 3='Grasland' (grassland), 4='Natuurterrein' (nature area) and 5='Overige' (other)), NoData=-1, TimeExt=2014-2022
- grasland_markers_ndvi_voorjaar 1.tif: NoData=-1, TimeExt=2018-2022
- grasland_markers_no_maai 1.tif: NoData=-1, TimeExt=2018-2022
- grasland_markers_no_ndvi_img 1.tif: NoData=-1, TimeExt=2018-2022
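
To show how these category and NoData entries could be used once they are captured as metadata, here is a small sketch that decodes pixel values into labels. The dictionary layout, the `decode` helper and the English labels (my own translations of the Dutch category names) are assumptions for illustration, not the catalog's actual schema.

```python
# Category codes and NoData values taken from the list above;
# the English labels are assumed translations.
ARABLE_AUTUMN = {
    "nodata": 0,
    "categories": {1: "unknown", 2: "winter green", 3: "winter bare"},
}
LAND_USE = {
    "nodata": -1,
    "categories": {
        1: "arable land",
        2: "fallow land",
        3: "grassland",
        4: "nature area",
        5: "other",
    },
}


def decode(value, spec):
    """Map a pixel value to its category label, or None for NoData."""
    if value == spec["nodata"]:
        return None
    if float(value).is_integer() and int(value) in spec["categories"]:
        return spec["categories"][int(value)]
    raise ValueError(f"value {value!r} is not a known category code")


print(decode(2.0, ARABLE_AUTUMN))
print(decode(-1, LAND_USE))
```

Because the current files store these codes as floating-point values, the helper accepts floats as long as they are whole numbers.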

misev commented 3 months ago

@vittekm yes sounds good, I'd suggest to just go ahead and update the data request in the catalog editor once you make the data available for download.

robknapen commented 3 months ago

@vittekm to me it would be fine (for now) to leave out these two datasets:

They contain the number of NDVI "images" that were used to derive the field markers. It can give an expert that knows how the data is derived some guesstimate about the quality. To make it easier for non-expert users of the data I think we should pre-process it (make it more analysis ready) ourselves with some simple rules, e.g. if (no_ndvi_img < 2) then arable_land_autumn_condition = "unknown". The threshold value (e.g. 2) can be estimated by looking at the complete dataset, I don't know the actual range of these variables. It is easier if we do this than that a user has to figure it out (which probably nobody will put effort in).
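
The rule suggested above could look roughly like the sketch below, written in plain Python over flat lists of pixel values. It assumes the category codes from the metadata listing earlier in the thread (1 = unknown, NoData = 0 for the condition layer); the function name and the default threshold of 2 are placeholders, since the actual value ranges still need to be checked.

```python
UNKNOWN = 1  # category code for 'onbekend' (unknown) in the condition layer


def apply_min_image_rule(conditions, image_counts, threshold=2, nodata=0):
    """Set the condition to UNKNOWN where too few NDVI images were used.

    Pixels that are NoData in the condition layer are left unchanged.
    A NoData count of -1 is below any positive threshold, so such pixels
    also become UNKNOWN (an assumption of this sketch).
    """
    if len(conditions) != len(image_counts):
        raise ValueError("layers must have the same number of pixels")
    result = []
    for condition, count in zip(conditions, image_counts):
        if condition != nodata and count < threshold:
            result.append(UNKNOWN)
        else:
            result.append(condition)
    return result


conditions = [2, 3, 0, 2, 3]     # green, bare, NoData, green, bare
image_counts = [5, 1, 0, -1, 2]  # number of NDVI images per pixel
print(apply_min_image_rule(conditions, image_counts))
```

Applying this once before publishing would mean users never need the image-count layers to judge which pixels are reliable.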