Transforming GBIF Occurrence Cubes

KathiSchleidt commented 9 months ago

As discussed in today's UC synergy meeting, we need to transform GBIF Occurrence Cubes to a grid format, e.g. TIF.

Basic information and a first example dataset are available from the GBIF issue under data requests, dataset here.

Initially we'd hoped to be able to create a dedicated taxa dimension, but as a first step we'll have to split this dataset by taxon. We still need to see whether we can provide the two values, count and uncertainty, as bands of one TIF or as separate TIFs.

MarvinMosel commented 9 months ago

Some notes on the coordinate information inside the CSV file:

The CSV already contains all the information needed for grid creation: the GRIDNUM (id) specifies the grid size (10 m) and the coordinate of the bottom-left corner.

For example, the coordinate in the EPSG:3035 projection for 10mE401832N328626 is: East 401832 x 10 m = 4018320 m and North 328626 x 10 m = 3286260 m, i.e. E 4018320 m and N 3286260 m.
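The decoding described above can be sketched in a few lines of Python. This is a minimal illustration, not the code from the notebook: the function name `parse_cell_code` is hypothetical, and support for a `km` unit prefix (e.g. `1kmE4018N3286`) is an assumption based on the same naming pattern, since the thread only shows a 10 m code.

```python
import re

# Cell codes look like "10mE401832N328626": cell size with unit, then the
# easting and northing of the bottom-left corner counted in cell-size units.
# Accepting "km" as a unit is an assumption; the thread only shows "10m".
CELL_CODE = re.compile(r"^(\d+)(m|km)E(\d+)N(\d+)$")


def parse_cell_code(code):
    """Return (cell_size_m, easting_m, northing_m) of the bottom-left corner."""
    match = CELL_CODE.match(code)
    if match is None:
        raise ValueError(f"unrecognised cell code: {code!r}")
    size, unit, east, north = match.groups()
    cell_size_m = int(size) * (1000 if unit == "km" else 1)
    return cell_size_m, int(east) * cell_size_m, int(north) * cell_size_m


print(parse_cell_code("10mE401832N328626"))
# (10, 4018320, 3286260)
```

The result is in metres in EPSG:3035, matching the worked example above.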

You can then build a polygon from the coordinates and convert it into any raster format. In this case, I think the resolution of 10m is a bit too high. The use of a 100m or 1km grid should be sufficient. The CSV files with all species in QGIS:

The selected species Vanellus vanellus as 1km grid:
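Building the cell polygon from the bottom-left corner, as suggested above, could look roughly like this. It is a sketch using only the standard library; the helper name `cell_polygon_wkt` is hypothetical, and the output is plain WKT text that QGIS or other GIS tools can read when told the CRS is EPSG:3035.

```python
def cell_polygon_wkt(easting_m, northing_m, cell_size_m):
    """Return the square grid cell as a WKT polygon (EPSG:3035 metres).

    (easting_m, northing_m) is the bottom-left corner of the cell.
    """
    x0, y0 = easting_m, northing_m
    x1, y1 = x0 + cell_size_m, y0 + cell_size_m
    # Counter-clockwise ring, closed by repeating the first vertex.
    ring = [(x0, y0), (x1, y0), (x1, y1), (x0, y1), (x0, y0)]
    coords = ", ".join(f"{x} {y}" for x, y in ring)
    return f"POLYGON (({coords}))"


print(cell_polygon_wkt(4018320, 3286260, 10))
# POLYGON ((4018320 3286260, 4018330 3286260, 4018330 3286270, 4018320 3286270, 4018320 3286260))
```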

KathiSchleidt commented 9 months ago

Thanks! Resolution and species are up to the UC partner, in this case we requested 10m resolution on the species listed.

@vittekm do you really need 10m or would 100m also work?

What we'd need is step-by-step guidance on how to grid these CSVs, ideally so simple that the UC2 & UC5 partners can do this on their own

vittekm commented 9 months ago

We're working at farm field level, therefore a resolution of 10m would be relevant. Indeed, knowing that the coordinate indicates the bottom-left corner of the cell is crucial for gridding.
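To illustrate why the bottom-left convention matters: most raster formats count rows from the top, so the raster origin is the top-left corner of the extent, not the bottom-left corner of a cell. The sketch below maps bottom-left cell corners to (row, column) positions under that assumption. The function names `raster_layout` and `raster_index` are hypothetical and not taken from the notebook.

```python
def raster_layout(corners, cell_size_m):
    """Return (x_min, y_max, n_rows, n_cols) for a list of bottom-left corners."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    x_min = min(xs)
    # The top edge lies one cell above the highest bottom-left corner.
    y_max = max(ys) + cell_size_m
    n_cols = (max(xs) - x_min) // cell_size_m + 1
    n_rows = (y_max - min(ys)) // cell_size_m
    return x_min, y_max, n_rows, n_cols


def raster_index(easting_m, northing_m, x_min, y_max, cell_size_m):
    """Return (row, col) of a cell, with row 0 at the top of the raster."""
    col = (easting_m - x_min) // cell_size_m
    # Measure from the raster top down to the TOP edge of the cell.
    row = (y_max - (northing_m + cell_size_m)) // cell_size_m
    return row, col


corners = [(4018320, 3286260), (4018330, 3286270)]
x_min, y_max, n_rows, n_cols = raster_layout(corners, 10)
print(x_min, y_max, n_rows, n_cols)
# 4018320 3286280 2 2
print([raster_index(x, y, x_min, y_max, 10) for x, y in corners])
# [(1, 0), (0, 1)]
```

With these values, a GDAL-style geotransform would be `(x_min, cell_size_m, 0, y_max, 0, -cell_size_m)`. Treating the bottom-left corner as a cell centre or as the top-left corner instead would displace the output by half a cell or a full cell.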

MarvinMosel commented 9 months ago

The dataset is an observation dataset and not a distribution dataset, if I understand correctly. This means that the coordinates listed in the CSV only indicate that a certain species was seen at this position. Experts later use this to produce the distribution map. If there are no observations between two observation points, this does not mean that the bird does not occur there, only that it has not been sighted. In a distribution map, this area would be filled in if the habitat is suitable for the bird.

I have started to set up a notebook for translating the CSV into coordinates: https://github.com/FAIRiCUBE/uc1-urban-climate/blob/master/notebooks/dev/f06_mixed/Specied_distribution_grid.ipynb

KathiSchleidt commented 9 months ago

For a description of the logic, including a paper by B-Cubed partners, see both the BIODIV OCCURRENCE CUBES presentation as well as maybe the minutes of last summer's Occurrence Cube Meeting.

And yes, the eternal problem with biodiv records as provided by GBIF is that while you have presence, you have no information on absence. Cubing doesn't help us there (until the experts then generate the full distribution).

vittekm commented 9 months ago

Yes. What comes from GBIF are observations. We already started to make distribution maps based on Dutch NDFF data, though we still need to work on improvements, including more data and better documentation.

robknapen commented 9 months ago

Isn't the dataset that GBIF provided one where they already processed the species observations into a grid using their cubing algorithm? Then it would be occupancy per grid cell, derived by a statistical method. I would then think that it is best to ingest it using the grid cells (10m) provided. We can later use this occupancy data for species distribution modelling in our use case.

MarvinMosel commented 9 months ago

Ok... let's stick with the 10m data for now. These can then simply be aggregated to different grid sizes later. I have adapted the notebook so that it now writes 10m vector and raster datasets from the table data. However, I have noticed a small shift in the raster and therefore have to revise the script again.
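Aggregating the 10m cells to a coarser grid could be done by snapping each bottom-left corner down to the nearest multiple of the target cell size and combining the values that land in the same coarse cell. The sketch below sums the values, which is only appropriate if they are occurrence counts. A derived statistic such as occupancy, or the uncertainty value, would need a different rule. The function name `aggregate_counts` and the dictionary layout are assumptions made for illustration.

```python
from collections import defaultdict


def aggregate_counts(cells, source_size_m, target_size_m):
    """Sum per-cell counts into a coarser grid aligned to the same origin.

    cells maps (easting_m, northing_m) bottom-left corners to counts.
    """
    if target_size_m % source_size_m != 0:
        raise ValueError("target cell size must be a multiple of the source size")
    totals = defaultdict(int)
    for (x, y), count in cells.items():
        # Snap the corner down to the bottom-left corner of the coarse cell.
        key = (x // target_size_m * target_size_m,
               y // target_size_m * target_size_m)
        totals[key] += count
    return dict(totals)


cells_10m = {
    (4018320, 3286260): 2,
    (4018330, 3286270): 1,
    (4019000, 3286260): 5,
}
print(aggregate_counts(cells_10m, 10, 1000))
# {(4018000, 3286000): 3, (4019000, 3286000): 5}
```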

KathiSchleidt commented 8 months ago

@robknapen @Susannaioni have you had time to test this NB?

robknapen commented 8 months ago

I was kinda waiting for Manual to signal that he finished the revision of the script first.

robknapen commented 8 months ago

Sorry, closed it by accident …

MarvinMosel commented 8 months ago

No problem. Please check the script and give me your feedback.

MarvinMosel commented 7 months ago

The script has been updated and can be tested here: https://github.com/FAIRiCUBE/uc1-urban-climate/blob/master/notebooks/dev/f06_mixed/GBIF_occurence_eea_grid.ipynb

KathiSchleidt commented 7 months ago

Thanks!!! Here is the URI in a clean form; the one above has the wrong URI underneath: https://github.com/FAIRiCUBE/uc1-urban-climate/blob/master/notebooks/dev/f06_mixed/GBIF_occurence_eea_grid.ipynb