gbif / occurrence

Occurrence store, download, search
Apache License 2.0

Offer occurrence cubes as data product #157

Open gbif-portal opened 4 years ago

gbif-portal commented 4 years ago


We just published a preprint (Oldoni et al. 2020) describing a method for aggregating species occurrence data into what we coined “occurrence cubes”. The aggregated data can be perceived as a cube with three dimensions (taxonomic, temporal and geographic) and takes into account the spatial uncertainty of each occurrence. The aggregation level of each of the three dimensions can be adapted to the scope of the analysis. Built on Open Science principles, the method is easily automated and reproducible, and can be used for species trend indicators, maps and distribution models. We are using the method to aggregate species occurrence data for Europe per taxon, year and 1 km² cell of the European reference grid, to feed indicators and risk mapping/modelling for the Tracking Invasive Alien Species (TrIAS) project.
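The aggregation described above can be sketched in a few lines of Python. This is not the authors' published code, only a minimal illustration under stated assumptions: coordinates are already projected to ETRS89-LAEA (EPSG:3035) metres, each occurrence is a plain dict with hypothetical field names, spatial uncertainty is handled by drawing one random point within the uncertainty circle before assigning a cell (the preprint's exact procedure may differ), and a missing uncertainty falls back to a hypothetical default of 1000 m. The output keeps a count and the smallest uncertainty per (taxon, year, cell), with cell codes imitating the EEA reference grid pattern (e.g. `1kmE4321N3210`).

```python
import math
import random
from collections import defaultdict


def random_point_in_circle(x, y, radius_m, rng):
    """Draw a point uniformly from the disc of radius_m metres around (x, y)."""
    # sqrt() spreads draws evenly over the disc area instead of clustering at the centre
    r = radius_m * math.sqrt(rng.random())
    theta = 2 * math.pi * rng.random()
    return x + r * math.cos(theta), y + r * math.sin(theta)


def eea_cell_code(x, y, cell_m=1000):
    """EEA-style cell code such as '1kmE4321N3210'.

    Assumes x, y are ETRS89-LAEA (EPSG:3035) metres and cell_m is a whole number of km.
    """
    return f"{cell_m // 1000}kmE{int(x // cell_m)}N{int(y // cell_m)}"


def build_cube(occurrences, cell_m=1000, default_uncertainty_m=1000.0, seed=42):
    """Count occurrences per (taxon_key, year, cell_code) and keep the smallest uncertainty.

    Each occurrence is a dict with hypothetical keys: taxon_key, year, x, y,
    coord_uncertainty_m (metres, or None if unknown).
    """
    rng = random.Random(seed)  # fixed seed so the random assignment is reproducible
    counts = defaultdict(int)
    min_unc = {}
    for occ in occurrences:
        unc = occ.get("coord_uncertainty_m")
        if unc is None:
            unc = default_uncertainty_m  # assumed default for a missing uncertainty
        # Account for uncertainty by assigning a random point within the uncertainty circle
        px, py = random_point_in_circle(occ["x"], occ["y"], unc, rng)
        key = (occ["taxon_key"], occ["year"], eea_cell_code(px, py, cell_m))
        counts[key] += 1
        min_unc[key] = min(unc, min_unc.get(key, math.inf))
    return {k: {"n": n, "min_coord_uncertainty": min_unc[k]} for k, n in counts.items()}
```

Changing `cell_m` (e.g. to 10000) or swapping `taxon_key` for a species-level key is how the aggregation level of each dimension would be adapted in this sketch.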

We are currently producing these cubes ourselves from GBIF downloads (code is open and referenced in the links below) and publishing the data on Zenodo:

But it would be great if GBIF could offer these directly as a data product (with a few limited aggregation options, such as aggregating by taxonKey or speciesKey, or choosing which reference grid to use) that could then be referenced by its DOI, just like regular GBIF downloads.
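Aggregation options like these can also be applied after the fact to an existing cube. The sketch below assumes a cube keyed by (taxon, year, EEA-style cell code) with `n` and `min_coord_uncertainty` values; `taxon_map` is a hypothetical lookup from taxonKey to speciesKey that a real implementation would have to derive from the GBIF backbone taxonomy. Because 1 km EEA cells nest exactly inside 10 km cells, summing counts when coarsening the grid loses nothing.

```python
import re
from collections import defaultdict

# EEA-style cell code: size in km, then easting and northing in units of that size
CELL_RE = re.compile(r"^(\d+)kmE(\d+)N(\d+)$")


def coarsen_cell(code, factor):
    """Map a cell to the enclosing cell `factor` times larger, e.g. 1kmE4321N3210 -> 10kmE432N321."""
    m = CELL_RE.match(code)
    if m is None:
        raise ValueError(f"unrecognised cell code: {code!r}")
    size_km, east, north = (int(g) for g in m.groups())
    return f"{size_km * factor}kmE{east // factor}N{north // factor}"


def reaggregate(cube, factor=1, taxon_map=None):
    """Roll a {(taxon, year, cell): {"n", "min_coord_uncertainty"}} cube up to coarser levels.

    taxon_map is a hypothetical {taxonKey: speciesKey} lookup; unmapped keys are kept.
    """
    taxon_map = taxon_map or {}
    out = defaultdict(lambda: {"n": 0, "min_coord_uncertainty": float("inf")})
    for (taxon, year, cell), values in cube.items():
        key = (taxon_map.get(taxon, taxon), year, coarsen_cell(cell, factor))
        out[key]["n"] += values["n"]  # counts of nested cells simply add up
        out[key]["min_coord_uncertainty"] = min(
            out[key]["min_coord_uncertainty"], values["min_coord_uncertainty"]
        )
    return dict(out)
```

For example, `reaggregate(cube, factor=10, taxon_map=...)` turns a taxon-level 1 km cube into a species-level 10 km cube in one pass.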

@timrobertson100 suggested adding this idea to this portal-feedback repo once our preprint was out. We are planning to submit a paper for peer review later this year, but I wanted to get the conversation started already. Also pinging co-authors @damianooldoni and @qgroom.


GitHub user: @peterdesmet
System: Chrome 80.0.3987 / Mac OS X 10.14.6
Referer: https://www.gbif.org/user/profile
Window size: width 1363, height 717
System health at time of feedback: INFO

peterdesmet commented 4 years ago

Links that were removed:

MortenHofft commented 4 years ago

Links that were removed

Sorry about that. We had a ton of spam at one point. Perhaps we should allow richer markdown again; we don't get any spam these days.

dschigel commented 4 years ago

This is a very nice idea.

A few opportunities that might help implementation:

jhnwllr commented 4 years ago

Outside Europe, I think polygon-based discrete global grids could be a good choice: https://www.discreteglobalgrids.org/

I have already been using them with some success to aggregate GBIF occurrence data: https://github.com/jhnwllr/gbif_shapefile_geocoder
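Aggregating occurrences into polygon-based cells, such as hexagonal DGG cells, comes down to a point-in-polygon test. The sketch below uses the even-odd ray-casting rule; it is not the code in the repository linked above. It treats longitude/latitude as planar coordinates, ignores cells that cross the antimeridian, and checks every cell for every point, so a real pipeline would add a spatial index or compute the cell ID directly from the grid's own indexing scheme.

```python
from collections import Counter


def point_in_polygon(lon, lat, ring):
    """Even-odd ray casting; ring is a list of (lon, lat) vertices (closing vertex optional)."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the point's latitude can cross the ray (so y1 != y2 here)
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:  # the ray runs towards +longitude
                inside = not inside
    return inside


def count_by_polygon(points, cells):
    """cells maps cell_id -> ring; each point is counted in the first cell that contains it."""
    counts = Counter()
    for lon, lat in points:
        for cell_id, ring in cells.items():
            if point_in_polygon(lon, lat, ring):
                counts[cell_id] += 1
                break
    return counts
```

Points falling outside every cell are simply not counted; points exactly on a shared edge are assigned by the even-odd rule to one side only.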

timrobertson100 commented 4 years ago

I had the same thought as @jhnwllr: it would be good to let the user choose the gridding scheme (e.g. Google S2, UTM-based, etc.).
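For a UTM-based scheme, the first step is picking the zone for each point. The standard 6° zones correspond to the WGS 84 / UTM EPSG codes 326zz (north) and 327zz (south). The sketch below computes those and builds a cell ID from an easting/northing that has already been projected into that zone. The ID format is hypothetical, the projection step itself (e.g. with a library such as pyproj) is left out, and the Norway and Svalbard zone exceptions are ignored.

```python
import math


def utm_zone(lon):
    """Standard 6-degree UTM zone (1-60) for a WGS 84 longitude.

    Ignores the Norway and Svalbard exceptions.
    """
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    return min(int(math.floor((lon + 180.0) / 6.0)) + 1, 60)  # lon == 180 belongs to zone 60


def utm_epsg(lon, lat):
    """EPSG code of the matching WGS 84 / UTM CRS: 326zz north of the equator, 327zz south."""
    return (32600 if lat >= 0 else 32700) + utm_zone(lon)


def utm_cell_id(lon, lat, easting, northing, cell_m=10_000):
    """Hypothetical cell ID from coordinates already projected into the point's UTM zone."""
    hemisphere = "N" if lat >= 0 else "S"
    return f"{utm_zone(lon)}{hemisphere}_E{int(easting // cell_m)}_N{int(northing // cell_m)}"
```

The zone is part of the ID because eastings repeat in every zone; a practical drawback of UTM grids is that neighbouring cells on either side of a zone boundary live in different coordinate systems.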

qgroom commented 4 years ago

True, but I suspect we need both. The vast majority of people are still working with squares, and all the modelling software expects squares. However, for global modelling we do need to start moving towards polygons, so we should start to support them.

peterdesmet commented 4 years ago

Nice! Can you screenshot an example of such a DGG? I can’t see one on the website.

jhnwllr commented 4 years ago

Not on gbif.org. I meant something like this: https://data-blog.gbif.org/post/exploring-es50-for-gbif/

peterdesmet commented 4 years ago

Aha, so DGGs are ways to divide the Earth into equal-sized areas, and those gridding schemes can be referenced. Yes, that would be a good option to offer, in addition to widely used traditional grids like the EEA's.
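Why equal-sized cells matter is easy to quantify: on a sphere, a longitude/latitude rectangle has area R²·Δλ·(sin φ₂ − sin φ₁), so a 1° × 1° cell at 60° latitude covers only about half the area of one at the equator. The sketch below computes that and, as the simplest possible equal-area alternative (not a real DGG), splits the globe into latitude bands of equal area by spacing sin φ evenly. Such bands become very elongated towards the poles, which is the kind of distortion polyhedron-based DGGs are designed to reduce. It assumes a spherical Earth with a mean radius of 6371.0088 km.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical approximation


def cell_area_km2(lon1, lat1, lon2, lat2):
    """Area of a lon/lat rectangle on a sphere: R^2 * dlon * (sin lat2 - sin lat1)."""
    dlon = math.radians(lon2 - lon1)
    return EARTH_RADIUS_KM ** 2 * dlon * (
        math.sin(math.radians(lat2)) - math.sin(math.radians(lat1))
    )


def equal_area_lat_edges(n_bands):
    """Latitude edges (degrees) splitting the globe into n_bands bands of equal area.

    Equal steps in sin(lat) give equal areas, by the formula in cell_area_km2.
    """
    return [math.degrees(math.asin(-1 + 2 * k / n_bands)) for k in range(n_bands + 1)]
```

Comparing `cell_area_km2(0, 60, 1, 61)` with `cell_area_km2(0, 0, 1, 1)` shows the roughly twofold difference that makes plain lon/lat squares a poor basis for global occurrence cubes.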