gbif / portal-feedback

User feedback for the GBIF API, website and published data. You can ask questions here. 🗨❓

Filter species occurrence data by taxa in a taxon checklist dataset #5157

Open dagendresen opened 8 months ago

dagendresen commented 8 months ago

A somewhat frequent request to the Norwegian node is to filter and download species occurrence data for all taxa in a taxon checklist dataset.

Researchers and students working with crop wild relatives (CWRs) ask how to filter their GBIF occurrence search by the taxa included in a national or regional crop wild relative taxon list (or a CWR priority list) published to GBIF as a taxon checklist dataset. Other user groups, such as marine researchers and students, ask to filter their occurrence downloads by the taxa listed in WoRMS, and so on. This is also a feature we would like to have for the Biodiversity Digital Twin crop wild relative use case.

Our current workflow is to: (1) download the taxon checklist dataset, (2) extract the list of GBIF taxonKey IDs matching the GBIF taxon backbone, (3) split the list of taxonKeys into sublists so that the total number of characters in each search string is below the GBIF REST API limit of 12000 characters, (4) place a series of GBIF API calls to download occurrence data for each sublist of taxonKeys, (5) merge the search results back into a combined list of species occurrences, and (6) use the derived datasets approach to merge the GBIF download DOIs into one GBIF DOI we can cite.
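Step (3) of this workflow can be sketched roughly as follows. This is only an illustration, not the script we actually run: the function name `chunk_taxon_keys` is hypothetical, and it assumes that the 12000-character limit applies to the `taxonKey=...&taxonKey=...` part of the request, which may not be exactly how the API counts it.

```python
from typing import Iterable, List


def chunk_taxon_keys(taxon_keys: Iterable[int],
                     max_chars: int = 12000,
                     param: str = "taxonKey") -> List[List[int]]:
    """Split taxon keys into sublists whose joined query string
    ("taxonKey=1&taxonKey=2&...") stays within max_chars characters.

    Hypothetical helper; the limit value comes from the workflow above,
    and what exactly counts toward it is an assumption.
    """
    chunks: List[List[int]] = []
    current: List[int] = []
    current_len = 0
    for key in taxon_keys:
        piece = f"{param}={key}"
        # One extra character for the '&' separator if the chunk is not empty.
        added = len(piece) + (1 if current else 0)
        if current and current_len + added > max_chars:
            # The current chunk is full: store it and start a new one.
            chunks.append(current)
            current = []
            current_len = 0
            added = len(piece)
        current.append(key)
        current_len += added
    if current:
        chunks.append(current)
    return chunks


def query_string(keys: Iterable[int], param: str = "taxonKey") -> str:
    """Join one sublist of keys into a repeated-parameter query string."""
    return "&".join(f"{param}={k}" for k in keys)
```

Each sublist returned by `chunk_taxon_keys` would then feed one API call in step (4), with `query_string(sublist)` giving the repeated `taxonKey` parameters for that call.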

We believe that filtering and downloading species occurrence data by taxon checklist has wide utility. It would, for example, also serve the request by the Tracking Invasive Species (TriAS) project described in issue 1768.

ManonGros commented 8 months ago

Hi @dagendresen, are you using the occurrence search API or the occurrence download API? The Nordic Crop Wild Relative (CWR) Checklist has fewer than 100,000 taxa, so it shouldn't be a problem to get everything in one download query. See also: https://data-blog.gbif.org/post/downloading-long-species-lists-on-gbif/
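For illustration, a single download query for a long list of taxon keys could look roughly like the sketch below. It only builds the JSON request body and sends nothing. The helper name `build_download_request`, the username, the email address, and the two taxon keys are placeholders, and the `in` predicate on `TAXON_KEY` with the `SIMPLE_CSV` format is one possible choice rather than the only way to phrase the request.

```python
import json
from typing import Iterable


def build_download_request(taxon_keys: Iterable[int],
                           creator: str,
                           email: str,
                           fmt: str = "SIMPLE_CSV") -> dict:
    """Build the JSON body for one occurrence download request that
    matches any of the given taxon keys (hypothetical helper)."""
    return {
        "creator": creator,                # GBIF username (placeholder)
        "notificationAddresses": [email],  # where the "download ready" mail goes
        "sendNotification": True,
        "format": fmt,
        "predicate": {
            "type": "in",                  # match any value in the list
            "key": "TAXON_KEY",
            "values": [str(k) for k in taxon_keys],
        },
    }


# Placeholder keys; in practice these come from the checklist dataset.
body = build_download_request([2435099, 5219404],
                              creator="your_gbif_username",
                              email="you@example.org")
print(json.dumps(body, indent=2))
```

The resulting body would be sent as an authenticated POST to `https://api.gbif.org/v1/occurrence/download/request`, which gives one download and one DOI instead of a series of search calls.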

EstebanMH-SiB commented 2 months ago

From Colombia, we think this would be a needed and useful feature. People with some technical knowledge can make the API call, but it would be extremely useful for ordinary users of both the GBIF portal and our national portal to be able to do this with a filter.

We need a filter for invasive species, and this could be a way to provide it while also solving other use cases, as @dagendresen points out.