gbif / occurrence-cube

Workflows for producing species occurrence cubes from GBIF mediated data
Apache License 2.0

Building occurrence cube(s) from the GRIIS Belgium checklist: an edge case? #3

Open damianooldoni opened 1 year ago

damianooldoni commented 1 year ago

Hi 👋

INBO colleagues asked me to produce a new version of the cubes I made for TrIAS, this time specifically for the alien species listed in the new version of the GRIIS Belgium checklist. I could do it myself, but I think it would be a nice test of the GBIF cube generation process.

Checklist description

The nubKeys of the GRIIS Belgium checklist break down by rank and taxonomic status as follows (in descending order of count):

rank        taxonomicStatus         n
SPECIES     ACCEPTED             3573
SUBSPECIES  ACCEPTED              146
SPECIES     DOUBTFUL               80
VARIETY     ACCEPTED               26
GENUS       ACCEPTED               25
SPECIES     SYNONYM                20
VARIETY     SYNONYM                 9
SUBSPECIES  SYNONYM                 6
SPECIES     HOMOTYPIC_SYNONYM       5
GENUS       DOUBTFUL                4
SPECIES     HETEROTYPIC_SYNONYM     4
FAMILY      ACCEPTED                1
FORM        ACCEPTED                1
FORM        SYNONYM                 1
SUBSPECIES  DOUBTFUL                1

Cubes

To tackle all situations shown in the table above one cube is not enough:

  1. For 91.6% of the taxa it's fine: they are accepted species, so occurrences of synonyms and child taxa are lumped. I think the same holds for doubtful species, as doubtful taxa are treated like accepted ones, so a cube at speciesKey level should work for them too. Such a cube covers 3653 taxa, i.e. 93.6% of the checklist.

  2. Accepted or doubtful subspecies, varieties and forms are also fine: a cube at acceptedKey level lumps synonyms but not child taxa, and subspecies, varieties and forms have no child taxa. Such a cube covers ~4.5% of the checklist.

  3. For accepted or doubtful genera a cube at genusKey level is needed. This cube covers 29 taxa, ~0.7% of the checklist.

  4. For accepted families (a single taxon here) a cube at familyKey level is needed.

  5. For synonyms a cube at taxonKey level is needed, where no lumping of synonyms or child taxa occurs, following the specs. This is fine, as GRIIS Belgium contains synonyms only where taxonomic experts didn't validate the accepted taxon proposed by the GBIF Backbone.
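The five cases above can be checked with a small sketch. It is only an illustration of the reasoning in this comment, not part of any GBIF tooling: the function `cube_level` and the constant names are hypothetical, and the counts are copied from the table above.

```python
# Counts copied from the checklist table: (rank, taxonomicStatus) -> n
COUNTS = {
    ("SPECIES", "ACCEPTED"): 3573,
    ("SUBSPECIES", "ACCEPTED"): 146,
    ("SPECIES", "DOUBTFUL"): 80,
    ("VARIETY", "ACCEPTED"): 26,
    ("GENUS", "ACCEPTED"): 25,
    ("SPECIES", "SYNONYM"): 20,
    ("VARIETY", "SYNONYM"): 9,
    ("SUBSPECIES", "SYNONYM"): 6,
    ("SPECIES", "HOMOTYPIC_SYNONYM"): 5,
    ("GENUS", "DOUBTFUL"): 4,
    ("SPECIES", "HETEROTYPIC_SYNONYM"): 4,
    ("FAMILY", "ACCEPTED"): 1,
    ("FORM", "ACCEPTED"): 1,
    ("FORM", "SYNONYM"): 1,
    ("SUBSPECIES", "DOUBTFUL"): 1,
}

SYNONYM_STATUSES = {"SYNONYM", "HOMOTYPIC_SYNONYM", "HETEROTYPIC_SYNONYM"}
INFRASPECIFIC_RANKS = {"SUBSPECIES", "VARIETY", "FORM"}


def cube_level(rank, status):
    """Return the key column of the cube a taxon belongs to (hypothetical helper)."""
    if status in SYNONYM_STATUSES:
        return "taxonKey"      # point 5: no lumping at all
    if rank == "SPECIES":
        return "speciesKey"    # point 1: accepted or doubtful species
    if rank in INFRASPECIFIC_RANKS:
        return "acceptedKey"   # point 2: lump synonyms, no child taxa exist
    if rank == "GENUS":
        return "genusKey"      # point 3
    if rank == "FAMILY":
        return "familyKey"     # point 4
    raise ValueError(f"no cube level defined for {rank} / {status}")


# Sum the number of taxa that end up in each cube.
totals = {}
for (rank, status), n in COUNTS.items():
    level = cube_level(rank, status)
    totals[level] = totals.get(level, 0) + n

grand_total = sum(totals.values())
for level, n in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{level:<12} {n:>5} {100 * n / grand_total:5.1f}%")
```

With these counts the checklist holds 3902 taxa, split into 3653 (speciesKey), 174 (acceptedKey), 45 (taxonKey), 29 (genusKey) and 1 (familyKey).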

So far, so good: we can harvest all occurrences, but we need to produce 5 (!) cubes 😨

Notice that in my aggregation workflow I had to deal with a slightly simpler situation as:

  1. no families were present
  2. I didn't aggregate at genus level (see point 3 above). Instead, occurrences of species belonging to a listed genus were grouped at species level, and any occurrences identified only to genus level were dropped.

My question is: how to deal with this situation? Would/Could GBIF automatically create these 5 cubes? Would/Could GBIF package them and provide a single DOI for the package? I am pretty sure I am asking about an advanced functionality, as such a request is not present in the first version of the specs, even if people like Tim Adriaens were dreaming during the kick-off meeting of an easy way to get occurrences (and cubes) based on a checklist.

damianooldoni commented 1 year ago

Two final notes to add to my comment above:

  1. Binding the generated cubes is possible and I think it's the best way to provide the output to users. We only have to take care to rename the key columns (speciesKey, familyKey, genusKey, acceptedKey and taxonKey) to one common name. In our aggregation workflow we opted for taxonKey.
  2. Adding information at a higher rank for research effort bias correction is still possible following the specs, but the chosen rank should be higher than the highest rank among the GBIF Backbone taxa the checklist points to. In the case discussed above, they would like to have info at class level (classKey), which is a valid option as the highest rank among the taxa is FAMILY.
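
Both notes can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the code of the aggregation workflow: `bind_cubes`, `is_valid_effort_rank`, the column names `year`, `cellCode` and `n`, and the key values are all made up for the example.

```python
# Ranks from lowest to highest; only the ones relevant to this discussion.
RANKS_LOW_TO_HIGH = [
    "FORM", "VARIETY", "SUBSPECIES", "SPECIES",
    "GENUS", "FAMILY", "ORDER", "CLASS", "PHYLUM", "KINGDOM",
]


def bind_cubes(cubes, common_key="taxonKey"):
    """Stack cubes into one list of rows, renaming each key column to common_key.

    cubes maps the key column of a cube (e.g. "genusKey") to its rows,
    each row being a dict.
    """
    bound = []
    for key_column, rows in cubes.items():
        for row in rows:
            new_row = {common_key: row[key_column]}
            # Copy every other column unchanged.
            new_row.update((k, v) for k, v in row.items() if k != key_column)
            bound.append(new_row)
    return bound


def is_valid_effort_rank(candidate, checklist_ranks):
    """True if candidate is strictly higher than every rank in the checklist."""
    highest = max(RANKS_LOW_TO_HIGH.index(r) for r in checklist_ranks)
    return RANKS_LOW_TO_HIGH.index(candidate) > highest


# Toy cubes with invented keys and counts, for illustration only.
cubes = {
    "speciesKey": [
        {"year": 2020, "cellCode": "A", "speciesKey": 1001, "n": 12},
        {"year": 2021, "cellCode": "B", "speciesKey": 1002, "n": 4},
    ],
    "genusKey": [
        {"year": 2020, "cellCode": "A", "genusKey": 2001, "n": 3},
    ],
}
bound = bind_cubes(cubes)
for row in bound:
    print(row)

checklist_ranks = {"FORM", "VARIETY", "SUBSPECIES", "SPECIES", "GENUS", "FAMILY"}
print(is_valid_effort_rank("CLASS", checklist_ranks))   # class is above family
print(is_valid_effort_rank("FAMILY", checklist_ranks))  # family is not above itself
```

Renaming at bind time keeps each cube's rows untouched and gives users a single column to join on.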

damianooldoni commented 1 month ago

Update: I did this in https://damianooldoni.github.io/b3cubes-sql-examples/cubes_checklist.html#example-5-global-register-of-introduced-and-invasive-species---belgium. I will move the whole document (not only the section mentioned above) to the B-Cubed tutorials page.