Building occurrence cube(s) from the GRIIS Belgium checklist: an edge case?

damianooldoni commented 1 year ago

Hi 👋

INBO colleagues asked me to produce a new version of the cubes I produced for TrIAS, this time specifically for the alien species listed in the new version of the GRIIS Belgium checklist. I could do it myself, but I think it would be a nice opportunity to test the GBIF cube generation process.

Checklist description

The nubKeys of the GRIIS Belgium checklist are distributed over rank and taxonomic status as follows (in descending order of count):

rank	taxonomicStatus	n
SPECIES	ACCEPTED	3573
SUBSPECIES	ACCEPTED	146
SPECIES	DOUBTFUL	80
VARIETY	ACCEPTED	26
GENUS	ACCEPTED	25
SPECIES	SYNONYM	20
VARIETY	SYNONYM	9
SUBSPECIES	SYNONYM	6
SPECIES	HOMOTYPIC_SYNONYM	5
GENUS	DOUBTFUL	4
SPECIES	HETEROTYPIC_SYNONYM	4
FAMILY	ACCEPTED	1
FORM	ACCEPTED	1
FORM	SYNONYM	1
SUBSPECIES	DOUBTFUL	1
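
As a quick check on the table, the counts can be tallied with a few lines of Python. This is only a sketch: the dictionary simply copies the numbers above, and the helper name `share` is made up for illustration.

```python
# Counts copied from the table above: (rank, taxonomicStatus) -> n
counts = {
    ("SPECIES", "ACCEPTED"): 3573,
    ("SUBSPECIES", "ACCEPTED"): 146,
    ("SPECIES", "DOUBTFUL"): 80,
    ("VARIETY", "ACCEPTED"): 26,
    ("GENUS", "ACCEPTED"): 25,
    ("SPECIES", "SYNONYM"): 20,
    ("VARIETY", "SYNONYM"): 9,
    ("SUBSPECIES", "SYNONYM"): 6,
    ("SPECIES", "HOMOTYPIC_SYNONYM"): 5,
    ("GENUS", "DOUBTFUL"): 4,
    ("SPECIES", "HETEROTYPIC_SYNONYM"): 4,
    ("FAMILY", "ACCEPTED"): 1,
    ("FORM", "ACCEPTED"): 1,
    ("FORM", "SYNONYM"): 1,
    ("SUBSPECIES", "DOUBTFUL"): 1,
}

# Total number of taxa in the checklist
total = sum(counts.values())


def share(n):
    """Percentage of the checklist represented by n taxa, to one decimal."""
    return round(100 * n / total, 1)


# Example: accepted and doubtful species together
species = counts[("SPECIES", "ACCEPTED")] + counts[("SPECIES", "DOUBTFUL")]
print(total, species, share(species))
```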

Cubes

To tackle all situations shown in the table above one cube is not enough:

1. For 91.6% of the taxa it's fine: they are accepted species, so occurrences of synonyms and child taxa are lumped. I think the same holds for doubtful species, as doubtful taxa are treated like accepted ones, so a cube at speciesKey level should work fine for them too. Such a cube covers 3653 taxa, i.e. 93.6% of the checklist.
2. For accepted or doubtful subspecies, varieties and forms it's also fine: a cube at acceptedKey level lumps synonyms but not child taxa, and subspecies, varieties and forms have no child taxa anyway. Such a cube covers ~4.5% of the checklist.
3. For accepted or doubtful genera a cube at genusKey level is needed. This cube covers 29 taxa, ~0.7% of the checklist.
4. For accepted families a cube at familyKey level is needed.
5. For synonyms a cube at taxonKey level is needed, where no lumping of synonyms or child taxa occurs, following the specs. This is fine, as GRIIS Belgium contains synonyms only where taxonomic experts didn't validate the accepted taxon proposed in the GBIF Backbone.
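
The assignment of each taxon to a cube can be sketched as a small lookup on rank and taxonomic status. This is a minimal illustration of the reasoning above, not part of the GBIF specs; the function name `cube_key` and the two constant sets are hypothetical.

```python
# Taxonomic statuses that count as synonyms in the table above
SYNONYM_STATUSES = {"SYNONYM", "HOMOTYPIC_SYNONYM", "HETEROTYPIC_SYNONYM"}
# Ranks below species, which have no child taxa
INFRASPECIFIC_RANKS = {"SUBSPECIES", "VARIETY", "FORM"}


def cube_key(rank, taxonomic_status):
    """Return the key column of the cube a taxon would go into (hypothetical helper)."""
    if taxonomic_status in SYNONYM_STATUSES:
        return "taxonKey"      # no lumping at all
    if rank == "SPECIES":
        return "speciesKey"    # accepted or doubtful species
    if rank in INFRASPECIFIC_RANKS:
        return "acceptedKey"   # lumps synonyms, not child taxa
    if rank == "GENUS":
        return "genusKey"
    if rank == "FAMILY":
        return "familyKey"
    raise ValueError(f"no cube defined for rank {rank!r}")


# The five distinct key columns needed for the combinations in the table
needed = {
    cube_key(rank, status)
    for rank, status in [
        ("SPECIES", "ACCEPTED"),
        ("SPECIES", "DOUBTFUL"),
        ("SUBSPECIES", "ACCEPTED"),
        ("GENUS", "DOUBTFUL"),
        ("FAMILY", "ACCEPTED"),
        ("VARIETY", "SYNONYM"),
    ]
}
print(sorted(needed))
```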

So far, so good, we can harvest all occurrences, but we need to produce 5 (!) cubes 😨

Note that in my aggregation workflow I had to deal with a slightly simpler situation, as:

no families were present
I didn't aggregate at genus level (see point 3 above). Instead, occurrences of species belonging to a listed genus were grouped at species level, and occurrences identified only to genus level, if any, were dropped

My question is: how should we deal with this situation? Would/could GBIF automatically create these 5 cubes? Would/could GBIF package them and assign a single DOI to the package? I am pretty sure I am asking for advanced functionality, as such a request is not covered in the first version of the specs, even though people like Tim Adriaens were dreaming during the kick-off meeting of an easy way to get occurrences (and cubes) based on a checklist.

damianooldoni commented 1 year ago

Two final notes to add to my comment above:

Binding the generated cubes is possible, and I think it's the best way to provide the output to users. We only have to take care to rename the columns speciesKey, familyKey, genusKey, acceptedKey and taxonKey to a single common name. In our aggregation workflow we opted for taxonKey.
Adding information at a higher rank for research effort bias correction is still possible following the specs, but the chosen rank must be higher than the highest rank among the GBIF Backbone taxa the checklist points to. In the case discussed above, they would like to have information at class level (classKey), which is a valid option as the highest rank among the taxa is FAMILY.
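
Both notes can be sketched in plain Python. This is an illustration under assumptions, not the workflow's actual code: cubes are modelled as lists of row dictionaries, the columns `year` and `occurrences` and all values are made up, and the helpers `bind_cubes` and `is_valid_higher_rank` are hypothetical names.

```python
def bind_cubes(cubes_by_key):
    """Bind cubes into one table, renaming each cube's key column to taxonKey.

    cubes_by_key maps a key column name (e.g. "speciesKey") to that cube's rows.
    """
    bound = []
    for key_column, rows in cubes_by_key.items():
        for row in rows:
            # Copy every column except the cube-specific key column
            rest = {k: v for k, v in row.items() if k != key_column}
            bound.append({"taxonKey": row[key_column], **rest})
    return bound


# Ranks from lowest to highest, used to compare ranks
RANK_ORDER = ["FORM", "VARIETY", "SUBSPECIES", "SPECIES", "GENUS",
              "FAMILY", "ORDER", "CLASS", "PHYLUM", "KINGDOM"]


def is_valid_higher_rank(candidate, checklist_ranks):
    """True if candidate is strictly higher than every rank in the checklist."""
    highest = max(RANK_ORDER.index(r) for r in checklist_ranks)
    return RANK_ORDER.index(candidate) > highest


# Made-up rows, for illustration only
cubes = {
    "speciesKey": [{"speciesKey": 1, "year": 2020, "occurrences": 5}],
    "genusKey": [{"genusKey": 2, "year": 2020, "occurrences": 3}],
}
bound = bind_cubes(cubes)

# Ranks present in the checklist discussed above
ranks = {"FORM", "VARIETY", "SUBSPECIES", "SPECIES", "GENUS", "FAMILY"}
class_ok = is_valid_higher_rank("CLASS", ranks)
family_ok = is_valid_higher_rank("FAMILY", ranks)
```

Passing the key column name explicitly for each cube avoids having to guess which column to rename when a cube also carries a higher-rank column such as classKey.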

damianooldoni commented 1 month ago

Update: I did this in https://damianooldoni.github.io/b3cubes-sql-examples/cubes_checklist.html#example-5-global-register-of-introduced-and-invasive-species---belgium. I will move the whole document (not only the section mentioned above) to the B-Cubed tutorials page.

gbif / occurrence-cube

Building occurrence cube(s) from the GRIIS Belgium checklist: an edge case? #3
