gbif / backbone-feedback


Misinterpretation of Hymenoptera to Mammalia #1

Open CecSve opened 3 weeks ago

CecSve commented 3 weeks ago

A publisher contacted helpdesk because they noticed that their insect records were misinterpreted as Mammalia (example).

They supplied scientificName Aculeata with taxonRank Infraorder, and the backbone used the scientificName to match the record to the mammalian family Aculeata, ignoring the higher taxonomy they provided.

Ideally, the backbone should not interpret a field that is NULL. It should instead use the lowest level filled in by the publisher (order Hymenoptera), and the interpreted taxonRank should then reflect this change.
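A minimal sketch of the behaviour proposed above. This is not GBIF's actual matching code. It assumes a hypothetical in-memory backbone (`BACKBONE`) where each name maps to candidate usages that carry their own classification, and the names `Candidate`, `consistent`, and `match` are illustrative only. A candidate is accepted only if no rank the publisher filled in contradicts it. Otherwise the match falls back to the lowest supplied rank, skipping NULL fields.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Darwin Core higher-rank terms, from highest to lowest
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus"]

@dataclass
class Candidate:
    name: str
    rank: str
    classification: Dict[str, str] = field(default_factory=dict)

# Hypothetical backbone index: name -> candidate usages (homonyms share a name)
BACKBONE: Dict[str, List[Candidate]] = {
    "Aculeata": [
        Candidate("Aculeata", "family",
                  {"kingdom": "Animalia", "phylum": "Chordata", "class": "Mammalia"}),
    ],
    "Hymenoptera": [
        Candidate("Hymenoptera", "order",
                  {"kingdom": "Animalia", "phylum": "Arthropoda", "class": "Insecta"}),
    ],
}

def consistent(candidate: Candidate, record: Dict[str, Optional[str]]) -> bool:
    """True if no rank filled in by the publisher contradicts the candidate."""
    for rank in RANKS:
        supplied = record.get(rank)
        known = candidate.classification.get(rank)
        if supplied and known and supplied != known:
            return False
    return True

def match(record: Dict[str, Optional[str]]) -> Optional[Tuple[str, str]]:
    """Return (name, rank) of the accepted match, or None if nothing fits."""
    for cand in BACKBONE.get(record.get("scientificName") or "", []):
        if consistent(cand, record):
            return cand.name, cand.rank
    # Fall back to the lowest rank the publisher filled in; NULL fields are skipped
    for rank in reversed(RANKS):
        value = record.get(rank)
        if value:
            for cand in BACKBONE.get(value, []):
                if cand.rank == rank and consistent(cand, record):
                    return cand.name, cand.rank
    return None

# The case reported in this issue: an insect record with no family filled in
record = {"scientificName": "Aculeata", "taxonRank": "infraorder",
          "kingdom": "Animalia", "phylum": "Arthropoda", "class": "Insecta",
          "order": "Hymenoptera", "family": None}
print(match(record))  # ('Hymenoptera', 'order')
```

The NULL family is never used to reject or select a candidate. Only the filled-in phylum (Arthropoda vs Chordata) rules out the mammal homonym. A stricter variant could also reject candidates whose rank differs from the supplied taxonRank (infraorder vs family here).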

CecSve commented 3 weeks ago

Tagging @mdoering, as this could perhaps be fixed in code?

Mesibov commented 3 weeks ago

@CecSve, I've contacted the collection manager to point out that verbatimEventDate for this record is "7/15/47" and eventDate is "1947-07-12". I suspect the error is because the vED was interpreted by a person entering data, rather than by parsing the vED programmatically and reformatting it.

Mesibov commented 3 weeks ago

@CecSve, maybe I shouldn't have looked... with a quick check I found 400+ disagreements between vED and ED in this dataset, besides the many issues flagged by GBIF, and a pair of duplicates: https://ecdysis.org/collections/individual/index.php?occid=3759197 https://ecdysis.org/collections/individual/index.php?occid=3759202
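A check like the one described above can be sketched as follows. It assumes verbatimEventDate holds US-style month/day/year strings and eventDate holds a single ISO 8601 day (ranges such as `1947-07-12/1947-07-15` are not handled). Two-digit years are compared on their last two digits only, because Python's `%y` directive would read "47" as 2047. The helpers `parse_mdy` and `dates_disagree` are hypothetical, not GBIF or Ecdysis functions.

```python
from datetime import date
from typing import Optional, Tuple

def parse_mdy(verbatim: str) -> Optional[Tuple[int, int, int]]:
    """Parse 'M/D/YY' or 'M/D/YYYY' into (month, day, year); None if not in that shape."""
    parts = verbatim.strip().split("/")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        return None
    month, day, year = (int(p) for p in parts)
    return month, day, year

def dates_disagree(verbatim_event_date: str, event_date: str) -> bool:
    """True if the parsed verbatim date contradicts the interpreted ISO eventDate."""
    parsed = parse_mdy(verbatim_event_date)
    if parsed is None:
        return False  # unparseable verbatim dates are left for manual review
    month, day, year = parsed
    iso = date.fromisoformat(event_date)
    # Avoid guessing the century of a two-digit year: compare its last two digits only
    year_ok = iso.year % 100 == year if year < 100 else iso.year == year
    return not (iso.month == month and iso.day == day and year_ok)

records = [
    ("7/15/47", "1947-07-12"),  # the record reported in this thread
    ("7/12/47", "1947-07-12"),
]
print(sum(dates_disagree(v, e) for v, e in records))  # 1
```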

mdoering commented 3 weeks ago

Looking at our Aculeata mammal family, likely all of its occurrences are misidentified Hymenoptera: https://www.gbif.org/species/6141983

The entire family seems to be a non-existent, bogus one taken from an old PalaeoDB copy. It has since been put to rest there: https://paleobiodb.org/classic/checkTaxonInfo?taxon_no=105145&is_real_user=1

@camiplata @DianRHR this is something we should add to our test scripts to make sure it does not show up in the xcol. It would also be great to have infraorder Aculeata included so we can organize records under it, but that's probably a base COL problem to discuss.

@CecSve interpreting a NULL taxonomy field is very important: the classification fields are a flat view of a tree, so empty ranks are normal. But mixing chordates with insects is silly behavior.

CecSve commented 3 weeks ago

Thank you @Mesibov - I will contact Symbiota Support Hub and ask if they can assist with resolving some of the flags and issues since they host the data.

CecSve commented 3 weeks ago

Thank you for checking, @mdoering. I realise that refusing to interpret NULL values would be unhelpful in some (maybe most?) cases, but in this case it might make more sense to use the content the publisher provided in a field we index.

Until we find another solution, I have proposed to the publisher that they provide taxonRank Order with Hymenoptera as scientificName, as this should resolve the issue. The intermediate ranks can be kept in the verbatim higher classification string.