Closed jacobcook1995 closed 2 years ago
Switching to remote rather than local validation appears to fix the problem. It appears that the definition of the genus Cicada has changed between GBIF releases: at one point it was an accepted taxon with an ID of 10025464, but in the current GBIF database Cicada is a doubtful taxon with an ID of 1682542. I'm guessing this was an error in GBIF which they corrected.
I guess there isn't anything we can do about this: if incorrect GBIF info is provided, our trees are inevitably going to be incorrect. Probably worth closing the issue as I can't see what we can do. Though it does emphasise the importance of #14, which would have saved me some time in tracking down the bug.
I wonder if the issue is in the way deleted taxa are handled in the local database. They have to be added in separately from the main backbone, as they aren't included in that core file. The oddity is that from the API, the deleted record only attaches at the Kingdom level (https://api.gbif.org/v1/species/10025464), whereas the doubtful record hooks in at Family (https://api.gbif.org/v1/species/1682542). It is possible the logic in the local DB handling is preferring the deleted record over the doubtful one.
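To make the comparison concrete, here is a minimal sketch of how the two records could be checked and how a lookup might prefer the non-deleted one. It is not the code in taxa.py or the local DB handling. The helper names (fetch_usage, deepest_higher_rank, prefer_current) are hypothetical, and it assumes the API response carries per-rank keys such as kingdomKey and familyKey plus a deleted timestamp on deleted records. The sample dictionaries are illustrative placeholders, not copies of the real responses.

```python
import json
from urllib.request import urlopen

GBIF_SPECIES_URL = "https://api.gbif.org/v1/species/{key}"

# Backbone ranks ordered from highest to lowest.
BACKBONE_RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def fetch_usage(key):
    """Fetch one name usage record from the GBIF species API (needs network)."""
    with urlopen(GBIF_SPECIES_URL.format(key=key)) as response:
        return json.load(response)


def deepest_higher_rank(usage):
    """Return the lowest rank above the usage's own rank that has a key set.

    Assumes fields named '<rank>Key' (e.g. 'familyKey') in the record.
    """
    own_rank = str(usage.get("rank", "")).lower()
    deepest = None
    for rank in BACKBONE_RANKS:
        if rank == own_rank:
            break
        if usage.get(f"{rank}Key") is not None:
            deepest = rank
    return deepest


def prefer_current(candidates):
    """Drop records flagged as deleted, unless that would leave nothing."""
    current = [u for u in candidates if not u.get("deleted")]
    return current or list(candidates)


# Illustrative records only: the IDs come from this thread, the rest are
# placeholders standing in for whatever the API actually returns.
deleted_cicada = {
    "key": 10025464,
    "rank": "GENUS",
    "kingdomKey": 1,
    "deleted": "placeholder-timestamp",
}
doubtful_cicada = {
    "key": 1682542,
    "rank": "GENUS",
    "taxonomicStatus": "DOUBTFUL",
    "kingdomKey": 1,
    "familyKey": -1,  # placeholder, not the real family key
}

chosen = prefer_current([deleted_cicada, doubtful_cicada])
```

With records shaped like these, the deleted usage only resolves to kingdom while the doubtful one resolves to family, and prefer_current keeps only the doubtful record. Swapping the sample dictionaries for fetch_usage(10025464) and fetch_usage(1682542) would run the same check against the live API.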
Closing this issue as it now seems to be covered by the new (more accurate) issue #22.
Recent uploads of the test dataset (see here) place the genus Cicada as a direct child of Animalia, despite both Arthropoda and Insecta existing in the taxa tree. I suspect this is a problem with taxa.py, as Test_format_good_NCBI.json for the upload gives the GBIF parent ID of Cicada as 1, i.e. sets the parent as Animalia. This issue doesn't seem to crop up in the datasets uploaded before May 2022 (though there is a very wide gap on the sandbox before that). Probably worth discussing this issue alongside #16 when you are back.