CatalogueOfLife / xcol

Working towards the extended Catalogue of Life Checklist

'Unplaced' names at different ranks (genus, species or subspecies) were not merged properly into xCOL #100

Open DianRHR opened 7 months ago

DianRHR commented 7 months ago

There are several subspecies that were not merged properly into the xCOL even though some other subspecies from the same genus and source were merged correctly:

Here is the example of genus Mycotretus:

image

Looking for the genus Mycotretus in XCOL-2023-10-26, I found this:

https://www.dev.checklistbank.org/dataset/271349/classification?taxonKey=5W6T

image

Besides, other species were merged more than once, depending on the number of related subspecies: https://www.dev.checklistbank.org/dataset/271349/classification?taxonKey=3349c2de-e025-4ab9-89a8-194d32b30a32

image

Others were merged more than once, once with the author and once without: https://www.dev.checklistbank.org/dataset/271349/classification?taxonKey=e4aabe0e-9a9d-49eb-a400-0492fc8c77bd

image

mdoering commented 7 months ago

Mycotretus discipennis subsp. conductus Kuhnt, 1910 for example exists twice: https://www.dev.checklistbank.org/dataset/271349/taxon/~4Gik https://www.dev.checklistbank.org/dataset/271349/taxon/~4Glb

And occurs twice in the Plazi source, once accepted, once as a synonym: https://www.dev.checklistbank.org/dataset/56536/duplicates?limit=50&rank=subspecies

mdoering commented 7 months ago

I believe all the "did not merge" problems are due to missing name matches! There was a very hard-to-spot bug in the rematch function that prevented most matches from persisting.

DianRHR commented 7 months ago

And occurs twice in the Plazi source, once accepted, once as a synonym: https://www.dev.checklistbank.org/dataset/56536/duplicates?limit=50&rank=subspecies

Looking at the article, I found that Mycotretus discipennis conductus Kuhnt, 1910 [as a variety] is a synonym of Mycotretus deyrollei Crotch, 1876 AND a synonym of Mycotretus discipennis conductus Kuhnt, 1910 (subspecies). So the original information has the same trinomial name at different ranks, and the "variety" also points to different accepted names. These kinds of errors are out of our hands.

image

DianRHR commented 7 months ago

Besides, other species were merged more than once, depending on the number of related subspecies: https://www.dev.checklistbank.org/dataset/271349/classification?taxonKey=3349c2de-e025-4ab9-89a8-194d32b30a32

same as reported in #87

DianRHR commented 7 months ago

In XCOL-2023-11-20 and XCOL-2023-11-29, some genera were merged directly below Biota, even though the original source includes higher taxonomy and the family itself was merged correctly (as were other genera from the same family). Example: https://www.dev.checklistbank.org/dataset/274825/classification?taxonKey=~OZO

image

DynTaxa (original source) includes the genus with complete higher taxonomy: https://www.dev.checklistbank.org/dataset/2041/taxon/urn%3Alsid%3Adyntaxa.se%3ATaxon%3A1015231

image

However, the family Pelonemataceae was merged properly in XCOL (and it comes from the same source): https://www.dev.checklistbank.org/dataset/274825/classification?taxonKey=~1eks

image

DianRHR commented 1 week ago

The cases of Peloploca and Pelonema are solved and properly merged.

The case of the Mycotretus subspecies cannot be confirmed until we merge the Plazi datasets.

However, some genera are being merged directly below the kingdom level because the original source either doesn't include higher taxonomy or its higher taxonomy differs from the base COL. Some of these cases are also generating duplicates:

Example: the genus Cribraria was merged below Fungi (merged from Brazilian Flora), even though it was already in the base COL below Cribrariales | Cribrariaceae, and the author is the same in both cases. Besides, all the species merged below Cribraria (at the kingdom level) are duplicated as well.

image

A possible solution for these cases could be to modify the code as follows: avoid merging a genus if the same genus (and author) already exists in the base with different higher taxonomy; its descendants could still be merged if they are not already present.
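
To make the proposed rule concrete, here is a minimal sketch in Python. It is for illustration only and is not the actual ChecklistBank merge code, which is not shown in this thread. `Taxon`, `merge_genus`, the `(name, author)` key and the `fallback_parent` argument are all hypothetical names, and exact string comparison of name and author stands in for real name matching.

```python
from dataclasses import dataclass, field


@dataclass
class Taxon:
    """Hypothetical tree node; children are keyed by (name, author)."""
    name: str
    author: str = ""
    children: dict = field(default_factory=dict)

    def key(self):
        return (self.name, self.author)


def merge_genus(base_genera, source_genus, fallback_parent):
    """Merge one source genus into the base checklist.

    base_genera: dict mapping (name, author) -> genus already in the base
    fallback_parent: node to attach the genus to if the base lacks it
    Returns the names of the species that were actually added.
    """
    existing = base_genera.get(source_genus.key())
    if existing is None:
        # Genus is new to the base: attach it together with its descendants.
        fallback_parent.children[source_genus.key()] = source_genus
        base_genera[source_genus.key()] = source_genus
        return [sp.name for sp in source_genus.children.values()]

    # Same genus and author already in the base: keep the base placement,
    # ignore the source's higher taxonomy, and add only missing descendants.
    added = []
    for key, species in source_genus.children.items():
        if key not in existing.children:
            existing.children[key] = species
            added.append(species.name)
    return added
```

With this rule, a source genus Cribraria whose author matches the base would not be created a second time below Fungi; only species missing from the base Cribraria would be attached to it. Whether matching on name and author alone is safe for homonyms across kingdoms is an open question that this sketch does not address.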