Closed DianRHR closed 4 days ago
Test issue_146_1 failed, original issue: https://github.com/CatalogueOfLife/xcol/issues/146
Test issue_146_2 failed, original issue: https://github.com/CatalogueOfLife/xcol/issues/146
The problem persists in the latest xrelease in Ancistrocoma, Hypocomella and several genera listed in the task "Identical genus", like Acrochaetium, where the expected behavior would be to merge only the genus from IRMNG (priority over BOLD) and to merge from BOLD only the species that are not included in the other sources.
| issue_146_1 | succeded | https://github.com/CatalogueOfLife/xcol/issues/146 | https://www.checklistbank.org/dataset/3LXRC/names?q=Ancistrocoma&rank=genus&sortBy=taxonomic&status=accepted |
| issue_146_2 | succeded | https://github.com/CatalogueOfLife/xcol/issues/146 | https://www.checklistbank.org/dataset/3LXRC/names?q=Hypocomella&rank=genus&sortBy=taxonomic&status=accepted |
Acanthosiphonia should appear only once; the BOLD genus shouldn't be added, as it has the lowest priority.
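To make the expected behavior concrete, here is a minimal sketch of a priority-based merge. It is not the actual xrelease code: the `Record` type, the `PRIORITY` list, the `merge` function and the species names are hypothetical, and matching is done on rank plus canonical name only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    source: str  # e.g. "IRMNG" or "BOLD"
    rank: str    # "genus" or "species"
    name: str    # canonical name without authorship


# Hypothetical priority order: a lower index means a higher priority.
PRIORITY = ["IRMNG", "BOLD"]


def merge(records):
    """Keep one record per (rank, name), preferring the higher-priority source."""
    merged = {}
    # sorted() is stable, so records of the same source keep their input order.
    for rec in sorted(records, key=lambda r: PRIORITY.index(r.source)):
        # setdefault only stores the first record seen for a key,
        # i.e. the one from the highest-priority source.
        merged.setdefault((rec.rank, rec.name), rec)
    return list(merged.values())


# Placeholder species names, made up for illustration only.
records = [
    Record("BOLD", "genus", "Acanthosiphonia"),
    Record("IRMNG", "genus", "Acanthosiphonia"),
    Record("IRMNG", "species", "Acanthosiphonia placeholder-a"),
    Record("BOLD", "species", "Acanthosiphonia placeholder-a"),
    Record("BOLD", "species", "Acanthosiphonia placeholder-b"),
]
result = merge(records)
```

With this input the genus is kept once, from IRMNG, and BOLD contributes only the species that no higher-priority source provided.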
New unit tests for Acrochaetium and Acanthosiphonia
Another example is Crotalaria, where almost all the species below it are duplicated.
Another example is Aphanizomenon which is provided by ITIS with an outdated classification (in family Nostocaceae); 9 records (species and below) are merging below it from different sources.
The other Aphanizomenon is merged below family Aphanizomenonaceae; both family and genus are merged from WoRMS, with just 1 species.
The difference in this case is the higher classification and the way of citing the author: Aphanizomenon Morren, 1888 Ex Bornet & Flahault vs Aphanizomenon Morren ex Bornet & Flahault. It is clearly the same author, though.
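As an illustration of why these two citations should compare as equal, here is a rough normalization sketch. It is an assumption for discussion, not how the ChecklistBank name parser or matcher works; `normalize_authorship` is a hypothetical helper that ignores case, years and the "&" vs "and" spelling.

```python
import re


def normalize_authorship(authorship: str) -> str:
    """Reduce an authorship string to a crude comparison key."""
    s = authorship.casefold()                  # "Ex" and "ex" become identical
    s = re.sub(r",?\s*\b\d{4}\b", "", s)       # drop 4-digit years and a leading comma
    s = re.sub(r"\s+(?:&|and)\s+", " & ", s)   # unify "&" and "and"
    return re.sub(r"\s+", " ", s).strip()      # collapse leftover whitespace


itis = normalize_authorship("Morren, 1888 Ex Bornet & Flahault")
other = normalize_authorship("Morren ex Bornet & Flahault")
lpsn = normalize_authorship("Morren ex Bornet and Flahault 1886")
```

All three variants reduce to the same key, so a comparison on that key would treat them as one author team. Dropping the year is a deliberate simplification here; a real matcher would probably want to keep it as a separate signal.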
The ITIS one is also given with code=bacterial, while Dyntaxa has no code. WoRMS provides the family, but genus and species come from Dyntaxa.
This should clearly not happen. Looking into the build logs, I cannot trace what is going on. We do not seem to log all events; it looks like the wrong debug level is used. I can see, though, that lots and lots of species like these from NCBI are also dropped, which I suppose is desired:
2169 Ignore SPECIES Aphanizomenon flos-aquae [1176] because RANK: SPECIES
2169 Ignore SPECIES Aphanizomenon elenkinii Kisselev, 1951 [2651365-s1] because IGNORED_PARENT: 2651365
2169 Ignore SPECIES Aphanizomenon gracile M4/1a [168378-s1] because IGNORED_PARENT: 168378
2169 Ignore SPECIES Aphanizomenon gracile M41/b [168378-s2] because IGNORED_PARENT: 168378
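For anyone digging through the build logs, a small script can tally why records were ignored. The sketch below assumes every relevant line has the same shape as the four examples above; the regular expression and the group names are my own, and the meaning of the leading number is not stated in the log, so it is simply captured as `prefix`.

```python
import re
from collections import Counter

# Pattern inferred from the sample lines only; real logs may contain other shapes.
IGNORE_LINE = re.compile(
    r"^(?P<prefix>\d+) Ignore (?P<rank>[A-Z_]+) (?P<name>.+) "
    r"\[(?P<key>[^\]]+)\] because (?P<reason>[A-Z_]+)(?:: (?P<detail>.+))?$"
)

log_lines = [
    "2169 Ignore SPECIES Aphanizomenon flos-aquae [1176] because RANK: SPECIES",
    "2169 Ignore SPECIES Aphanizomenon elenkinii Kisselev, 1951 [2651365-s1] because IGNORED_PARENT: 2651365",
    "2169 Ignore SPECIES Aphanizomenon gracile M4/1a [168378-s1] because IGNORED_PARENT: 168378",
    "2169 Ignore SPECIES Aphanizomenon gracile M41/b [168378-s2] because IGNORED_PARENT: 168378",
]

# Parse each line and skip anything that does not match the assumed shape.
events = [m.groupdict() for m in map(IGNORE_LINE.match, log_lines) if m]

# Count how often each ignore reason occurs.
reasons = Counter(e["reason"] for e in events)
```

Grouping by `reason` and then by `detail` should make it easier to see which ignored parent is responsible for a batch of dropped species.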
Yes, we only log on debug level in DEV, and since we moved to prod we forgot to change the setting. I will change that for tomorrow's release. The build logs are crucial to understand what's going on, please use them!
I tried to reproduce Aphanizomenon in local tests, but I only get one genus like I would expect.
@DaveNicolson could ITIS adapt the authorship though and use an all-lowercase ex in this and all other cases of these authors? It avoids bad parsing. And placing the year at the end would also be good ;)
According to LPSN it should be:
Aphanizomenon Morren ex Bornet and Flahault 1886
Which genera are still duplicated in the latest 2024-10-12 release?
Aphanizomenon looks ok to me. Just a typo for a binomen which will be handled by a new orth var detection feature:
Aphanizomenon holsaticum Richter
Aphanizomenon holtsaticum Richter
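The thread does not describe how the orth var detection works, so the following is only a guess at the general idea: treat two binomina as orthographic variants when the genus is identical and the epithets differ by a very small edit distance. Both function names and the `max_distance` threshold are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # deletion
                cur[j - 1] + 1,            # insertion
                prev[j - 1] + (ca != cb),  # substitution, free if characters match
            ))
        prev = cur
    return prev[-1]


def is_orth_variant(name_a: str, name_b: str, max_distance: int = 1) -> bool:
    """True if both binomina share the genus and the epithets nearly match."""
    genus_a, epithet_a = name_a.split()[:2]
    genus_b, epithet_b = name_b.split()[:2]
    if genus_a != genus_b:
        return False
    # Identical epithets are the same name, not a variant.
    return 0 < edit_distance(epithet_a, epithet_b) <= max_distance


variant = is_orth_variant("Aphanizomenon holsaticum", "Aphanizomenon holtsaticum")
```

For the pair above the epithets differ by a single inserted "t", so they are flagged. A threshold this low keeps false positives down but will miss variants with more than one differing character.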
And some NCBI strains that should better be removed in the next release.
I suspect it is the tax group analyzer from the matching that is causing Acrochaetium to fail. The IBOL version of it is placed in Protista and ends up (correctly) as an alga:
kingdom: Protista >phylum: Rhodophyta >class: Florideophyceae >order: Acrochaetiales >family: Acrochaetiaceae >genus: Acrochaetium
PS: Acrochaetium also contains synonym species from TaxRef
= Acrochaetium hirsutum (K.M.Drew) P.W.Gabrielson m [source: 2008]
≡ Chromastrum hirsutum (K.M.Drew) Papenf., 1945 m [source: 2008]
≡ Kylinia hirsuta (K.M.Drew) Kylin, 1944 m [source: 2008]
≡ Rhodochorton hirsutum K.M.Drew, 1928 m [source: 2008]
The BOLD input results in protists:
https://api.checklistbank.org/parser/taxgroup?q=Acrochaetium&kingdom=Protista&phylum=Rhodophyta&class=Florideophyceae&order=Acrochaetiales&family=Acrochaetiaceae&genus=Acrochaetium
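To check other names against the taxgroup parser in the same way, the request URL can be assembled from a name and its classification. This sketch only builds the URL and makes no request; the endpoint and parameter names are copied from the link above, while `taxgroup_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

BASE = "https://api.checklistbank.org/parser/taxgroup"


def taxgroup_url(name: str, classification: dict) -> str:
    """Build a taxgroup parser URL from a name and its higher classification."""
    # A dict is used because "class" is a reserved word and cannot be a keyword argument.
    params = {"q": name, **classification}
    return f"{BASE}?{urlencode(params)}"


url = taxgroup_url("Acrochaetium", {
    "kingdom": "Protista",
    "phylum": "Rhodophyta",
    "class": "Florideophyceae",
    "order": "Acrochaetiales",
    "family": "Acrochaetiaceae",
    "genus": "Acrochaetium",
})
```

Swapping the kingdom or dropping individual ranks from the dict is a quick way to see which part of the classification drives the group assignment.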
The algae taxgroup dictionaries are very sparse, not even Rhodophyta is known. I will update them based on the 2024 Guiry publication which I uploaded here: https://www.checklistbank.org/dataset/304685/about
Several genera in family Ancistrocomidae (which was wrongly merged into Gentianales) were merged more than once, even when they have the same authority or a third source didn't include the authority.
https://www.checklistbank.org/dataset/299805/classification?taxonKey=CV46W