CatalogueOfLife / data

Repository for COL content

Weird name parsing in Sepidiini tribe #630

Open aoern opened 6 months ago

aoern commented 6 months ago

Same problem as in #628. In the Sepidiini tribe, the author name "des Desbrochers des Loges" causes trouble in 4 occurrences. The first "des" is parsed as part of the name: Sepidium capricorne des | Desbrochers des Loges, 1881

mdoering commented 6 months ago

That is a common problem case unfortunately. It seems there is no epithet called des though, so I could exclude it from being recognised as such: https://www.checklistbank.org/dataset/288943/names?content=SCIENTIFIC_NAME&facet=rank&facet=issue&facet=status&facet=nomStatus&facet=nomCode&facet=nameType&facet=field&facet=authorship&facet=authorshipYear&facet=extinct&facet=environment&facet=origin&facet=sectorMode&limit=50&offset=0&q=des&sortBy=taxonomic

mdoering commented 6 months ago

Hm, the parser actually handles this fine: http://api.checklistbank.org/parser/name?q=Sepidium%20capricorne%20des%20Desbrochers%20des%20Loges,%201881
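
To reproduce this check for other names, here is a minimal Python sketch that builds the same parser request URL. Only the endpoint and the example name come from the link above; the helper name `parser_url` is made up for illustration, and the snippet does not send the request.

```python
from urllib.parse import quote

# Endpoint taken from the link in the comment above.
PARSER_ENDPOINT = "http://api.checklistbank.org/parser/name"


def parser_url(name: str) -> str:
    """Build a parser request URL for a scientific name (hypothetical helper).

    Spaces are percent-encoded as %20; commas are left as-is so the result
    matches the form of the URL quoted in this thread.
    """
    return f"{PARSER_ENDPOINT}?q={quote(name, safe=',')}"


url = parser_url("Sepidium capricorne des Desbrochers des Loges, 1881")
print(url)
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should show how the parser splits the name into epithets and authorship.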

mdoering commented 6 months ago

I am not sure where the problem actually originates from. The Name record thinks it's a subspecies and uses des as an epithet: https://www.checklistbank.org/dataset/1206/name/1016

The verbatim record actually has only a genus, yet it declares rank=species and scientificName=Sepidium capricorne Desbrochers des Loges, 1881: https://www.checklistbank.org/dataset/1206/verbatim/602

The missing specificEpithet results in an inconsistent name flag, as the atomised name and the name string do not match up.
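
As a rough illustration of that mismatch (a simplified sketch, not ChecklistBank's actual check), the following Python compares the atomised fields of a species-rank record with the leading tokens of its name string. The function name `atomised_matches_name` and its parameters are hypothetical; the example values are the ones from the verbatim record above.

```python
from typing import Optional


def atomised_matches_name(
    scientific_name: str,
    genus: Optional[str],
    specific_epithet: Optional[str],
) -> bool:
    """Return True if a species-rank name string starts with genus + epithet.

    Simplified assumption: the first two whitespace-separated tokens of the
    name string are the genus and the specific epithet.
    """
    tokens = scientific_name.split()
    if len(tokens) < 2:
        return False
    return tokens[0] == genus and tokens[1] == specific_epithet


name = "Sepidium capricorne Desbrochers des Loges, 1881"

# Verbatim record as described: genus present, specificEpithet missing.
print(atomised_matches_name(name, "Sepidium", None))

# Same record with the epithet filled in.
print(atomised_matches_name(name, "Sepidium", "capricorne"))
```

With the epithet missing, the atomised name ("Sepidium") and the name string ("Sepidium capricorne ...") disagree, which is the kind of situation that gets flagged as inconsistent.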

mdoering commented 6 months ago

@yroskov @gdower the registered coldp file at https://github.com/gdower/data-sepidiini/archive/coldp.zip does not exist

mdoering commented 6 months ago

I have reimported the last archive that was uploaded and now it's all fine. It was simply last interpreted 1.5 years ago with an older software stack.

mdoering commented 6 months ago

https://api.checklistbank.org/dataset/1206/name/1016