CatalogueOfLife / testing

Editorial tests and discussion to prepare for COL releases

SD via TW: test report #248

Open yroskov opened 4 months ago

yroskov commented 4 months ago

Systema Dipterorum ver. 5.0, 2024-01-08 processed via TW by DD; imported 2024-02-07

Summary of issues for @proceps extracted from the editorial report https://github.com/CatalogueOfLife/testing/issues/127#issuecomment-1934409065

yroskov commented 4 months ago

There are two reports with lists of duplicated species:

ACC-ACC species (same authors), 342: https://www.checklistbank.org/dataset/1101/duplicates?authorshipDifferent=false&category=binomial&limit=50&minSize=2&mode=STRICT&offset=0&status=accepted

There are 342 pairs of identical accepted species. For example:

Agadasys hexablepharis Whittington, 2000
Amblypsilopus qinlingensis Yang & Saigusa, 2005
Amplisegmentum venezuelensis Winterton, 2021
etc.

ACC-ACC species (different authors), 512: https://www.checklistbank.org/dataset/1101/duplicates?authorshipDifferent=true&category=binomial&limit=50&minSize=2&mode=STRICT&offset=0&status=accepted

There are many pairs of identical species that differ only in whether the authorship string is bracketed. For example:

Hoplacephala excisa (Villeneuve, 1913) vs Hoplacephala excisa Villeneuve, 1913 _see http://www.diptera.org/Nomenclator?op_name=&Name=Hoplacephala+excisa&op_author=&Author=&op_year=&Year=&op_family=&Family=&op_validname=&ValidName=&kind=&Sortfield=unsorted&sortorder=ascending&max=10&find=Start+Search_

Hoplacephala nigriventris (Villeneuve, 1913) vs Hoplacephala nigriventris Villeneuve, 1913 _see http://www.diptera.org/Nomenclator?op_name=&Name=Hoplacephala+nigriventris&op_author=&Author=&op_year=&Year=&op_family=&Family=&op_validname=&ValidName=&kind=&Sortfield=unsorted&sortorder=ascending&max=10&find=Start+Search_

Hoplacephala retroseta (Villeneuve, 1913) vs Hoplacephala retroseta Villeneuve, 1913 _see http://www.diptera.org/Nomenclator?op_name=&Name=Hoplacephala+retroseta&op_author=&Author=&op_year=&Year=&op_family=&Family=&op_validname=&ValidName=&kind=&Sortfield=unsorted&sortorder=ascending&max=10&find=Start+Search_

Huttonobesseria verecunda (Hutton, 1901) vs Huttonobesseria verecunda Hutton, 1901

Hystricia cuestae (Engel, 1920) vs Hystricia cuestae Engel, 1920

Isomyia pseudolucilia (Malloch, 1928) vs Isomyia pseudolucilia Malloch, 1928 etc.
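To pre-screen an export for pairs like the ones above, a minimal sketch could group names that become identical once the parentheses around a trailing authorship are removed. The function names and the regular expression are illustrative assumptions, not part of TW or ChecklistBank; the pattern only treats a final "(Author, year)" block as authorship, so a subgenus in parentheses is left alone.

```python
import re
from collections import defaultdict


def strip_author_brackets(name: str) -> str:
    """Remove parentheses wrapping a trailing authorship that ends in a 4-digit year."""
    return re.sub(r"\(([^()]*\d{4})\)\s*$", r"\1", name.strip())


def bracket_only_duplicates(names):
    """Return groups of names that differ only by brackets around the authorship."""
    groups = defaultdict(set)
    for name in names:
        groups[strip_author_brackets(name)].add(name.strip())
    # Keep only keys reached by more than one distinct original string.
    return {key: sorted(variants) for key, variants in groups.items() if len(variants) > 1}


if __name__ == "__main__":
    sample = [
        "Hoplacephala excisa (Villeneuve, 1913)",
        "Hoplacephala excisa Villeneuve, 1913",
        "Hystricia cuestae Engel, 1920",
    ]
    for key, variants in bracket_only_duplicates(sample).items():
        print(key, "<-", variants)
```

Which of the two forms is correct still has to be decided per name (brackets indicate a change of genus), so the sketch only finds candidates.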

In addition, this report contains 10 pairs of identical accepted species whose authorship is spelled or cited differently. Full list:

Empis (Polyblepharis) fedtschenkoi Shasmshev, 2023 = Shamshev vs Shasmshev
Empis (Polyblepharis) hirsutitarsis Shamshev, 2023
Empis (Polyblepharis) sogdiensis Shamshev, 2023
Empis (Polyblepharis) sogdiensis Shasmshev, 2023

Holops anarayae Barahona-Segovia, 2021 = Baharona-Segovia vs Barahona-Segovia
Holops grezi Barahona-Segovia, 2021
Holops pullomen Baharona-Segovia, 2021
Physoconops tentenvilu Baharona-Segovia, 2020

Paraclius brooksi Soares, Capellari & Ale-Rocha, 2023 = Soares, Capellari & Ale-Rocha, 2023: 176 vs Soares, Runyon & Capellari, 2023: 166

Polleniopsis bomdilaensis Bharti & Verves, 2016 = Bharti & Verves, 2015: 1 vs Bharti & Verves, 2016: 1
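Spelling variants such as Shamshev vs Shasmshev could be flagged automatically by comparing author surnames pairwise. The sketch below uses `difflib.SequenceMatcher` from the Python standard library; the function name and the 0.85 threshold are assumptions chosen for illustration and would need tuning on the real author list.

```python
from difflib import SequenceMatcher
from itertools import combinations


def near_duplicate_authors(authors, threshold=0.85):
    """Return pairs of distinct author strings whose similarity ratio meets the threshold."""
    unique = sorted(set(authors))
    pairs = []
    for a, b in combinations(unique, 2):
        # ratio() is 1.0 for identical strings and drops as they diverge.
        if SequenceMatcher(None, a, b).ratio() >= threshold:
            pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    surnames = ["Shamshev", "Shasmshev", "Barahona-Segovia", "Baharona-Segovia", "Villeneuve"]
    for a, b in near_duplicate_authors(surnames):
        print(f"{a} vs {b}")
```

Pairwise comparison is quadratic in the number of distinct authors, so for a full dataset it would be better to compare only authors attached to the same binomial.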

yroskov commented 4 months ago

https://www.checklistbank.org/dataset/1101/names?facet=rank&facet=issue&facet=status&facet=nomStatus&facet=nameType&facet=field&facet=authorship&facet=authorshipYear&facet=extinct&facet=environment&facet=origin&issue=unparsable%20authorship&limit=50&offset=0

yroskov commented 4 months ago

@gdower, here is an idea of why this happened. TW exported the name as the string Sargus infuscatus var. [sic] minor (i.e. with the added portion [sic]). The parser recognizes it as a quadrinomial, and CLB probably cuts off the third epithet.

yroskov commented 4 months ago

https://www.checklistbank.org/dataset/1101/names?facet=rank&facet=issue&facet=status&facet=nomStatus&facet=nameType&facet=field&facet=authorship&facet=authorshipYear&facet=extinct&facet=environment&facet=origin&issue=inconsistent%20name&limit=50&offset=0

@gdower, the "var. var." names are probably also a case of a truncated quadrinomial, if TW exported the name as the string Aedes variegatus var. var. hebrideus (i.e. with an added var.). Somewhere (in the parser or in CLB) the quadrinomial is shortened.
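One way to test this hypothesis would be to normalise the exported strings before they reach the parser and check whether the truncated names disappear. The sketch below is a hypothetical pre-processing step, not something TW or CLB is known to do; `clean_name_string` and the `RANK_MARKERS` subset are assumptions made for illustration.

```python
import re

# Assumption: an illustrative subset of rank markers, not a complete list.
RANK_MARKERS = ("var.", "subsp.")


def clean_name_string(name: str) -> str:
    """Drop "[sic]" annotations and collapse an immediately repeated rank marker."""
    cleaned = re.sub(r"\s*\[sic\]", "", name)
    for marker in RANK_MARKERS:
        escaped = re.escape(marker)
        # Match the marker repeated as whole tokens, e.g. "var. var." -> "var."
        pattern = r"(?<!\S)(" + escaped + r")(?:\s+" + escaped + r")+(?!\S)"
        cleaned = re.sub(pattern, r"\1", cleaned)
    return re.sub(r"\s+", " ", cleaned).strip()


if __name__ == "__main__":
    print(clean_name_string("Sargus infuscatus var. [sic] minor"))
    print(clean_name_string("Aedes variegatus var. var. hebrideus"))
```

If the cleaned strings parse as ordinary trinomials, that would support the idea that the extra token is what makes the parser see a quadrinomial.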

yroskov commented 4 months ago

https://www.checklistbank.org/dataset/1101/names?facet=rank&facet=issue&facet=status&facet=nomStatus&facet=nameType&facet=field&facet=authorship&facet=authorshipYear&facet=extinct&facet=environment&facet=origin&issue=rank%20name%20suffix%20conflict&limit=50&offset=0

proceps commented 4 months ago

Unparsable authorship: these probably have strange characters in the original DB, e.g. Megaselia neocorynurae González, Brown & Ospina, 2002

proceps commented 4 months ago

Unusual Name Characters, 28: incomplete trinomial names like Sargus infuscatus var. This looks like a parser error. The name is complete in TW.

proceps commented 4 months ago

Inconsistent Name, 840: incomplete trinomial names like Aedes subsp. holocinctus Edwards, 1941. This is incomplete data in the original DB.
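The two kinds of incomplete trinomial mentioned in this thread (a rank marker with nothing after it, and a rank marker directly after the genus) can be told apart with a simple token check. This is a rough sketch under stated assumptions: `infraspecific_problem` is a hypothetical helper, only "var." and "subsp." are treated as rank markers, and tokens opening a parenthesis (such as a subgenus) are skipped.

```python
# Assumption: an illustrative subset of rank markers, not a complete list.
RANK_MARKERS = {"var.", "subsp."}


def infraspecific_problem(name: str):
    """Return a short description of what is missing, or None if nothing is flagged."""
    # Skip tokens that open a parenthesis, e.g. a subgenus like "(Polyblepharis)".
    tokens = [t for t in name.split() if not t.startswith("(")]
    for i, token in enumerate(tokens):
        if token in RANK_MARKERS:
            if i < 2:
                # Marker follows the genus directly: no species epithet.
                return "missing species epithet"
            if i + 1 >= len(tokens) or not tokens[i + 1][0].islower():
                # Nothing, or an author or year, follows the marker.
                return "missing infraspecific epithet"
            return None
    return None


if __name__ == "__main__":
    for example in ("Sargus infuscatus var.", "Aedes subsp. holocinctus Edwards, 1941"):
        print(example, "->", infraspecific_problem(example))
```

Running such a check on both the TW export and the CLB import would show on which side each name lost its epithet.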