Open yroskov opened 3 years ago
Limonia (Dicranomyia) phatta (Philippi, 1865) -> should be Philippi, 1865 Synonyms = Limnobia phatta Philippi, 1865 (acceptable) = Limnobia stictica Blanchard, 1852 = Furcomyia blanchardi Alexander, 1913
However, brackets are already correct in cases where Present_Genus ≠ Original_Genus & Valid_Species = Species & Valid_Sp_Author = Author
We are building names and relationships from the following fields:
acc: Present_Genus+Valid_Species+Valid_Sp_Author
syn: Original_Genus+Species+Author+Year
Tasks: apply year in accepted authorstring and decide where brackets should be added or not.
1) If Present_Genus = Original_Genus & Valid_Species = Species & Valid_Sp_Author = Author => take Author+Year, no brackets
2) If Present_Genus ≠ Original_Genus & Valid_Species = Species & Valid_Sp_Author = Author => take Author+Year, add brackets
3) If Present_Genus = Original_Genus & Valid_Species ≠ Species & Valid_Sp_Author = Author => no action.
There is an exception, where Valid_Species ≠ Species because of corrected gender ending. Example: acc Helius acanthostylus Alexander, syn = Helius acanthostyla Alexander, 1944 (I agree with result) https://data.catalogueoflife.org/dataset/2244/taxon/Limoniidae-Limoniinae-Elephantomyiini-Helius-acanthostylus-f52de6675
4) If Present_Genus = Original_Genus & Valid_Species = Species & Valid_Sp_Author ≠ Author => just take Valid_Sp_Author, no year, no brackets (a curious case, but it may appear)
5) If Present_Genus ≠ Original_Genus & Valid_Species ≠ Species & Valid_Sp_Author ≠ Author => no action
6) If Present_Genus ≠ Original_Genus & Valid_Species = Species & Valid_Sp_Author ≠ Author => just take Valid_Sp_Author, no year, no brackets
Example: Helius abditus Krzemiński, 1985, syn = Helius abditus Krzemiński, 1985 https://data.catalogueoflife.org/dataset/2244/taxon/Limoniidae-Limoniinae-Elephantomyiini-Helius-abditus-6eb7ccfd4
Example: Limonia (Dicranomyia) insignifica (Alexander, 1912) Published in Alexander, C.P. (1912). https://data.catalogueoflife.org/dataset/2244/taxon/Limoniidae-Limoniinae-Limonia-Dicranomyia-insignifica-f52de6675
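Rules 1-6 above can be read as a small decision function. What follows is only a minimal Python sketch under stated assumptions: the `Record` type and the `accepted_authorship` name are hypothetical, fields are compared as exact strings (so the gender-ending exception noted under rule 3 is not handled), and the two combinations the rules do not list are treated as "no action".

```python
from typing import NamedTuple, Optional


class Record(NamedTuple):
    """One row of the master file (field names follow the issue text)."""
    present_genus: str
    original_genus: str
    valid_species: str
    species: str
    valid_sp_author: str
    author: str
    year: str


def accepted_authorship(r: Record) -> Optional[str]:
    """Return the authorstring for the accepted name, or None for 'no action'."""
    same_genus = r.present_genus == r.original_genus
    same_species = r.valid_species == r.species
    same_author = r.valid_sp_author == r.author

    if not same_species:
        # Rules 3 and 5. ASSUMPTION: unlisted combinations with a
        # differing epithet are also left untouched.
        return None
    if not same_author:
        # Rules 4 and 6: keep Valid_Sp_Author, no year, no brackets.
        return r.valid_sp_author
    base = f"{r.author}, {r.year}"
    # Rule 1 (same genus): no brackets. Rule 2 (genus changed): brackets.
    return base if same_genus else f"({base})"


if __name__ == "__main__":
    rec = Record("Systenus", "Rhaphium", "pallipes", "pallipes",
                 "Roser", "Roser", "1840")
    print(accepted_authorship(rec))
```

Returning `None` for "no action" keeps the caller in charge of whatever authorstring is already stored, rather than overwriting it.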
Diptera = Order Diptera - [Superfamily Not assigned] - Family Not assigned - [Subfamily Not assigned] - [Tribe Not assigned] - Genus Rhopaloscolex
F = not assigned to a family (FAcalyptratae)
T = not assigned to a tribe (TPorricondylinae)
G = not assigned to a genus (!) (GAcalyptratae commoror), i.e. taxonomically unresolved; makes no sense in CoL.
SG = not assigned to a subgenus (Cephalops (*SGCephalops*) incohatus Morakote, 1990 and Chironomus (GChironomus**) harti Malloch, 1915); we need to delete such subgenera (how?)
These names are visible in Issues - Partially Parsable Names https://data.catalogueoflife.org/catalogue/3/dataset/2244/workbench?facet=rank&facet=issue&facet=status&facet=nomStatus&facet=nameType&facet=field&issue=partially%20parsable%20name&limit=100&offset=0&status=accepted
It is caused by parasitic characters in the genus name Spungisomyia (tribe *TPorricondylinae)
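To find such placeholder values before import, one could scan the genus, subgenus and tribe fields with a pattern like the one below. This is only a sketch: the function name `parse_placeholder` is hypothetical, and the pattern assumes a placeholder is an optional asterisk, one of the prefixes F, T, G or SG, and then a capitalised taxon name, as in the examples in this thread.

```python
import re
from typing import Optional, Tuple

# ASSUMPTION: placeholder = optional "*", prefix (SG|F|T|G), capitalised taxon.
_PLACEHOLDER = re.compile(r"^\*?(SG|F|T|G)([A-Z][a-z]+)$")
_RANKS = {"F": "family", "T": "tribe", "G": "genus", "SG": "subgenus"}


def parse_placeholder(name: str) -> Optional[Tuple[str, str]]:
    """Return (unplaced rank, containing taxon), or None for an ordinary name."""
    match = _PLACEHOLDER.match(name.strip())
    if match is None:
        return None
    return _RANKS[match.group(1)], match.group(2)


if __name__ == "__main__":
    for value in ["*TPorricondylinae", "SGCephalops", "GArchiborborus", "Tipula"]:
        print(value, "->", parse_placeholder(value))
```

An ordinary genus name never has a second capital letter, which is why a name such as `Tipula` is not mistaken for a tribe placeholder.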
González - 48 sci names filtered in master file
Example: Agelanius burgeri González, 2006
Acc: Archiborborus (GArchiborborus) femoralis (Blanchard, 1852) Cephalops (SGCephalops) excellens (Kertész, 1912) etc. etc.
NameID in the master file: 153660
Supposed to be: acc Systenus pallipes (Roser, 1840), syn Rhaphium pallipes Roser, 1840, Rhaphium adpropinquans Loew, 1857; fam Dolichopodidae
What's gone wrong? Could it be because of the value with a trailing backslash in Type_locality: Poland. "Scharlottenbrunn in Schlesien"\
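One way a trailing backslash could shift data into the wrong fields is if the export is read by a parser that treats the backslash as an escape character. The sketch below only illustrates that hypothesis with Python's `csv` module. The tab-separated line is made up for the demonstration, and whether the actual import pipeline behaves this way is an assumption.

```python
import csv

# Made-up three-field line; the first field ends with a backslash.
line = "Schlesien\\\tRoser\t1840"

# Reader that does not interpret backslashes: three fields, as intended.
plain = next(csv.reader([line], delimiter="\t", quoting=csv.QUOTE_NONE))

# Reader that treats "\" as an escape: the following tab loses its
# meaning as a delimiter, so two fields merge and the rest shift left.
escaped = next(csv.reader([line], delimiter="\t",
                          quoting=csv.QUOTE_NONE, escapechar="\\"))

if __name__ == "__main__":
    print(len(plain), plain)
    print(len(escaped), escaped)
```

If this is what happened, the record would end up with fewer columns than its neighbours, which is cheap to check for in the exported file.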
2020-12-21 test of the CLASSIFICATION diptera-flat-dev https://data.catalogueoflife.org/dataset/2244/classification
[ ] 115 more Diptera families in the new version than in ac18. !!! I am not able to select and copy & paste expanded families in the tree in the new portal. It makes comparison very difficult.
[x] Taxa outside Diptera in the tree (included as "families", status=99 means non-Diptera (24 names have status 99 in the SPECIES file)); will be excluded from import: FIXED
ACARI; Blattidae; Homoptera: Aphididae; HOMOPTERA: Tettigarctidae; MECOPTERA; ORTHOPTERA; Plecoptera; Trichoptera & TRICHOPTERA
[ ] 562 names have an empty Valid_Species field in the SPECIES file. The majority of these names have status 55 (nomen nudum) or 61 (Established for Hybrid). These names should be filtered out of CoL.
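A filter for the empty Valid_Species case could look roughly like the sketch below. It assumes the SPECIES file has been loaded as a list of dictionaries. The `Status` column name and the function name are assumptions, and only `Valid_Species` is taken from the field names used in this thread.

```python
from collections import Counter
from typing import Dict, List, Tuple

Row = Dict[str, str]


def split_missing_valid_species(rows: List[Row]) -> Tuple[List[Row], List[Row], Counter]:
    """Separate rows with an empty Valid_Species and count their statuses."""
    kept: List[Row] = []
    dropped: List[Row] = []
    for row in rows:
        if (row.get("Valid_Species") or "").strip():
            kept.append(row)
        else:
            dropped.append(row)
    # ASSUMPTION: the status code lives in a column called "Status".
    statuses = Counter(row.get("Status", "") for row in dropped)
    return kept, dropped, statuses


if __name__ == "__main__":
    sample = [
        {"Valid_Species": "pallipes", "Status": "10"},
        {"Valid_Species": "", "Status": "55"},
        {"Valid_Species": "  ", "Status": "61"},
    ]
    kept, dropped, statuses = split_missing_valid_species(sample)
    print(len(kept), len(dropped), dict(statuses))
```

Returning the status counts alongside the dropped rows makes it easy to confirm that what gets filtered is indeed mostly status 55 and 61.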
Blephariceridae (344 spp) Blepharoceridae (1 sp Edwardsina imperatrix Alexander, 1953)
Bomblyiidae (1 sp Pioneeria bombylia Grimaldi, 2016) Bombyliidae (4983 spp)
Callilphoridae (1 sp Pollenia mesopotamica Mawlood & Abdul-Rassoul, 2009) Calliphoridae (1392 spp)
Canacacidae (3 spp) Canacidae (327 spp)
Ceratoipogonidae (19 spp) Ceratopogonidae (6423 spp)
Cylidrotomidae (1 sp Cylindrotoma nigritarsis Alexander, 1956) Cylindrotomidae (88 spp)
Dolchopodidae (10 spp) Dolichopodidae (7805 spp)
Drosophilidae (4502 spp) Drospphilidae (1 sp Drosophila kikalaeleele Lapoint, Magnacca & O'Grady, 2009)
Helcomyzidae (13 spp) Heleomyzidae (760 spp)
Iteaphila group (12 spp)
Mycetophildae (2 spp: Dziedzickia laticornis (Enderlein, 1910) & Mycetophila macula Enderlein, 1910) Mycetophilidae (4842 spp)
Phoridae (4599 spp) Phortidae (1 sp: Phora sp. DELETE)
Ropalomeridae (31 spp) Ropalomridae (1 sp Ropalomera albifasciata Kirst & Ale-Rocha, 2012)
Sarcophagidae (3281 spp) Sarcophgidae (1 sp Paraphrissopoda catiae Lehrer, 2006)
Sphaeroceidae (1 sp Paralimosina curvata Su & Liu, 2017) Sphaeroceridae (1702 spp)
Stratiomyidaae (14 spp) Stratiomyidae (2917 spp)
Syringogastridae (11 spp) Syringogsatridae (13 spp)
Tachionidase (1 sp Phyllomya gibsonomyioides Crosskey, 1976)
Xylomyidae (147 spp) Xylo0myidae (1 sp Xylomya wenxiana Yang, Gao & An, 2005) Xylomyiidae (1 sp Cretoxyla azari Grimaldi & Cumming, 2011)
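Pairs like the ones listed above can be pre-screened automatically by comparing each small family against larger ones with a string-similarity measure. The sketch below uses `difflib.SequenceMatcher` from the Python standard library. The function name and the thresholds (`max_small`, `min_ratio`) are arbitrary assumptions, and every suggestion still needs expert review, since two similar names can both be legitimate.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def suspect_misspellings(counts: Dict[str, int],
                         max_small: int = 20,
                         min_ratio: float = 0.85) -> List[Tuple[str, str]]:
    """Pair each small family with the most similar larger family, if any."""
    names = sorted(counts)
    suspects: List[Tuple[str, str]] = []
    for small in names:
        if counts[small] > max_small:
            continue  # only families with few species are candidates
        best_name, best_ratio = None, 0.0
        for big in names:
            if big == small or counts[big] <= counts[small]:
                continue
            ratio = SequenceMatcher(None, small.lower(), big.lower()).ratio()
            if ratio >= min_ratio and ratio > best_ratio:
                best_name, best_ratio = big, ratio
        if best_name is not None:
            suspects.append((small, best_name))
    return suspects


if __name__ == "__main__":
    counts = {"Bombyliidae": 4983, "Bomblyiidae": 1,
              "Phoridae": 4599, "Phortidae": 1,
              "Helcomyzidae": 13, "Heleomyzidae": 760}
    for small, big in suspect_misspellings(counts):
        print(f"{small} ({counts[small]} spp) ~ {big} ({counts[big]} spp)")
```

With these sample counts the Helcomyzidae / Heleomyzidae pair is flagged as well, which is exactly the kind of case a taxonomist has to decide rather than the script.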
TESTS OF ver 2.10, 2021-01-12 Imported on 2021-01-14 https://data.catalogueoflife.org/dataset/2244/classification
2) Two identical entries: family Cecidomyiidae (6873 spp) family Cecidomyiidae (4 spp: Laxediplosis latebra Fedotova & Sidorenko, 2009; Magadiplosis mera Fedotova & Sidorenko, 2009; Marikovskidiplosis bullata Fedotova & Sidorenko, 2009; Ruidadiplosis fluida Fedotova & Sidorenko, 2009)
Is Ceidomyiidae (1 sp: Pekinomyia syringae Jiao & Kolesik, 2020) correct spelling?
3) family Chironomidae (7578 spp) family Chironomiddae (1 sp: Libanorthocladius furcatus Veltz, Azar & Nel, 2007)
4) family Diopsidae (200 spp) family Diopsiddae (4 spp: Gracilopsina sinespina Feijen & Feijen, 2017; Madagopsina freidbergi Feijen & Feijen, 2017; Madagopsina parvapollina Feijen & Feijen, 2017; Madagopsina tschirnausi Feijen & Feijen, 2017)
5) family Dolichopodidae (7879 spp) family Dolivchopodidae (5 spp: Dubius autumnalis Wei, 2012; Dubius curtus Wei, 2012; Dubius frontus Wei, 2012; Dubius hongyaensis Wei, 2012; Dubius succurtus Wei, 2012)
6) family Empididaae (5 spp: Anaclastoctedon ancistrodes Plant, 2010; Anaclastoctedon antarai Plant, 2010; Anaclastoctedon lek Plant, 2010; Anaclastoctedon prionoton Plant, 2010; Anaclastoctedon sano Plant, 2010) family Empididae (3358 spp)
7) family Helcomyzidae (13 spp) family Heleomyzidae (767 spp)
8) family Lonchaea (1 sp: Lonchaea albimanus Walker, 1858) family Lonchaeidae (518 spp)
9) family Lonchopteridae (68 spp) family Lonchopteroidea (2 spp: Alonchoptera lebanica Grimaldi, 2012; Lonchopterites burmensis Grimaldi, 2018)
10) family Limnoiidae (1 sp: Micrdacno petraensis Kaddumi, 2005) family Limoniidae (11159 spp)
11) family Mycdetophilidae (1 sp: Eoexechia gallica Camier & Nel, 2020) family Mycetophilidae (4875 spp)
12) family Nemestrinbidae (1 sp: Mesonemestrius caii Zhang, Zhang & Wang, 2017) family Nemestrinidae (309 spp)
13) family Perissomatidae (3 spp: Collessomma gnoma Lukashevich & Blagoderov, 2020; Collessomma mongolica Lukashevich & Blagoderov, 2020; Collessomma sibirica Lukashevich & Blagoderov, 2020) family Perissommatidae (11 spp)
14) family Phoridae (4622 spp) family Phoridae? (1 sp: Dubiaphis curvata Brauckmann & Schlüter, 1993)
15) family Pleciofungicoridae (1 sp: Liaoxifungivora simplicis Hong, 1992) family Pleciofungivoridae (71 spp)
16) family Ptychopteridae (156 spp) family Ptycvhopteridae (1 sp: Neuseptychoptera carolinensis Szadziewski, Krynicki & Krzemiński, 2017)
17) family Sacthophagidae (3 spp: Norellia leigongshana Wei & Yang, 2007; Norellia paraqiana Wei & Yang, 2007; Norellia qiana Wei & Yang, 2007) family Scathophagidae (448 spp)
18) family Syrphidae (6644 spp) family Syrphoidea (1 sp: Aschizomyia burmensis Grimaldi, 2018)
On Stardate 1/14/21, 1:06 PM, "Neal Evenhuis" neale@bishopmuseum.org wrote:
Many thanks Yury!
As to your queries:
Cheers,
Neal L. Evenhuis
Neal's comments to the ISSUES in ver V2.8 2020-11-13
From: Neal Evenhuis
Sent: Monday, January 11, 2021 17:30
To: Ower, Geoffrey Donald; Thomas Pape; Richard Pyle
Cc: Roskov, Yury
Subject: Systema Dipterorum Dataset -- Issues corrections
Hi all,
A bit later today I’ll be sending to Rich the current files for SD that he will convert and send to Geoff as well as post to the SD website.
I’ve made a number of corrections using the issues list Geoff provided (many thanks again for these!) and have some general and specific notes on those I have looked at or dealt with in detail. Some explain anomalies with certain records; others seem to be a result of export glitches since it appears data from other fields somehow got populated in the wrong field; and some general remarks that could clear up many issues that may not be “issues”. I did my corrections from the bottom of the issues page scrolling up (i.e., low-hanging fruit with few issues, then those with more). My remarks follow:
Issues completed: Unlikely year; Parent name mismatch; Name invalid; Inconsistent authorship; Unparsable year; Accepted name missing; Accepted ID invalid; Classification not applied (which seems to be missing today!); Blacklist epithet; Question marks removed; Uppercase epithet; Escaped characters; Doubtful name; Unmatched reference brackets; Name ID invalid; Unparsable authorship; Subspecies assigned; Multi-word epithet; Reference ID invalid
General remarks

Subspecies – we do not have a subspecies data field so this has been put into the “Valid species” field with the string as “[species] ssp. [subspecies]”. All of these have been searched for and made consistent. If it is allowed, this will clear out a number of records from “Unusual name Characters”, “Multi-word epithet”, and other issue categories.

Name match none – there are a number of nomina nuda without a valid name to attach it to so the valid name is left blank.

Name match none / missing genus – unplaced names are dealt with by using an asterisk and then a one- or two-letter prefix to denote category; thus, FNematocera means unplaced to family within Nematocera. This is a call for Thomas and I to make if we want to change all of these to instead read as “(Unplaced in …[taxon])” for all those cases where an asterisk is used in the Present Genus field. I have deleted all cases of this in the Present Subgenus field since leaving the field blank says the same thing [i.e., it is placed to genus, but there should be no indication of what subgenus until it is determined; so no need to say so with the “SG-” prefix (= redundant for that field)]. If we do this for the genus field, the resulting string would be, e.g., “(Unplaced in Lestremiinae) globulifera Keilbach”. For the family field, no parentheses would be needed as it displays as indicated. Thoughts?

Uninterpreted – the status field (a number) is what indicates the status of a record. The “Status” table (cheat-sheet of status definitions) should be referred to in order to help interpret any records that cannot be determined and may save a lot of time actually. A script was written in FM that imports the status definition into the “Status Line” field, which displays in the FM database and the web display but may be lost in the dataset you have. Thus, a Status field with “20” will have “(Available, Invalid) Junior Synonym” on the status line.
Specific remarks

Uppercase epithet: col:ID SP153660, SP100471 – both records look fine in SD but look like they may have had an export problem as in your dataset, the fields for these records have data from other fields.

Doubtful name: col:ID F234 (Muscinae ariciaeformes) – this is verbatim how the name appears. It seems to be the only multi-name epithet for family-group names.

Unmatched reference brackets: col:ID 27074 – this looks fine. There is a beginning and end bracket and within it beginning and end parentheses.

Unparsable authorship: col:ID SP186861 – looks OK. There are odd diacritics, but these are correct. There are records (SP187151, SP117359) with Vietnamese authors that got tagged as well and these have been checked and are fine.

Unparsable authorship: many records – there are a number of records with the following string “Gonz_x0005_ález” (which equals “González”). Not sure what happened there as other authors with diacritics come out fine but just poor González got this added stuff. Also SP10872 has authorship as “_x0010_Chillcott & Teskey”. Clues as to what went on there?

Name ID invalid: these all seem to be nomina nuda with no valid name to attach to it.

Accepted name missing; accepted ID invalid: col:ID SP73180; SP23744 – both look fine in SD; cannot see what is wrong; maybe another export glitch?

Name match variant: col:ID SP134850; SP145990 – both look OK, what is wrong?

Name match variant: col:ID SPO175218 – gender ending change is all I can see. Anything else I’m missing?
Hopefully, this next batch you will get is a bit cleaner and I look forward to the revised “Issues” page so I can continue working to clean up errors! If you have any questions or concerns, please let me know.
Thanks,
Neal
Logic for crawler
If:
Original Genus = Present Genus
&
the first four (3? 5?) characters of Species = the first four characters of Valid Species. Example: Anthrax trifasciatus Meigen (acc) and Anthrax trifasciata Meigen, 1804 (syn)
&
Author = Valid Sp Author
=>
The year should be added to the accepted species authorstring.
Attention! There are 4- and 3-letter epithets (not many), for example:
4-letter: ater, cana, cara, egae, esau, flui, luna, sana, sica
3-letter: dyi, leo, lui, mus, nii
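The crawler condition above, including the caveat about short epithets, could be sketched as follows. The function names are hypothetical. Capping the compared prefix at one character less than the shorter epithet is an assumption added here so that a changed final letter in a 3- or 4-letter epithet does not block the match. It is not part of the rule as written, and it still misses stem changes such as ater / atra.

```python
def stems_match(species: str, valid_species: str, n: int = 4) -> bool:
    """Compare the first n characters of two epithets, case-insensitively."""
    a, b = species.lower(), valid_species.lower()
    shortest = min(len(a), len(b))
    if shortest == 0:
        return False
    # ASSUMPTION: for short epithets, leave the last character out of the
    # comparison so that a gender ending can differ.
    k = max(1, min(n, shortest - 1))
    return a[:k] == b[:k]


def take_year(original_genus: str, present_genus: str,
              species: str, valid_species: str,
              author: str, valid_sp_author: str, n: int = 4) -> bool:
    """True when the synonym's year should go into the accepted authorstring."""
    return (original_genus == present_genus
            and stems_match(species, valid_species, n)
            and author == valid_sp_author)


if __name__ == "__main__":
    print(take_year("Anthrax", "Anthrax", "trifasciata", "trifasciatus",
                    "Meigen", "Meigen"))
    print(stems_match("cana", "canus"), stems_match("ater", "atra"))
```

Because genus and author must also agree, a short shared prefix is less likely to produce a false match than it would be on the epithet alone.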
NEW VERSION diptera-dev: Diptera Flat Dev of 2021-04-15 at https://data.dev.catalogueoflife.org/dataset/2131/classification
Imported 169385 spp
Classification: ranks present in order Diptera - families - subfamilies (optional) - tribes (optional) - genera - subgenera (optional) - species. None of the uninomials have an authorstring.
Good example from Geoff: https://data.dev.catalogueoflife.org/dataset/2131/taxon/Culicidae-Culicinae-Aedini-Aedes-Stegomyia-aegypti-4f837e18c
Aedes (Stegomyia) aegypti, its original combination = Culex aegypti Linnaeus, 1762 (acceptable), a lot of synonyms (year present in authorstrings)
Year is missing in all accepted species authorstrings
FIXED
Dataset of April 20th 2021 at https://data.dev.catalogueoflife.org/dataset/2131/names?facet=rank&facet=issue&facet=status&facet=nomStatus&facet=nameType&facet=field&facet=authorship&facet=authorshipYear&facet=extinct&facet=environment&facet=origin&limit=50&offset=0&sortBy=taxonomic
Few sample names: Dolichophaonia anoctiluca (Carvalho, 1983) Dolichophaonia cacheuta (Snyder, 1957) Dolichopus angulicornis (Grichanov, 2004) Molophilus genitalis (Brunetti, 1912) Platypalpus ostiorum (Becker, 1902) Suillia aspinosa (Lamb, 1917) Trupanea wheeleri (Curran, 1932)
Diptera Flat Dev, id 2131 https://data.dev.catalogueoflife.org/catalogue/3/assembly?datasetKey=2131
Assembly in the Tree at Dev (experiment 1):
Results - 9 families failed to be established as sectors, and a set of families was marked as broken sectors in the CoL tree:
9 families failed to be established as a sector: Blephariceridae Curtonotidae Cypselosomatidae Eomyiidae Hennigmatidae Pachyneuridae Phaeomyiidae Scatopsidae and also:
Plus broken sectors in CoL:
Sector mapping changes with each browser refresh.
Experiment failed. All sectors deleted. Order DipteraDev deleted.
Assembly in the Tree at Dev (experiment 2):
Results: order Diptera with all children families established as a single sector successfully.
Synced 2021-04-26.
Imported: 169386 spp; acc 184976, syns 110015
ISSUES (selected)
[x] Unparsable Authorship, 90. @gdower, one case should be fixed: name Gonz_x0005_ález appears like that in many species names. https://data.dev.catalogueoflife.org/catalogue/3/dataset/2131/workbench?facet=rank&facet=issue&facet=status&facet=nomStatus&facet=nameType&facet=field&facet=authorship&facet=authorshipYear&facet=extinct&facet=environment&facet=origin&issue=unparsable%20authorship&limit=100&offset=0
[x] Missing Genus, 3520 - expected.
[ ] Unusual Name Characters, 3787 & Partially Parsable Name - ask SD, what to do with such names: https://data.dev.catalogueoflife.org/catalogue/3/dataset/2131/workbench?facet=rank&facet=issue&facet=status&facet=nomStatus&facet=nameType&facet=field&facet=authorship&facet=authorshipYear&facet=extinct&facet=environment&facet=origin&issue=unusual%20name%20characters&limit=100&offset=0
[ ] Id Not Unique, 20420 - @gdower, how serious might it be?
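The `Gonz_x0005_ález` strings listed under Unparsable Authorship above resemble the `_xHHHH_` escape that Office Open XML spreadsheets use for characters that cannot be stored directly. Whether that is really their origin here is an assumption. Under that assumption, a clean-up step could drop escaped control characters and decode any other escaped character, as in this sketch (the function name is hypothetical):

```python
import re
import unicodedata

_ESCAPE = re.compile(r"_x([0-9A-Fa-f]{4})_")


def clean_xml_escapes(text: str) -> str:
    """Remove _xHHHH_ escapes for control characters, decode the rest."""
    def replace(match: "re.Match[str]") -> str:
        char = chr(int(match.group(1), 16))
        # Control characters (Unicode category Cc) carry no meaning in a
        # name or authorship string, so they are dropped.
        return "" if unicodedata.category(char) == "Cc" else char
    return _ESCAPE.sub(replace, text)


if __name__ == "__main__":
    print(clean_xml_escapes("Gonz_x0005_ález"))
    print(clean_xml_escapes("_x0010_Chillcott & Teskey"))
```

Running this over the author fields before import would only hide the symptom, so the stray control characters would still need to be removed at the source.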
TASKS 2021-04-26
[x] ACC-ACC sp (diff auth) - all names with missing genus are to be blocked https://data.dev.catalogueoflife.org/catalogue/3/dataset/2131/duplicates?authorshipDifferent=true&category=binomial&limit=50&minSize=2&mode=STRICT&offset=0&status=accepted&withDecision=false
[ ] Identical genus. Split genera. Set of species attached to the next parent (unplaced)
[ ] Any uninomial @yroskov, See the case of split Tribe: Acemyini
POSSIBLE FAILURE:
Was the name Domomyza sp. 1959 present in any Issue reports? (seen in the Tree)
2021-04-29. Geoff's import of 2021-04-28.
The most challenging task is Identical Genus:
diptera-dev2, id 2131 https://data.dev.catalogueoflife.org/dataset/2134/issues
Imported: 169385 spp. (dev1 169385); acc 184976 (dev1 184976); syn 109536 (dev1 110015)
ISSUES (selected)
also for YR:
diptera-flat-dev at https://data.catalogueoflife.org/dataset/2244
Data in the Clearinghouse checked against master table SPECIES.xlsx from v2.8, rcvd 2020-11-13
https://data.catalogueoflife.org/catalogue/3/dataset/2244/tasks
https://data.catalogueoflife.org/catalogue/3/dataset/2244/issues