CatalogueOfLife / testing

Editorial tests and discussion to prepare for COL releases

Systema Dipterorum: **diptera-flat-dev** test report #6

Open yroskov opened 3 years ago

yroskov commented 3 years ago

diptera-flat-dev at https://data.catalogueoflife.org/dataset/2244

Data in the Clearinghouse checked against master table SPECIES.xlsx from v2.8, rcvd 2020-11-13

https://data.catalogueoflife.org/catalogue/3/dataset/2244/tasks

https://data.catalogueoflife.org/catalogue/3/dataset/2244/issues

yroskov commented 3 years ago

Example: https://data.catalogueoflife.org/dataset/2244/taxon/Limoniidae-Limoniinae-Limonia-Dicranomyia-phatta-7ab9fc7c3

Limonia (Dicranomyia) phatta (Philippi, 1865) -> should be Philippi, 1865
Synonyms: = Limnobia phatta Philippi, 1865 (acceptable) = Limnobia stictica Blanchard, 1852 = Furcomyia blanchardi Alexander, 1913

However, brackets are already correct in cases where Present_Genus ≠ Original_Genus & Valid_Species = Species & Valid_Sp_Author = Author

We are building names and relationships from the following fields:
acc: Present_Genus+Valid_Species+Valid_Sp_Author
syn: Original_Genus+Species+Author+Year

Tasks: apply year in accepted authorstring and decide where brackets should be added or not.

1) If Present_Genus = Original_Genus & Valid_Species = Species & Valid_Sp_Author = Author => take Author+Year, no brackets

2) If Present_Genus ≠ Original_Genus & Valid_Species = Species & Valid_Sp_Author = Author => take Author+Year, add brackets

3) If Present_Genus = Original_Genus & Valid_Species ≠ Species & Valid_Sp_Author = Author => no action.

There is an exception, where Valid_Species ≠ Species because of corrected gender ending. Example: acc Helius acanthostylus Alexander, syn = Helius acanthostyla Alexander, 1944 (I agree with result) https://data.catalogueoflife.org/dataset/2244/taxon/Limoniidae-Limoniinae-Elephantomyiini-Helius-acanthostylus-f52de6675

4) If Present_Genus = Original_Genus & Valid_Species = Species & Valid_Sp_Author ≠ Author => just take Valid_Sp_Author, no year, no brackets (a curious case, but it may appear)

5) If Present_Genus ≠ Original_Genus & Valid_Species ≠ Species & Valid_Sp_Author ≠ Author => no action

6) If Present_Genus ≠ Original_Genus & Valid_Species = Species & Valid_Sp_Author ≠ Author => just take Valid_Sp_Author, no year, no brackets
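The six rules above can be sketched as one small decision function. This is only an illustration, not the importer's actual code: the `Record` class and `accepted_authorship` function are hypothetical names, the fields are assumed to be plain strings compared exactly, and the two combinations the list does not mention (species differs while the author matches or differs, in the remaining genus cases) are assumed to mean "no action".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    # Hypothetical container for one master-table row (field names follow the thread).
    present_genus: str
    original_genus: str
    valid_species: str
    species: str
    valid_sp_author: str
    author: str
    year: str


def accepted_authorship(r: Record) -> Optional[str]:
    """Return the authorstring for the accepted name, or None for 'no action'."""
    same_genus = r.present_genus == r.original_genus
    same_species = r.valid_species == r.species
    same_author = r.valid_sp_author == r.author

    if not same_species:
        # Rules 3 and 5 (and, by assumption, the unlisted combinations): no action.
        return None
    if not same_author:
        # Rules 4 and 6: keep Valid_Sp_Author, no year, no brackets.
        return r.valid_sp_author
    combined = f"{r.author}, {r.year}"
    # Rule 1: same genus, no brackets. Rule 2: genus changed, add brackets.
    return combined if same_genus else f"({combined})"


if __name__ == "__main__":
    abditus = Record("Helius", "Helius", "abditus", "abditus",
                     "Krzemiński", "Krzemiński", "1985")
    aegypti = Record("Aedes", "Culex", "aegypti", "aegypti",
                     "Linnaeus", "Linnaeus", "1762")
    print(accepted_authorship(abditus))
    print(accepted_authorship(aegypti))
```

The gender-ending exception under rule 3 (Helius acanthostylus / acanthostyla) is deliberately left out here; it needs a separate, fuzzier comparison of the epithets.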

yroskov commented 3 years ago

Example: Helius abditus Krzemiński, 1985, syn = Helius abditus Krzemiński, 1985 https://data.catalogueoflife.org/dataset/2244/taxon/Limoniidae-Limoniinae-Elephantomyiini-Helius-abditus-6eb7ccfd4

yroskov commented 3 years ago

Example: Limonia (Dicranomyia) insignifica (Alexander, 1912) Published in Alexander, C.P. (1912). https://data.catalogueoflife.org/dataset/2244/taxon/Limoniidae-Limoniinae-Limonia-Dicranomyia-insignifica-f52de6675

yroskov commented 3 years ago

Diptera = Order Diptera - [Superfamily Not assigned] - Family Not assigned - [Subfamily Not assigned] - [Tribe Not assigned] - Genus Rhopaloscolex

F = not assigned to a family (FAcalyptratae)

T = not assigned to a tribe (TPorricondylinae)

G = not assigned to a genus (!) (GAcalyptratae commoror), i.e. taxonomically unresolved; this makes no sense in CoL.

SG = not assigned to a subgenus (Cephalops (*SGCephalops*) incohatus Morakote, 1990 and Chironomus (GChironomus**) harti Malloch, 1915); we need to delete such subgenera (how?)

These names are visible in Issues - Partially Parsable Names https://data.catalogueoflife.org/catalogue/3/dataset/2244/workbench?facet=rank&facet=issue&facet=status&facet=nomStatus&facet=nameType&facet=field&issue=partially%20parsable%20name&limit=100&offset=0&status=accepted

image

image
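A minimal sketch of how such "not assigned" placeholders could be detected before import, assuming the convention is an optional asterisk, then one of the prefixes F, T, G or SG, then a capitalised taxon name. The function name and the prefix table are illustrative only; the source database may use further prefixes that are not listed in this thread.

```python
import re
from typing import Optional, Tuple

# Prefix -> rank the name is "not assigned" to (only the four cases named above).
RANKS = {"F": "family", "T": "tribe", "G": "genus", "SG": "subgenus"}

# Optional asterisk, a known prefix, then a capitalised name such as "Acalyptratae".
# A real genus like "Tabanus" does not match: its second letter is lowercase.
PLACEHOLDER = re.compile(r"^\*?(SG|F|T|G)([A-Z][a-z]+)$")


def parse_placeholder(token: str) -> Optional[Tuple[str, str]]:
    """Return (unassigned rank, containing taxon) or None for an ordinary name."""
    match = PLACEHOLDER.match(token.strip())
    if match is None:
        return None
    return RANKS[match.group(1)], match.group(2)


if __name__ == "__main__":
    for token in ["*TPorricondylinae", "SGCephalops", "GAcalyptratae", "Tabanus"]:
        print(token, "->", parse_placeholder(token))
```

Rows whose subgenus token parses as a placeholder could then be blanked, and rows whose genus token does so could be routed to manual review rather than parsed as names.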

yroskov commented 3 years ago

image

It is caused by parasitic characters in the genus name Spungisomyia (tribe *TPorricondylinae)

image

yroskov commented 3 years ago

González - 48 sci names filtered in master file

Example: Agelanius burgeri González, 2006 image

yroskov commented 3 years ago

Acc: Archiborborus (GArchiborborus) femoralis (Blanchard, 1852) Cephalops (SGCephalops) excellens (Kertész, 1912) etc. etc.

These names are visible in Issues - Partially Parsable Names https://data.catalogueoflife.org/catalogue/3/dataset/2244/workbench?facet=rank&facet=issue&facet=status&facet=nomStatus&facet=nameType&facet=field&issue=partially%20parsable%20name&limit=100&offset=0&status=accepted

yroskov commented 3 years ago

NameID in the master file: 153660

Supposed to be: acc Systenus pallipes (Roser, 1840), syn Rhaphium pallipes Roser, 1840, Rhaphium adpropinquans Loew, 1857; fam Dolichopodidae

What went wrong? Could it be caused by the value with a backslash in Type_locality: Poland. "Scharlottenbrunn in Schlesien"\

yroskov commented 3 years ago

2020-12-21 test of the CLASSIFICATION diptera-flat-dev https://data.catalogueoflife.org/dataset/2244/classification

yroskov commented 3 years ago

Blephariceridae (344 spp) Blepharoceridae (1 sp Edwardsina imperatrix Alexander, 1953)

Bomblyiidae (1 sp Pioneeria bombylia Grimaldi, 2016) Bombyliidae (4983 spp)

Callilphoridae (1 sp Pollenia mesopotamica Mawlood & Abdul-Rassoul, 2009) Calliphoridae (1392 spp)

Canacacidae (3 spp) Canacidae (327 spp)

Ceratoipogonidae (19 spp) Ceratopogonidae (6423 spp)

Cylidrotomidae (1 sp Cylindrotoma nigritarsis Alexander, 1956) Cylindrotomidae (88 spp)

Dolchopodidae (10 spp) Dolichopodidae (7805 spp)

Drosophilidae (4502 spp) Drospphilidae (1 sp Drosophila kikalaeleele Lapoint, Magnacca & O'Grady, 2009)

Helcomyzidae (13 spp) Heleomyzidae (760 spp)

Iteaphila group (12 spp)

Mycetophildae (2 spp: Dziedzickia laticornis (Enderlein, 1910) & Mycetophila macula Enderlein, 1910) Mycetophilidae (4842 spp)

Phoridae (4599 spp) Phortidae (1 sp: Phora sp. DELETE)

Ropalomeridae (31 spp) Ropalomridae (1 sp Ropalomera albifasciata Kirst & Ale-Rocha, 2012)

Sarcophagidae (3281 spp) Sarcophgidae (1 sp Paraphrissopoda catiae Lehrer, 2006)

Sphaeroceidae (1 sp Paralimosina curvata Su & Liu, 2017) Sphaeroceridae (1702 spp)

Stratiomyidaae (14 spp) Stratiomyidae (2917 spp)

Syringogastridae (11 spp) Syringogsatridae (13 spp)

Tachionidase (1 sp Phyllomya gibsonomyioides Crosskey, 1976)

Xylomyidae (147 spp) Xylo0myidae (1 sp Xylomya wenxiana Yang, Gao & An, 2005) Xylomyiidae (1 sp Cretoxyla azari Grimaldi & Cumming, 2011)
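Pairs like these could be pre-screened automatically. The sketch below is one possible approach, not what was used for this report: it compares family names with Python's `difflib.SequenceMatcher` and flags near-identical pairs where one side holds only a few species. The function name and both thresholds are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import Dict, List, Tuple


def suspicious_pairs(counts: Dict[str, int],
                     min_ratio: float = 0.9,
                     max_minor: int = 20) -> List[Tuple[str, str, float]]:
    """Flag (rare spelling, common spelling, similarity) for near-identical names."""
    flagged = []
    for a, b in combinations(sorted(counts), 2):
        ratio = SequenceMatcher(None, a, b).ratio()
        # Only report pairs where the smaller family is small enough to be a typo.
        if ratio >= min_ratio and min(counts[a], counts[b]) <= max_minor:
            minor, major = sorted((a, b), key=lambda name: counts[name])
            flagged.append((minor, major, round(ratio, 2)))
    return flagged


if __name__ == "__main__":
    # Species counts taken from the list above.
    families = {
        "Phoridae": 4599, "Phortidae": 1,
        "Helcomyzidae": 13, "Heleomyzidae": 760,
        "Syrphidae": 6644,
    }
    for minor, major, ratio in suspicious_pairs(families):
        print(f"{minor} -> {major}? (similarity {ratio})")
```

With these inputs the screen flags Phortidae against Phoridae, but also Helcomyzidae against Heleomyzidae, so every hit still needs a taxonomist's confirmation before anything is merged.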

yroskov commented 3 years ago

TESTS OF ver 2.10, 2021-01-12 Imported on 2021-01-14 https://data.catalogueoflife.org/dataset/2244/classification

yroskov commented 3 years ago

2) Two identical entries: family Cecidomyiidae (6873 spp) family Cecidomyiidae (4 spp: Laxediplosis latebra Fedotova & Sidorenko, 2009; Magadiplosis mera Fedotova & Sidorenko, 2009; Marikovskidiplosis bullata Fedotova & Sidorenko, 2009; Ruidadiplosis fluida Fedotova & Sidorenko, 2009)

Is Ceidomyiidae (1 sp: Pekinomyia syringae Jiao & Kolesik, 2020) correct spelling?

image

3) family Chironomidae (7578 spp) family Chironomiddae (1 sp: Libanorthocladius furcatus Veltz, Azar & Nel, 2007)

4) family Diopsidae (200 spp) family Diopsiddae (4 spp: Gracilopsina sinespina Feijen & Feijen, 2017; Madagopsina freidbergi Feijen & Feijen, 2017; Madagopsina parvapollina Feijen & Feijen, 2017; Madagopsina tschirnausi Feijen & Feijen, 2017)

5) family Dolichopodidae (7879 spp) family Dolivchopodidae (5 spp: Dubius autumnalis Wei, 2012; Dubius curtus Wei, 2012; Dubius frontus Wei, 2012; Dubius hongyaensis Wei, 2012; Dubius succurtus Wei, 2012)

6) family Empididaae (5 spp: Anaclastoctedon ancistrodes Plant, 2010; Anaclastoctedon antarai Plant, 2010; Anaclastoctedon lek Plant, 2010; Anaclastoctedon prionoton Plant, 2010; Anaclastoctedon sano Plant, 2010) family Empididae (3358 spp)

7) family Helcomyzidae (13 spp) family Heleomyzidae (767 spp)

8) family Lonchaea (1 sp: Lonchaea albimanus Walker, 1858) family Lonchaeidae (518 spp)

9) family Lonchopteridae (68 spp) family Lonchopteroidea (2 spp: Alonchoptera lebanica Grimaldi, 2012; Lonchopterites burmensis Grimaldi, 2018)

10) family Limnoiidae (1 sp: Micrdacno petraensis Kaddumi, 2005) family Limoniidae (11159 spp)

11) family Mycdetophilidae (1 sp: Eoexechia gallica Camier & Nel, 2020) family Mycetophilidae (4875 spp)

12) family Nemestrinbidae (1 sp: Mesonemestrius caii Zhang, Zhang & Wang, 2017) family Nemestrinidae (309 spp)

13) family Perissomatidae (3 spp: Collessomma gnoma Lukashevich & Blagoderov, 2020; Collessomma mongolica Lukashevich & Blagoderov, 2020; Collessomma sibirica Lukashevich & Blagoderov, 2020) family Perissommatidae (11 spp)

14) family Phoridae (4622 spp) family Phoridae? (1 sp: Dubiaphis curvata Brauckmann & Schlüter, 1993)

15) family Pleciofungicoridae (1 sp: Liaoxifungivora simplicis Hong, 1992) family Pleciofungivoridae (71 spp)

16) family Ptychopteridae (156 spp) family Ptycvhopteridae (1 sp: Neuseptychoptera carolinensis Szadziewski, Krynicki & Krzemiński, 2017)

17) family Sacthophagidae (3 spp: Norellia leigongshana Wei & Yang, 2007; Norellia paraqiana Wei & Yang, 2007; Norellia qiana Wei & Yang, 2007) family Scathophagidae (448 spp)

18) family Syrphidae (6644 spp) family Syrphoidea (1 sp: Aschizomyia burmensis Grimaldi, 2018)

On Stardate 1/14/21, 1:06 PM, "Neal Evenhuis" neale@bishopmuseum.org wrote:

Many thanks Yury!

As to your queries:

  1. I’ve removed the question marks and added a note that “placement is uncertain; questionably placed in Trichomyiinae of Brachystomatidae”
  2. All listed genera should be in Cecidomyiidae; Cecidomyiidae is correct spelling; Ceidomyiidae and Ceciodmyiidae are errors that should be corrected
  3. Correct to Chironomidae
  4. Correct to Diopsidae
  5. Correct to Dolichopodidae
  6. Correct to Empididae
  7. Same query as for last batch, which I answered. Helcomyzidae is a separate family from Heleomyzidae. Both good families
  8. “Lonchaea” Should be family Lonchaeidae
  9. OK as is. Both are considered unplaced to family within Lonchopteroidea
  10. Correct to Limoniidae
  11. Correct to Mycetophilidae
  12. Correct to Nemestrinidae
  13. Correct to Perissommatidae
  14. I’ve removed question mark and added note that it is questionably placed in Phoridae
  15. Correct to Pleciofungivoridae
  16. Correct to Ptychopteridae
  17. Correct to Scathophagidae
  18. OK as is; currently unplaced to family within Syrphoidea

Cheers,

Neal L. Evenhuis

yroskov commented 3 years ago

Neal's comments to the ISSUES in ver V2.8 2020-11-13

From: Neal Evenhuis
Sent: Monday, January 11, 2021 17:30
To: Ower, Geoffrey Donald; Thomas Pape; Richard Pyle
Cc: Roskov, Yury
Subject: Systema Dipterorum Dataset -- Issues corrections

Hi all,

A bit later today I’ll be sending to Rich the current files for SD that he will convert and send to Geoff as well as post to the SD website.

I’ve made a number of corrections using the issues list Geoff provided (many thanks again for these!) and have some general and specific notes on those I have looked at or dealt with in detail. Some explain anomalies with certain records; others seem to be a result of export glitches since it appears data from other fields somehow got populated in the wrong field; and some general remarks that could clear up many issues that may not be “issues”. I did my corrections from the bottom of the issues page scrolling up (i.e., low-hanging fruit with few issues, then those with more). My remarks follow:

Issues completed: Unlikely year; Parent name mismatch; Name invalid; Inconsistent authorship; Unparsable year; Accepted name missing; Accepted ID invalid; Classification not applied (which seems to be missing today!); Blacklist epithet; Question marks removed; Uppercase epithet; Escaped characters; Doubtful name; Unmatched reference brackets; Name ID invalid; Unparsable authorship; Subspecies assigned; Multi-word epithet; Reference ID invalid

General remarks

Subspecies – we do not have a subspecies data field so this has been put into the “Valid species” field with the string as “[species] ssp. [subspecies]”. All of these have been searched for and made consistent. If it is allowed, this will clear out a number of records from “Unusual name Characters”, “Multi-word epithet”, and other issue categories.

Name match none – there are a number of nomina nuda without a valid name to attach to, so the valid name is left blank.

Name match none / missing genus – unplaced names are dealt with by using an asterisk and then a one- or two-letter prefix to denote category; thus, *FNematocera means unplaced to family within Nematocera. This is a call for Thomas and me to make if we want to change all of these to instead read as “(Unplaced in …[taxon])” for all those cases where an asterisk is used in the Present Genus field. I have deleted all cases of this in the Present Subgenus field since leaving the field blank says the same thing [i.e., it is placed to genus, but there should be no indication of what subgenus until it is determined; so no need to say so with the “SG-” prefix (= redundant for that field)]. If we do this for the genus field, the resulting string would be, e.g., “(Unplaced in Lestremiinae) globulifera Keilbach”. For the family field, no parentheses would be needed as it displays as indicated. Thoughts?

Uninterpreted – the status field (a number) is what indicates the status of a record. The “Status” table (cheat-sheet of status definitions) should be referred to in order to help interpret any records that cannot be determined, and may actually save a lot of time. A script was written in FM that imports the status definition into the “Status Line” field, which displays in the FM database and the web display but may be lost in the dataset you have. Thus, a Status field with “20” will have “(Available, Invalid) Junior Synonym” on the status line.

Specific remarks

Uppercase epithet: col:ID SP153660, SP100471 – both records look fine in SD but may have had an export problem, as in your dataset the fields for these records contain data from other fields.

Doubtful name: col:ID F234 (Muscinae ariciaeformes). This is verbatim how the name appears. It seems to be the only multi-name epithet for family-group names.

Unmatched reference brackets: col:ID 27074 – this looks fine. There is a beginning and end bracket and, within it, beginning and end parentheses.

Unparsable authorship: col:ID SP186861 – looks OK. There are odd diacritics, but these are correct. There are records (SP187151, SP117359) with Vietnamese authors that got tagged as well, and these have been checked and are fine.

Unparsable authorship: many records – there are a number of records with the following string “Gonz_x0005_ález” (which equals “González”). Not sure what happened there, as other authors with diacritics come out fine but just poor González got this added stuff. Also SP10872 has authorship as “_x0010_Chillcott & Teskey”. Clues as to what went on there?

Name ID invalid: these all seem to be nomina nuda with no valid name to attach to.

Accepted name missing; accepted ID invalid: col:ID SP73180; SP23744 – both look fine in SD; cannot see what is wrong; maybe another export glitch?

Name match variant: col:ID SP134850; SP145990 – both look OK, what is wrong?

Name match variant: col:ID SPO175218 – gender ending change is all I can see. Anything else I’m missing?

Hopefully, this next batch you will get is a bit cleaner and I look forward to the revised “Issues” page so I can continue working to clean up errors! If you have any questions or concerns, please let me know.

Thanks,

Neal
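On the “Gonz_x0005_ález” and “_x0010_Chillcott & Teskey” strings in Neal's remarks: the `_xHHHH_` pattern looks like the escape form that Office Open XML (xlsx) files use for characters that plain XML cannot carry, here the control characters U+0005 and U+0010. Assuming that is what happened during export, a clean-up step could decode the escapes and drop any resulting control characters. The function name below is hypothetical, and this is a sketch of one possible fix rather than a diagnosis of the actual export chain.

```python
import re
import unicodedata

# Matches escapes such as _x0005_ (four hexadecimal digits between "_x" and "_").
ESCAPE = re.compile(r"_x([0-9A-Fa-f]{4})_")


def clean_escapes(text: str) -> str:
    """Decode _xHHHH_ escapes, then remove control characters (Unicode category Cc)."""
    decoded = ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)
    # Note: category Cc also covers tab and newline, which is fine for author strings.
    return "".join(ch for ch in decoded if unicodedata.category(ch) != "Cc")


if __name__ == "__main__":
    for raw in ["Gonz_x0005_ález", "_x0010_Chillcott & Teskey", "Krzemiński"]:
        print(repr(raw), "->", repr(clean_escapes(raw)))
```

Strings without escapes pass through unchanged, so the step could be applied to every authorship value before parsing.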

yroskov commented 3 years ago

Logic for crawler:
If Original Genus = Present Genus, and the first four (3? 5?) characters of Species = the first four characters of Valid Species (example: Anthrax trifasciatus Meigen (acc) and Anthrax trifasciata Meigen, 1804 (syn)), and Author = Valid Sp Author
=> the Year should be taken into the accepted species authorstring.

Attention! There are 4- and 3-letter epithets (not many), for example:

4 letters: ater, cana, cara, egae, esau, flui, luna, sana, sica

3 letters: dyi, leo, lui, mus, nii
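The crawler logic and the short-epithet caveat can be sketched together. This is an illustration under stated assumptions: the function name is hypothetical, the default prefix length of four is the value proposed above, and the rule for short epithets (compare all but the last letter of the shorter epithet) is an assumption added here, not something agreed in this thread.

```python
def take_year_for_accepted(original_genus: str, present_genus: str,
                           species: str, valid_species: str,
                           author: str, valid_sp_author: str,
                           prefix_len: int = 4) -> bool:
    """True if the synonym's year may be copied into the accepted authorstring."""
    if original_genus != present_genus or author != valid_sp_author:
        return False
    if not species or not valid_species:
        return False
    # For 3- and 4-letter epithets a fixed 4-character prefix would compare whole
    # words, so shorten the prefix to leave room for a changed gender ending.
    shorter = min(len(species), len(valid_species))
    n = max(1, min(prefix_len, shorter - 1))
    return species[:n] == valid_species[:n]


if __name__ == "__main__":
    # Example from the thread: trifasciata (syn) vs trifasciatus (acc).
    print(take_year_for_accepted("Anthrax", "Anthrax",
                                 "trifasciata", "trifasciatus",
                                 "Meigen", "Meigen"))
```

A prefix test like this can still pair two different epithets that merely start alike, and it misses stem-changing forms such as ater / atra, so flagged and unflagged short epithets probably both deserve a manual look.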

yroskov commented 3 years ago

NEW VERSION diptera-dev: Diptera Flat Dev of 2021-04-15 at https://data.dev.catalogueoflife.org/dataset/2131/classification

Imported 169385 spp

Classification: ranks present in order Diptera - families - subfamilies (optional) - tribes (optional) - genera - subgenera (optional) - species. None of the uninomials have an authorstring.

yroskov commented 3 years ago

image

image

yroskov commented 3 years ago

Good example from Geoff: https://data.dev.catalogueoflife.org/dataset/2131/taxon/Culicidae-Culicinae-Aedini-Aedes-Stegomyia-aegypti-4f837e18c

Aedes (Stegomyia) aegypti, its original combination = Culex aegypti Linnaeus, 1762 (acceptable), a lot of synonyms (year present in authorstrings)

image

yroskov commented 3 years ago

Year is missing in all accepted species authorstrings

FIXED

Dataset of April 20th 2021 at https://data.dev.catalogueoflife.org/dataset/2131/names?facet=rank&facet=issue&facet=status&facet=nomStatus&facet=nameType&facet=field&facet=authorship&facet=authorshipYear&facet=extinct&facet=environment&facet=origin&limit=50&offset=0&sortBy=taxonomic

A few sample names: Dolichophaonia anoctiluca (Carvalho, 1983); Dolichophaonia cacheuta (Snyder, 1957); Dolichopus angulicornis (Grichanov, 2004); Molophilus genitalis (Brunetti, 1912); Platypalpus ostiorum (Becker, 1902); Suillia aspinosa (Lamb, 1917); Trupanea wheeleri (Curran, 1932)

yroskov commented 3 years ago

Diptera Flat Dev, id 2131 https://data.dev.catalogueoflife.org/catalogue/3/assembly?datasetKey=2131

Assembly in the Tree at Dev (experiment 1):

  1. Created order DipteraDev
  2. Drag&Dropped all families from diptera-dev (2131) as children of order DipteraDev

Results - 9 families failed to be established as sectors, and a set of families was marked as broken sectors in the CoL tree:

image

9 families failed to be established as a sector: Blephariceridae, Curtonotidae, Cypselosomatidae, Eomyiidae, Hennigmatidae, Pachyneuridae, Phaeomyiidae, Scatopsidae, and also: image

Plus broken sectors in CoL:

image

Sector mapping changes with each browser refresh.

Experiment failed. All sectors deleted. Order DipteraDev deleted.

yroskov commented 3 years ago

Assembly in the Tree at Dev (experiment 2):

  1. Renamed order Diptera as Dipteraold
  2. Drag&Dropped order Diptera from diptera-dev (2131) as a child of class Insecta

Results: order Diptera with all children families established as a single sector successfully.

image

Synced 2021-04-26.

yroskov commented 3 years ago

Imported: 169386 spp; acc 184976, syns 110015

ISSUES (selected)

https://data.dev.catalogueoflife.org/catalogue/3/dataset/2131/workbench?facet=rank&facet=issue&facet=status&facet=nomStatus&facet=nameType&facet=field&facet=authorship&facet=authorshipYear&facet=extinct&facet=environment&facet=origin&issue=multi%20word%20epithet&limit=100&offset=0

yroskov commented 3 years ago

TASKS 2021-04-26

image

yroskov commented 3 years ago

POSSIBLE FAILURE:

Was name Domomyza sp. 1959 present in any Issue reports? (seen in the Tree) image

yroskov commented 3 years ago

2021-04-29. Geoff's import of 2021-04-28.

Most challenging Task is Identical Genus:

image

yroskov commented 3 years ago

diptera-dev2, id 2131 https://data.dev.catalogueoflife.org/dataset/2134/issues

Imported: 169385 spp. (dev1 169385); acc 184976 (dev1 184976); syn 109536 (dev1 110015)

ISSUES (selected)

also for YR: