Open yroskov opened 3 years ago
ISSUES (selected only) 2021-03-12
[x] Subspecies Assigned, 3. All synonyms; names such as Dromia (Cryptodromia) de manii Alcock, 1900; Para-Lio-thelphusa mainitensis Balss, 1937; Para-Peri-thelphusa sucki Balss, 1937 had name status "species aggregate". Changed to "species" via editorial decisions.
[x] Unparsable Authorship, 745. Names with a "complex" authorstring, such as De Haan, 1833 [in De Haan, 1833-1850]. No problem.
TASKS 2021-03-12
ID | status | name | authorship | rank | classification
---|---|---|---|---|---
xT | accepted | Dromioidea | | superfamily | Animalia>Arthropoda>Crustacea>Malacostraca>Eumalacostraca>Decapoda>Pleocyemata
urn:lsid:marinespecies.org:taxname:106690 | accepted | Dromioidea | De Haan, 1833 [in De Haan, 1833-1850] | superfamily | Animalia>Arthropoda>Crustacea>Malacostraca>Eumalacostraca>Decapoda>Pleocyemata>Dromioidea>Dynomenidae>Acanthodromiinae>Acanthodromia>Podotremata
urn:lsid:marinespecies.org:taxname:106700 | accepted | Majoidea | Samouelle, 1819 | superfamily | Animalia>Arthropoda>Crustacea>Malacostraca>Eumalacostraca>Decapoda>Pleocyemata>Pinnotheroidea>Pinnotheridae>Pinnotherinae>Abyssotheres>Eubrachyura>Heterotremata
x3X | accepted | Majoidea | | superfamily | Animalia>Arthropoda>Crustacea>Malacostraca>Eumalacostraca>Decapoda>Pleocyemata
x45 | accepted | Ocypodoidea | | superfamily | Animalia>Arthropoda>Crustacea>Malacostraca>Eumalacostraca>Decapoda>Pleocyemata
urn:lsid:marinespecies.org:taxname:106707 | accepted | Ocypodoidea | Rafinesque, 1815 | superfamily | Animalia>Arthropoda>Crustacea>Malacostraca>Eumalacostraca>Decapoda>Pleocyemata>Pinnotheroidea>Pinnotheridae>Pinnotherinae>Abyssotheres>Eubrachyura>Thoracotremata
urn:lsid:marinespecies.org:taxname:106708 | accepted | Pinnotheroidea | De Haan, 1833 [in De Haan, 1833-1850] | superfamily | Animalia>Arthropoda>Crustacea>Malacostraca>Eumalacostraca>Decapoda>Pleocyemata>Pinnotheroidea>Pinnotheridae>Pinnotherinae>Abyssotheres>Eubrachyura>Thoracotremata
xK | accepted | Pinnotheroidea | | superfamily | Animalia>Arthropoda>Crustacea>Malacostraca>Eumalacostraca>Decapoda>Pleocyemata
urn:lsid:marinespecies.org:taxname:439089 | accepted | Pseudothelphusoidea | Ortmann, 1893 | superfamily | Animalia>Arthropoda>Crustacea>Malacostraca>Eumalacostraca>Decapoda>Pleocyemata>Pinnotheroidea>Pinnotheridae>Pinnotherinae>Abyssotheres>Eubrachyura>Heterotremata
x3D | accepted | Pseudothelphusoidea | | superfamily | Animalia>Arthropoda>Crustacea>Malacostraca>Eumalacostraca>Decapoda>Pleocyemata
x34 | accepted | Xanthoidea | | superfamily | Animalia>Arthropoda>Crustacea>Malacostraca>Eumalacostraca>Decapoda>Pleocyemata
urn:lsid:marinespecies.org:taxname:106703 | accepted | Xanthoidea | MacLeay, 1838 | superfamily | Animalia>Arthropoda>Crustacea>Malacostraca>Eumalacostraca>Decapoda>Pleocyemata>Pinnotheroidea>Pinnotheridae>Pinnotherinae>Abyssotheres>Eubrachyura>Heterotremata
[ ] Identical family, 6. Family repeated twice, second time without authorstring. UNRESOLVED https://data.catalogueoflife.org/catalogue/3/dataset/1108/duplicates?category=uninomial&limit=500&minSize=2&mode=STRICT&offset=0&rank=family&withDecision=false
[ ] Identical genus, 5. Genus repeated twice, second time without authorstring. UNRESOLVED https://data.catalogueoflife.org/catalogue/3/dataset/1108/duplicates?category=uninomial&limit=500&minSize=2&mode=STRICT&offset=0&rank=genus&withDecision=false
With resolved tasks:
Not synced: sector(s) not established. FIXED
Synced 2021-04-02
Broken hierarchy: https://github.com/CatalogueOfLife/testing/issues/141
2021-07-01: temporarily fixed by @gdower for July edition only
New classification:
Sectors: two sections, Eubrachyura & Podotremata.
Establishing new sectors...
Was: Deleted sector in suborder Pleocyemata. Deleted 2 subtrees in superfam Cryptochiroidea & Cyclodorippoidea (children of infraorder NotAssigned in suborder Pleocyemata). Set up infraorder Brachyura in suborder Pleocyemata. Drag&dropped two sections Eubrachyura & Podotremata in infraorder Brachyura.
Synced 2021-07-01
ver 2021-11-01
TASKS - no changes
ver 2022-08-01
TASKS
Resolved:
Re-synced 2022-08-03
Dear @bart-v, the GlobalNames developers pointed out a problem with the presentation of multiple references in one(?) of the WoRMS records in CoL: https://www.catalogueoflife.org/data/taxon/96NL
Broken delimiters in references? Could you please have a look from your side?
@gdower also pointed out: record_id | 7QGWB
These IDs also have that issue:
-[ RECORD 1 ]-----
record_id | 96NH
length | 236307
-[ RECORD 2 ]-----
record_id | 96NJ
length | 236307
-[ RECORD 3 ]-----
record_id | 96MZ
length | 236307
-[ RECORD 4 ]-----
record_id | 96N5
length | 236307
-[ RECORD 5 ]-----
record_id | 96N2
length | 236307
-[ RECORD 6 ]-----
record_id | 96N8
length | 236307
-[ RECORD 7 ]-----
record_id | 96NL
length | 236307
We don't use double quotes as delimiters in our export, yet COL still tries to use them. This reference has a starting double quote but no closing one: https://www.marinespecies.org/aphia.php?p=sourcedetails&id=261114
Title
"Molecular phylogeny of the genus Cronius Stimpson, 1860, with reassignment of C. tumidulus and several American species of Portunus to the genus Achelous De Haan, 1833 (Brachyura: Portunidae).
Removing it will fix this
Thanks, @bart-v!
@mdoering, usage of double quotes as delimiters in CLB/CoL - is it a good idea?
ColDP has defined that data files should be either TAB delimited without quoting or CSV with optional quoting as per RFC 4180, which is the official CSV specification. Unlike DwC archives, there is no meta file that can individually define other delimiters or quotes. If the CSV format is used, RFC 4180 should be followed, which says:
Each field may or may not be enclosed in double quotes (however some programs, such as Microsoft Excel, do not use double quotes at all). If fields are not enclosed with double quotes, then double quotes may not appear inside the fields. For example:
"aaa","bbb","ccc" CRLF zzz,yyy,xxx
Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double-quotes. For example:
"aaa","b CRLF bb","ccc" CRLF zzz,yyy,xxx
If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote. For example:
"aaa","b""bb","ccc"
That means if a value starts with an unescaped quote then it is taken as the start of the optional quote.
If WoRMS never wants to use quotes it must take care that
`TSV` might be a simpler format to use - usually you can avoid tabs and carriage returns within data entirely by just replacing them on the fly with a simple space. Then there is no need to escape or quote anything else.
https://github.com/CatalogueOfLife/coldp/blob/master/README.md#data-files
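The "replace on the fly" idea for TSV can be sketched as follows. This is only an illustration, not the WoRMS export code; the function names and the sample values are hypothetical.

```python
import re

# Any run of tabs, carriage returns or line feeds is collapsed to one space,
# so a value can never break the TAB-delimited row structure.
_CONTROL_RUN = re.compile(r"[\t\r\n]+")


def clean_tsv_value(value: str) -> str:
    """Make a single value safe for an unquoted TSV file."""
    return _CONTROL_RUN.sub(" ", value).strip()


def to_tsv_line(values: list[str]) -> str:
    """Join already-cleaned values with TABs; no quoting or escaping is applied."""
    return "\t".join(clean_tsv_value(v) for v in values)


# Hypothetical record: the title contains a tab and a line break,
# and the remarks contain double quotes that are left untouched.
example = to_tsv_line(["r1", "A title with a\ttab and a\r\nline break", 'remarks "quoted"'])
```

Double quotes pass through unchanged, which is only safe if the reader also treats them as ordinary data.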
WoRMS is actually TSV (= TAB delimited without quoting). Still, COL attempts to parse the double quote.
Anyway, double quote removed from this reference, so should all be OK
Oh, quotes with TAB are odd. I will look into this on our side then, thanks for the hint!
There was indeed a bug that caused quotes to be used for TAB files. I have fixed this now.
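The effect of applying quote handling to a TAB file can be reproduced with Python's standard `csv` module. This is a sketch of the general failure mode only; it does not use the CLB parser, and the two sample rows are made up.

```python
import csv
import io

# Hypothetical two-row TSV. The title in row 1 opens a double quote
# that is never closed, as in the reference discussed above.
data = 'r1\t"Unclosed title\tremark 1\nr2\tSecond title\tremark 2\n'

# Default settings treat '"' as a quote character even with a TAB delimiter,
# so the unclosed quote swallows everything up to the end of the input.
rows_quoting = list(csv.reader(io.StringIO(data), delimiter="\t"))

# QUOTE_NONE tells the reader that '"' is ordinary data.
rows_plain = list(csv.reader(io.StringIO(data), delimiter="\t", quoting=csv.QUOTE_NONE))

print(len(rows_quoting), len(rows_plain))
```

With quote handling on, the two rows collapse into one record with a very long second field, which is consistent with the identical, very large `length` values reported for the affected records.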
While working on the quoting issue I found that the Reference.txt file uses the wrong columns (year, source + details are from ACEF times):
ID citation author title year source details doi link remarks
Here are the accepted ones, which are more atomised, so the lumped bit in "details" finds different homes: https://github.com/CatalogueOfLife/coldp/blob/master/README.md#reference
`source` is being mapped automatically to `containerTitle` and `year` to `issued`.
The only really troublesome field is the `details` one, for which there is no match.
From what I can see in the WoRMS help the reference data should map nicely: https://www.marinespecies.org/aphia.php?p=manual#topic5
DOI = col:doi
author = col:author
year = col:issued
title = col:title
journal = col:containerTitle
suffix = col:page (suffix is actually a mix of col:volume, issue, edition & page)
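The proposed mapping can be written down as a small lookup table. This is a sketch under assumptions: the left-hand keys are the WoRMS field names as listed above, the function name and the sample record are hypothetical, and `suffix` is mapped to `page` even though it lumps volume, issue, edition and page.

```python
# WoRMS reference field -> ColDP Reference column, as proposed in this thread.
WORMS_TO_COLDP = {
    "DOI": "doi",
    "author": "author",
    "year": "issued",
    "title": "title",
    "journal": "containerTitle",
    "suffix": "page",  # not atomised in WoRMS: volume, issue, edition & page together
}


def to_coldp_reference(worms_record: dict) -> dict:
    """Rename known WoRMS fields to ColDP columns; drop empty and unmapped fields."""
    return {
        WORMS_TO_COLDP[field]: value
        for field, value in worms_record.items()
        if field in WORMS_TO_COLDP and value
    }


# Hypothetical input record, for illustration only.
sample = {"author": "Author, A.", "year": "2001", "journal": "Some Journal", "suffix": "", "other": "x"}
mapped = to_coldp_reference(sample)
```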
Alternatively, ColDP also accepts BibTex natively. You seem to support that already: https://www.marinespecies.org/aphia.php?p=manual#topic40
But checking the problematic reference from above as BibTex it still has a suboptimal journal value so there is no gain over TSV really:
journal = {In: Crustacean Issues 18: Decapod Crustacean Phylogenetics, Martin, J.W., Crandall, K.A. & Felder, D.L. (eds)},
OK, in WoRMS, `suffix` is not atomized to volume, issue, page, etc. so we cannot provide that.
We have now changed the column names to reflect the COLDP standard better.
`suffix` will be mapped to `col:page` from 2022-11-01 onwards.
For `journal`, we have no alternative right now.
WoRMS is a taxonomic database, not full blown references database :)
Export of 2022-11-01 Imported 2022-11-10.
Classification in the imported data of 2022-11-01 for WoRMS Brachyura:
Classification at marinespecies.org:
I am inviting @bart-v, @gdower & @mdoering to look at the problem and decide what should be fixed, and where:
It seems the problem is in COLDP and CLB: the zoological ranks "section" & "subsection" are incorrectly placed in the classification (the botanical style is implemented, where section is inside genus)
@gdower: WoRMS Brachyura removed from the pipeline in March 2023 because it was finally totally breaking the CI pipeline. (i.e. import failed)
@bart-v, CoL has been unable to process WoRMS Brachyura since November 2022. Something is wrong with the CoLDP export for this checklist (my guess is explained above). Is there a chance to find what is wrong and fix it in June's export for Annual Checklist 2023?
For the example https://www.marinespecies.org/aphia.php?p=taxdetails&id=240916, and as you explain, section & subsection are correctly placed between infraorder and superfamily in zoology. See https://en.wikipedia.org/wiki/Taxonomic_rank
WoRMS is exporting the full classification as seen on the URL above via the `parentID` field in file Taxon.txt
So I don't think there is an issue on the WoRMS side
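How a classification is read from `parentID` links can be sketched like this. It is a simplified illustration, not the CLB importer: the IDs `t1` to `t5` are hypothetical, and only the ranks and names are taken from this thread.

```python
def lineage(taxon_id, taxa):
    """Return [(rank, name), ...] from the root down to taxon_id.

    `taxa` maps ID -> {"parentID", "rank", "name"}. A parentID that is not
    a key of `taxa` raises KeyError; a loop raises ValueError.
    """
    path = []
    seen = set()
    current = taxon_id
    while current:
        if current in seen:
            raise ValueError(f"cycle at {current}")
        seen.add(current)
        record = taxa[current]  # KeyError here means a dangling parentID
        path.append((record["rank"], record["name"]))
        current = record["parentID"]
    return list(reversed(path))


# Hypothetical IDs; section and subsection sit between infraorder and superfamily.
taxa = {
    "t1": {"parentID": None, "rank": "infraorder", "name": "Brachyura"},
    "t2": {"parentID": "t1", "rank": "section", "name": "Eubrachyura"},
    "t3": {"parentID": "t2", "rank": "subsection", "name": "Heterotremata"},
    "t4": {"parentID": "t3", "rank": "superfamily", "name": "Aethroidea"},
    "t5": {"parentID": "t4", "rank": "family", "name": "Aethridae"},
}
chain = lineage("t5", taxa)
```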
The broken import since March looks like a backend bug, I'm looking into this
That issue was fixed many weeks ago and the dataset imports just fine - I did run an import just now.
The bad classification for Aethridae Dana, 1851 still persists in the latest version:
kingdom: Animalia >phylum: Arthropoda >subphylum: Crustacea >class: Malacostraca >subclass: Eumalacostraca >order: Decapoda >suborder: Pleocyemata >superfamily: Cryptochiroidea >family: Cryptochiridae >genus: Lithoscaptus >section: Eubrachyura de Saint Laurent, 1980 >subsection: Heterotremata Guinot, 1977 >superfamily: Aethroidea Dana, 1851 >family: Aethridae Dana, 1851
The verbatim data for the family uses a mix of parentID and flat classification. The parentID links to the superfamily, which then links to the subsection with a parentID to the section Eubrachyura, which contains a bad parentID urn:lsid:marinespecies.org:taxname:106673 which does not exist! Because of that the flat classification is used, and the flat section in ColDP is explicitly meant to be the botanical rank of a section. That's why we get the troubles.
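A check for this kind of dangling `parentID` could look roughly like the following before an export is published. It is a minimal sketch: the row IDs `s1` and `s2` are hypothetical, and only the missing parent LSID is taken from the description above.

```python
def find_invalid_parent_ids(taxon_rows):
    """Return (ID, parentID) pairs whose parentID matches no ID in the same table."""
    known_ids = {row["ID"] for row in taxon_rows}
    return [
        (row["ID"], row["parentID"])
        for row in taxon_rows
        if row.get("parentID") and row["parentID"] not in known_ids
    ]


# Hypothetical Taxon rows: the section points to a parent that was not exported.
taxon_rows = [
    {"ID": "s1", "parentID": "urn:lsid:marinespecies.org:taxname:106673", "scientificName": "Eubrachyura"},
    {"ID": "s2", "parentID": "s1", "scientificName": "Heterotremata"},
]
invalid = find_invalid_parent_ids(taxon_rows)
```

An empty result means every record can be placed through its parent chain alone, so no fallback to the flat classification columns is needed.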
Solutions:
OK, The parentID will be added in the next export 2023-06-01
Thanks Bart. Looking at the issues there are 9 invalid parentID issues that might be good to fix: https://www.checklistbank.org/dataset/1108/verbatim?issue=parent%20id%20invalid
There are also other invalid, i.e. non existing ids that should be fixed to avoid bad data, but they probably do not have that much of an impact as the one above:
@yroskov we should make sure in the future that we never have invalid id issues in sources. That is asking for trouble.
These 9 parentID are fixed by adding urn:lsid:marinespecies.org:taxname:106673
We will soon replace Brachyura with DecaNet. Once done we'll have a more in-depth look at this.
@yroskov please verify at least the other 8 broken parentID records to see if they do not introduce any fatal problems for COL.
@gdower, could you please pick this up (if it still makes sense now that Brachyura will be replaced with DecaNet)?
ver 2023-06-01
[ ] Imported: 9593 spp
[ ] Metadata: no Creator, no Editor, etc. Citation: World List of marine Brachyura (ver. (06/2023)). (2023).. https://www.marinespecies.org/
[x] Classification: OK (problem fixed)
[ ] Sector: broken = SECTOR REMAINS BROKEN 2023-06-01 due to problem in CLB
Step 1. Two sectors deleted in Assembly
Step 2. Replace button does not work (why?)
Step 3. Delete subtree does not work either
==============
ISSUES assessed 2023-06-02
TASKS
Resolved 2023-06-02:
@bart-v, looking through Issues report in the checklistbank... (https://www.checklistbank.org/dataset/1108/issues)
It seems the year in some authorstrings is misspelled:
Tanzanonautes Feldmann, O'Connor, Stevens, Gottfried, Roberts, Ngasala, Rasmusson & Kapilima, 21007 = 2007
Tanzanonautes tuerkayi Feldmann, O'Connor, Stevens, Gottfried, Roberts, Ngasala, Rasmusson & Kapilima, 21007 = 2007
Clampethildella spinosa Beschin, Busulini & Tessier, 20212
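Such years could be flagged automatically with a simple pattern check. The sketch below is an assumption about how one might do it, not an existing WoRMS or CLB validation; the function name and the default year bounds (1758 as the start of zoological nomenclature, 2023 as the year of this report) are illustrative.

```python
import re

DIGIT_RUN = re.compile(r"\d+")


def suspicious_years(authorship, min_year=1758, max_year=2023):
    """Return digit runs in an authorship string that cannot be a plausible year."""
    flagged = []
    for run in DIGIT_RUN.findall(authorship):
        if len(run) != 4 or not (min_year <= int(run) <= max_year):
            flagged.append(run)
    return flagged


print(suspicious_years("Beschin, Busulini & Tessier, 20212"))
print(suspicious_years("De Haan, 1833 [in De Haan, 1833-1850]"))
```

The check only reports the odd value; deciding what the intended year was still needs a human.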
Some strange cases:
I just checked the very first record with the invalid rank order, the "family" Epialtidae. Its parent is also a family with the same name. It seems the first, linked record is an unaccepted subfamily Pliosomatinae in WoRMS, but is exported wrongly as a family?
We export urn:lsid:marinespecies.org:taxname:439053 as subfamily (Names.txt line 11262).
But we indeed replace the entry with its accepted name/taxon (of another rank in this case) when it has accepted children. This is the same problem as mentioned before in other issues. If COL cannot handle unaccepted parents this is what happens now and then...
I think we should just ignore these cases for now, as it's rather minimal.
The subfamily name indeed is there correctly, but the corresponding taxon urn:lsid:marinespecies.org:taxname:439053 does not use it, but instead has col:nameID=urn:lsid:marinespecies.org:taxname:196143 which is the family.
It causes a bad classification in COL:
There are 1799 bare names in Brachyura, i.e. name records that have no taxon or synonym record pointing to them. Would there be any reason to have these or are they likely all names with similar problems?
Examples of missing synonyms/taxa:
unaccepted genus Acanthus:
It seems there is a systematic problem with unaccepted names.
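The "bare name" definition used here (a name record that no taxon or synonym record points to) can be expressed as a set difference. A minimal sketch, assuming the ColDP columns `Name.ID`, `Taxon.nameID` and `Synonym.nameID`; the sample rows are hypothetical.

```python
def bare_name_ids(names, taxa, synonyms):
    """Return IDs of name records that no taxon or synonym refers to via nameID."""
    used = {row["nameID"] for row in taxa} | {row["nameID"] for row in synonyms}
    return [row["ID"] for row in names if row["ID"] not in used]


# Hypothetical rows: n3 is a name with neither a taxon nor a synonym record.
names = [{"ID": "n1"}, {"ID": "n2"}, {"ID": "n3"}]
taxa = [{"ID": "n1", "nameID": "n1"}]
synonyms = [{"ID": "n2", "nameID": "n2", "taxonID": "n1"}]
bare = bare_name_ids(names, taxa, synonyms)
```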
These are unaccepted taxa without children or with unaccepted children only. I don't see why this would be an issue.
You don't want them in the Taxon file maybe?
Should they not be synonyms? You list an accepted name for all of them in WoRMS at least, so I would expect them to show as synonyms in ColDP/CLB.
The original subfamily issue seems to be something else though. Any idea how the wrong name ID got into the export?
Yes, they should. For some reason we have limited synonyms to ranks equal or below species. If we also list higher ranks, will this fix the synonym issue?
That seems likely. It will at least remove most of the bare names I've listed above, although 601 of them were species - maybe these are all "chained" synonyms that have another synonym as their accepted name?
What is still puzzling me is how the wrong family nameID ended up in the subfamily taxon.
Good
Both will be available in the next export 2023-07-01
For the family issue: it's always the same problem: COL cannot deal with unaccepted parents...
As you know WoRMS has no Taxon vs. Name concept. Everything is a name. In Taxon.txt we list all accepted names and assign the NameID the ID of the accepted name. Which is itself for accepted names. <= fine
But, for the cases where an unaccepted taxon/name contains accepted children, we do a trick:
So I propose, we keep the NameID and TaxonID the same in all cases. OK? This may cause some side effects in i.e. Synonyms, but we can deal with this later
That sounds right, yes. Just keep the nameID the same as the Taxon.ID or Synonym.ID. And making unaccepted names which contain accepted children provisionally accepted is also the best option. Unaccepted names without accepted children should become synonyms with nameID = Synonym.ID
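The agreed rule can be summarised in a few lines. This is a sketch of the decision logic only, not the actual WoRMS export code; the function name, its boolean inputs and the returned dictionary layout are hypothetical.

```python
def coldp_record(name_id, is_accepted, has_accepted_children):
    """Decide the target file and status for one WoRMS name.

    nameID always equals the record's own ID, as agreed in this thread.
    """
    if is_accepted:
        target, status = "Taxon", "accepted"
    elif has_accepted_children:
        # An unaccepted parent must stay in the tree so its children keep a valid parent.
        target, status = "Taxon", "provisionally accepted"
    else:
        target, status = "Synonym", "synonym"
    return {"file": target, "ID": name_id, "nameID": name_id, "status": status}


# Example with the subfamily record discussed above, treated as an unaccepted parent.
record = coldp_record("urn:lsid:marinespecies.org:taxname:439053", False, True)
```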
OK done, will be available in next export 2023-07-01
Dear @bart-v, would it be possible (as an exception) to do a "manual" export of Brachyura and send it to @gdower? We are completing Annual Checklist 2023 this week. It would be nice to have updated Brachyura in it.
Thank you! We are proceeding with update.
WoRMS Brachyura, id 1108 on prod https://data.catalogueoflife.org/catalogue/3/dataset/1108
@gdower, could you please help me understand why infraorder Brachyura and the sections are missing from the WoRMS Brachyura export? Is it an export problem, or is it caused by interpretation in the CoL+ software?
Sector established as suborder Pleocyemata (old Brachyura superfamilies from Not Assigned infraorder deleted):