Open mdoering opened 3 years ago
Never seen such cases before. How is it possible in the Tree? It would be nice if the clearinghouse supported rank integrity in the taxonomic hierarchy.
As far as I can see, this problem appears in the source data for IRMNG:
Yes, for imported data we do flag such problems. But for projects we currently do not run any validation as the data can be changed at any moment. We have planned to provide a manual "validate" method that flags all project record issues. In the pipeline and I guess quite important, so we should keep this on top of the todos.
Doing a manual query this is not a widespread problem at all. There are just 4 records in COL draft with a parent rank matching its child:
id | rank | scientific_name | parent_id | rank | scientific_name
--------------------------------------+------------+--------------------------------------------+--------------------------------------+------------+--------------------------------------
9bd61ef4-3c49-43e4-814d-8d2d7eacd9df | SUBPHYLUM | Dipilida | ef4f7c04-27f2-42f0-be27-140c53804e2f | SUBPHYLUM | Euglenoida
d90ed332-538e-464b-91f4-ec30c7061c09 | SUBPHYLUM | Entosphona | ef4f7c04-27f2-42f0-be27-140c53804e2f | SUBPHYLUM | Euglenoida
f8c38485-02d9-41e2-9d28-08d8b57ae9a1 | SUBSPECIES | Physematium scopulinum subsp. scopulinum | d81db1f7-d7f2-4702-9fcc-3f94844e2248 | SUBSPECIES | Physematium scopulinum appalachianum
dbf3c78d-a28c-48ae-b9b6-aad0e5cd02fc | SUBSPECIES | Physematium scopulinum subsp. laurentianum | d81db1f7-d7f2-4702-9fcc-3f94844e2248 | SUBSPECIES | Physematium scopulinum appalachianum
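The exact query is not shown in this thread. A minimal sketch of how such a parent/child self-join might look is given here, run against an in-memory SQLite database. The table name `name_usage`, its columns, the short ids, and the extra "Example class" row are all assumptions for illustration, not the real clearinghouse schema or data.

```python
import sqlite3

# Hypothetical, simplified schema; the real clearinghouse tables are not shown here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE name_usage ("
    "id TEXT PRIMARY KEY, parent_id TEXT, rank TEXT, scientific_name TEXT)"
)

# Toy rows: three names taken from the table above (with made-up short ids)
# plus one invented, correctly ranked child for contrast.
conn.executemany(
    "INSERT INTO name_usage VALUES (?, ?, ?, ?)",
    [
        ("a", None, "SUBPHYLUM", "Euglenoida"),
        ("b", "a", "SUBPHYLUM", "Dipilida"),
        ("c", "a", "SUBPHYLUM", "Entosphona"),
        ("d", "a", "CLASS", "Example class"),
    ],
)


def same_rank_children(connection):
    """Return (child, parent) rows where the child has the same rank as its parent."""
    sql = """
        SELECT c.id, c.rank, c.scientific_name,
               p.id, p.rank, p.scientific_name
        FROM name_usage c
        JOIN name_usage p ON p.id = c.parent_id
        WHERE c.rank = p.rank
        ORDER BY c.scientific_name
    """
    return connection.execute(sql).fetchall()


for row in same_rank_children(conn):
    print(row)
```

A real query would probably also need to restrict the join to a single dataset and skip unordered ranks such as UNRANKED.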
Subspecies case in World Ferns: checked against master file. There is a mistake in the master file with Taxon code: Physematium scopulinum ssp. appalachianum should have a code SS. Reported to the author: It would be nice to correct in next exports.
S | | Physematium scopulinum ssp. appalachianum (T.M.C.Taylor) Li Bing Zhang, N.T.Lu & X.F.Gao
SS | | Physematium scopulinum Trevis. ssp. scopulinum Trevis.
SS | | Physematium scopulinum Trevis. ssp. laurentianum (Windham) Li Bing Zhang, N.T.Lu & X.F.Gao
Trying to fix in the clearinghouse: 1) cleaned up previous complex decision (change rank)
Well, it looks now like that:
FIXED in the master file. Results look now like that (2021-01-07):
I'll not touch the cases of Dipilida & Entosphona in IRMNG because I don't know what causes the problem.
If I block these names via an Editorial Decision, they will probably be blocked in all subsequent IRMNG updates.
Maybe change their rank through a decision?
I changed the rank for the subspecies via a complex decision and got a mess.
@mdoering, can we include a check for this in the importing process so that this does not happen again in the future?
We expect a new version of IRMNG in March. @yroskov will inform Tony Rees about this issue so they can try to solve this prior to the next version.
I have sent email to Tony Rees:
Dear Tony, Just in case, if you are not aware of the bug appeared in IRMNG dataset uploaded in the Clearinghouse. Somehow, subphylum Euglenoida has two children of the same rank: subphyla Dipilida and Entosphona. (https://github.com/CatalogueOfLife/data/issues/222) Could you please try to fix this in next March export? Yours, Yuri
We now flag an issue PARENT_SPECIES_MISSING for an infraspecific name which does not have a species as a parent.
The flagging of CLASSIFICATION_RANK_ORDER_INVALID still needs to be implemented.
Just also added it to the importer
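To make the two issue flags concrete, here is a rough Python sketch of how such checks could work on a single taxon and its direct parent. It is not the importer's actual code: the `Rank` enum is a small made-up subset, the `flag_issues` helper is hypothetical, and treating UNRANKED as exempt follows the rule stated elsewhere in this thread.

```python
from enum import IntEnum
from typing import Optional, Set


class Rank(IntEnum):
    """Illustrative subset of ranks; a smaller value means a higher rank."""
    PHYLUM = 1
    SUBPHYLUM = 2
    INFRAPHYLUM = 3
    CLASS = 4
    GENUS = 5
    SPECIES = 6
    SUBSPECIES = 7
    UNRANKED = 99


UNORDERED = {Rank.UNRANKED}        # ranks with no fixed position in the hierarchy
INFRASPECIFIC = {Rank.SUBSPECIES}  # assumption: only subspecies in this sketch


def flag_issues(rank: Rank, parent_rank: Optional[Rank]) -> Set[str]:
    """Return the issue names raised for one taxon given its direct parent's rank."""
    issues: Set[str] = set()
    # An infraspecific name must sit directly under a species.
    if rank in INFRASPECIFIC and parent_rank != Rank.SPECIES:
        issues.add("PARENT_SPECIES_MISSING")
    # The parent must be strictly higher than the child, unless either is unordered.
    if (
        parent_rank is not None
        and rank not in UNORDERED
        and parent_rank not in UNORDERED
        and parent_rank >= rank
    ):
        issues.add("CLASSIFICATION_RANK_ORDER_INVALID")
    return issues


# The IRMNG case: a subphylum directly under another subphylum.
print(flag_issues(Rank.SUBPHYLUM, Rank.SUBPHYLUM))
# The World Ferns case: a subspecies directly under another subspecies.
print(flag_issues(Rank.SUBSPECIES, Rank.SUBSPECIES))
```

In this sketch the World Ferns case raises both issues, while an infraphylum under a subphylum raises none.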
From: Tony Rees tonyrees49@gmail.com Sent: Thursday, January 14, 2021 12:56 To: World Register of Marine Species (WoRMS) info@marinespecies.org Cc: Roskov, Yury yroskov@illinois.edu Subject: Small bug in IRMNG DwCA export file generation
Dear Bart et al.,
Yuri Roskov of CoL has pointed out that 2 names held in IRMNG at the rank of infraphylum are being exported as rank=subphylum when the DwCA export file is generated, which is causing an inconsistency in CoL (child having the same rank as its next level parent), which should be fixed if possible.
The 2 names he has discovered are in Euglenozoa, but it turns out there are 2 more in dinoflagellates; here is the full list (rank = infraphylum):
Search for '' returned 4 matching records.
• Apicomplexa
• Dinozoa
• Dipilida
• Entosphona
All of these are presented erroneously as subphylum (next available rank up) in the March 2020 DwCA export file (presumably also in the previous one; the names were added to IRMNG in May 2018), which is the cause of the downstream problem for users.
Can you maybe look into this in advance of the next export file generation, which I have in mind for a couple of months' time? (March 2021)...
Thanks - Tony
... meanwhile we should create a decision to change the rank.
Such a "complex decision" creates a mess. I checked this with Physematium scopulinum (WFerns) and with other taxa previously.
what does "mess" mean exactly? If something is wrong it needs to be fixed. It is supposed to work to modify the rank of a source taxon. If it doesn't we have a bug to work on!
In the case/s from IRMNG, the native rank in the master system is "infraphylum" which was being changed to "subphylum" in the DwCA export, in the belief that "infraphylum" was not an allowed term - however Markus says it is acceptable so in the next export from IRMNG (expected March 2021), the offending names will be exported at their original rank and the problem should disappear - for the following IRMNG names (were showing as subphylum in error): Apicomplexa, Dinozoa, Dipilida, Entosphona.
As per email correspondence, Tony/Markus/Bart (VLIZ)/Yuri, January 2021.
Meanwhile if someone wants to fix these up in advance by changing the rank to the correct one, that is fine as well, Regards - Tony
@yroskov is anything stopping you from changing the rank for the few taxa right now?
I am against unnecessary changes via clearinghouse. Corrections (if any) will appear with a new version of IRMNG via update (March?).
But isn't this a rather serious error to be fixed rather sooner than later? Or is it just a matter of weeks we are talking about?
Unfortunately, I do not know how the clearinghouse software will respond to a "complex decision" on top of Dipilida & Entosphona when they are delivered as infraphyla with a new update. Too many unexpected bugs or broken sectors. Better to live with a minor glitch in empty branches.
As a user this is a serious issue. It surely needs addressed before the annual release one way or another.
If a new IRMNG becomes available in the Clearinghouse in March, the bug will be resolved in the annual checklist.
COL apparently contains taxa which have a parent with the same rank. This is wrong and should be avoided in all cases (unless it is UNRANKED or a similar unordered rank).
Sadly we haven't exposed validation of a project yet, which would have exposed these problems. Here is one example under Euglenoida: