Open aoern opened 9 months ago
There are 3 places to fix this.
1) We have control over the source archives and can declare the code in the coldp default.yaml file.
2) If an entire source dataset follows a single code, we should declare that in the dataset settings and it would be applied to all its names upon the next import, which we should then trigger.
3) Otherwise we can also set the code in every sector setting, which allows applying a code also to mixed sources such as ITIS.
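To make the three options concrete, here is a minimal sketch of how such fallbacks could be combined into one lookup. The precedence order (a code on the record itself first, then the sector setting, then the dataset setting, then the archive default) and the function name `resolve_code` are assumptions for illustration only; this is not taken from the ChecklistBank code.

```python
from typing import Optional


def resolve_code(
    record_code: Optional[str],
    sector_code: Optional[str],
    dataset_code: Optional[str],
    archive_default: Optional[str],
) -> Optional[str]:
    """Return the first code that is set, checked from most to least specific.

    The order of the candidates is an assumption made for this sketch.
    """
    for candidate in (record_code, sector_code, dataset_code, archive_default):
        if candidate:
            return candidate
    # Nothing declared anywhere: the name stays without a code.
    return None
```

With this shape a mixed source such as ITIS can carry a different code per sector, while a single-code dataset only needs the dataset setting.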
It's not nice to learn that some GSDs have problems with Codes again. A year ago (or so) all GSDs in CoL were associated with Codes. We'll have a look at the problem again.
Well, neither @gdower nor I have control over Tortricid.net and WCVP-Fabaceae. These GSDs were converted and imported in CLB by other people. (@dhobern, could you please apply the ICZN to Tortricid.net during re-import?)
Whereas, Gymnodinium & Microsporidia have been moved in CLB from AC19. We'll see with Geoff, what we can do here.
Like I said above, there are various ways to fix this; you don't need to change the archive files if you don't have access.
WCVP-Fabaceae already has the "botanical" Code in the settings.
I am re-syncing now (2023-11-13).
the dataset setting is for imports, not syncs...
But the Fabaceae dataset does have the code imported already: https://www.checklistbank.org/dataset/2304/names?NOM_CODE=botanical
@gdower, @mdoering, could you please describe here what exactly you have done today with Gymnodinium, Microsporidia, WCVP-Fabaceae and Tortricid.net? I.e. we need to document (1) which of the 3 fixes was applied, (2) whether the re-import was completed, (3) whether the re-sync was completed.
GSD name | Where the Code was set | Re-imported | Re-synced |
---|---|---|---|
WCVP-Fabaceae | No actions: the Code was already defined in ver. 2023v.4 / 2023-08-02 | no | re-synced 2023-11-13 |
Gymnodinium | set up for the sector | no | re-synced 2023-11-13 |
Microsporidia | set up for the sector | no | re-synced 2023-11-13 |
Tortricid.net | set up for GSD | re-imported 2023-11-13 | re-synced 2023-11-13 |
@aoern, you gave 4 GSDs as an example. Do you have a full list of GSDs where Code is not defined?
I only added code=zoological to the Tortricid.net dataset options and reimported it
@yroskov and I added code to the sectors for Gymnodinium and Microsporidia and resynced them.
Thanks - should this property move into (or be duplicated in) the main metadata page/document? Or maybe the Options tab should be merged into the Metadata tab but only visible to those editing the page or to administrators. It could then be made a mandatory field for completion.
That is how we started, but I feel it is cleaner to separate configurations/settings from metadata that informs you but is not used for interpretations.
Apart from the nomenclatural code you can also declare extinct or the default environment in the settings. And to be honest I would not mind to be even more generic in the future and allow defaults for any coldp or dwc term just as we do with default.yaml. Maybe splitting default values from other settings helps to visualise them on the main metadata page?
> allow defaults for any coldp or dwc term

I support this idea. But please use controlled vocabularies, where interface users can choose a value from a list.
Does this report http://api.checklistbank.org/dataset/3/nameusage/search?nomCode=_NULL&facet=rank&facet=SECTOR_DATASET_KEY have an answer to the question, which GSDs in CoL still have no Code assignment?
How to assign Code to the nodes in the management classification (there are no sectors there)?
@yroskov, apart from the 4 GSDs mentioned earlier (Gymnodinium, Tortricid.net, WCVP-Fabaceae, and Microsporidia) there are no other datasets with specific and infraspecific taxa without a defined code. However, there are still 56.000+ high rank taxa without a defined code in several datasets.
> ...there are still 56.000+ high rank taxa without a defined code in several datasets.

Thank you, @aoern! It looks like the majority of them belong to the "management classification", i.e. they are above/outside sectors and GSDs.
@mdoering, do we have a mechanism to assign Code to the taxa outside GSD sectors? Not sure that "3 places to fix this" work for management classification.
Separate question, what to do with taxa of ranks which are not regulated by the Code (ICZN regulates names from species-group to family-group and not above)?
ICZN does regulate names above superfamily, just in fewer ways. So they still have to be properly published, but they are not subject to Priority, for example. Art. 1.2.2 notes the Articles that apply for names above the family-group ranks.
@yroskov, you write that "It looks like majority of them belong to the "management classification"". However, source 0 and IRMNG account for only 4.000+ cases and they are not included in these 56.000 cases.
> source 0 and IRMNG account for only 4.000+ cases and they are not included in these 56.000 cases.
That is interesting. If these taxa are inside sectors, @mdoering, we need a list of parent GSDs where Code is not designated yet. There's not a quick way to get the info out of the API, as @gdower found.
> Does this report http://api.checklistbank.org/dataset/3/nameusage/search?nomCode=_NULL&facet=rank&facet=SECTOR_DATASET_KEY have an answer to the question, which GSDs in CoL still have no Code assignment?
Yes, the facet tells you which source datasets (sectorDatasetKey) and ranks are involved. If you limit the search to zero you only get the facet to view: http://api.checklistbank.org/dataset/3/nameusage/search?nomCode=_NULL&facet=rank&facet=SECTOR_DATASET_KEY&limit=0
You can see that most come from PBDB (https://www.checklistbank.org/dataset/268676/names?nomCode=_NULL). Then SF+ (https://www.checklistbank.org/dataset/2073/names?nomCode=_NULL).
Btw, you can also add the filter nomCode=_NULL to the UI search URL and it uses it to filter even though the filter is not present in the forms yet (ping @thomasstjerne, see https://github.com/CatalogueOfLife/checklistbank/issues/1319).
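For anyone who wants to script this check, a minimal sketch follows. It builds the same facet-only search URL as above and ranks the buckets of one facet by count. The response shape (a `facets` object mapping a facet name to a list of `value`/`count` entries) and the helper names are assumptions, not confirmed API documentation, and the sample dictionary is hand-written placeholder data rather than real output.

```python
from urllib.parse import urlencode

API = "http://api.checklistbank.org"


def facet_url(dataset_key, facets, limit=0):
    """Build the search URL for names without a code; limit=0 returns facets only."""
    params = [("nomCode", "_NULL")]
    params += [("facet", f) for f in facets]  # the facet parameter is repeated
    params.append(("limit", limit))
    return f"{API}/dataset/{dataset_key}/nameusage/search?{urlencode(params)}"


def top_facet_values(response, facet, n=5):
    """Return the n largest (value, count) pairs of one facet, biggest first.

    Assumes response["facets"][facet] is a list of {"value": ..., "count": ...}.
    """
    buckets = response.get("facets", {}).get(facet, [])
    ranked = sorted(buckets, key=lambda b: b["count"], reverse=True)
    return [(b["value"], b["count"]) for b in ranked[:n]]


# Placeholder data in the assumed shape; keys and counts are hypothetical.
sample = {"facets": {"sectorDatasetKey": [
    {"value": 1001, "count": 2},
    {"value": 1002, "count": 5},
]}}
```

The JSON fetched from `facet_url(3, ["rank", "SECTOR_DATASET_KEY"])` could then be passed to `top_facet_values` to list the source datasets that contribute the most names without a code.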
> How to assign Code to the nodes in the management classification (there are no sectors there)?
You can define the code in the name editor. There is no bulk tool yet, but I am sure we'll need it. Well, checking the taxon forms there does not seem to be a code field:
@thomasstjerne that would be needed in the future. Also extinct and really all the other name & taxon fields too, at least in an advanced section: https://github.com/CatalogueOfLife/checklistbank/issues/1320
If anything needs adjusting in IRMNG, I am happy to assist. I think nomenclature Code is added via a script at the data export stage, it is not in my edit interface.
More info about missing code GSDs: There are tens of GSDs that contain taxa without nomenclatural code:
- Species Fungorum Plus: 14210 taxa
- Systema Dipterorum: 11456
- FishBase: 5758
- WSC: 4390
- CilCat: 1262
- and many, many more
Well, Species Fungorum Plus, Systema Dipterorum, FishBase, WSC, CilCat, all have defined Code of Nomenclature in the Options (and had it before their imports). It means, we are doing Sisyphean labor. The problem is not in data, but in the CLB code.
Ah, looking at FishBase examples I see that these are all names not present directly as a record on their own in the source, but are names with origin=denormalised, i.e. they are found only in the flat, higher classification and need to be extracted. The code doing that probably does not add any default values.
> The code doing that probably does not add any default values
Related: I was testing the default values and for ACEF imports with extinct set true in the dataset setting, the importer doesn't set the extinct flag on denormalized records.
Fixed in code now, but it will be deployed to prod only tomorrow: https://github.com/CatalogueOfLife/backend/commit/435d236048251c8c89a939a5cfb39146f003d22c
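To illustrate the kind of bug described here, a minimal sketch of extracting names from a flat higher classification and giving them the same defaults as explicit records. This is not the backend code from the commit above; the function name `denormalise`, the field names and the rank list are hypothetical.

```python
FLAT_RANKS = ("kingdom", "phylum", "class", "order", "family", "genus")


def denormalise(record, defaults, ranks=FLAT_RANKS):
    """Build one name record per filled higher-classification column of a flat row."""
    names = []
    for rank in ranks:
        value = record.get(rank)
        if not value:
            continue
        name = {"scientificName": value, "rank": rank, "origin": "denormalised"}
        # Apply the dataset defaults (code, extinct, ...) just as for explicit
        # records. Leaving this loop out reproduces the symptom in this thread:
        # extracted higher names end up without a nomenclatural code.
        for term, default in defaults.items():
            name.setdefault(term, default)
        names.append(name)
    return names
```

For a row such as `{"family": "Salmonidae", "genus": "Salmo", "scientificName": "Salmo trutta"}` and defaults `{"code": "zoological"}`, both the family and the genus name would carry the zoological code.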
There are 101.000+ low taxon names (species or infraspecific) and 64.000+ high taxon names without a defined nomenclatural Code.
Examples of datasets that totally lack nomenclatural codes:
- Gymnodinium
- Tortricid.net
- WCVP-Fabaceae
- Microsporidia

Strangely enough, WCVP-Fabaceae does list the codes of all subspecies taxa (1800+).
There are also several datasets that list the nomenclatural codes of low taxa but not of generic taxa and higher.
In most cases the correct code is easy to deduce, but because of missing codes it is impossible to design a general purpose script or program that needs to know the code, for example a species name syntax checker.
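As one example of why a checker needs the code: tautonyms (genus name repeated as the species epithet, e.g. Bison bison) are permitted under the zoological code but not under the botanical code. A minimal sketch of a check that depends on this, with a hypothetical function name and with the code values `zoological` and `botanical` as used in ChecklistBank:

```python
def check_binomial(name, code):
    """Return a list of problems found in a two-part species name.

    Only the tautonym rule is checked; this is an illustration, not a
    complete syntax checker.
    """
    parts = name.split()
    if len(parts) != 2:
        return ["not a binomial"]
    if code is None:
        # Without a code the checker cannot decide which rules apply.
        return ["nomenclatural code unknown: code-specific rules cannot be checked"]
    genus, epithet = parts
    problems = []
    if code == "botanical" and genus.lower() == epithet.lower():
        problems.append("tautonym not permitted under the botanical code")
    return problems
```

Here `check_binomial("Bison bison", "zoological")` reports no problem, while the same string with `code="botanical"` is flagged, and with `code=None` the checker can only report that it does not know which rules to apply.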