Open yroskov opened 2 years ago
For attention of @mdoering:
[ ] Problem with the placement of hybrid genera in the classification:
Many (if not all) genera with a hybrid symbol are placed in the root of the tree, outside any family. They have no child species:
A number of genera with a hybrid symbol are placed in the artificial genus "xx" in the family Orchidaceae. These genera have no child species:
Species with a hybrid symbol have no child infraspecific taxa; all infraspecific taxa that are children of hybrid species are placed outside their parents (see genus Dactylorhiza):
https://data.catalogueoflife.org/dataset/2232/classification?taxonKey=55153-wcs
(Unfortunately, we cannot go ahead with WCVP until this problem is fixed.)
ASSEMBLY
2021-12-22 Sector Acorales - Acoraceae established to make the Workbench tool accessible (available inside the Project only).
[ ] The Nomenclatural Status field is empty in this WCVP view.
[ ] How to block Synonyms, Illegitimate & Invalid names without a parent accepted name? See https://github.com/CatalogueOfLife/testing/issues/175#issuecomment-966607516
Interpretation of WCVP statuses into CoL statuses.
Total: 1,048,575 names in my Excel spreadsheet (the list seems incomplete: the import into Excel was cut off at its row limit)
Accepted (308,084) = accepted
Synonym (634,237; 901 of them have no parent accepted name) = synonym (except those 901 names = bare names)
Misapplied (947; all with parent accepted name) = synonyms
Orthographic (1,574; 3 of them have no parent accepted name) = synonym. It is not clear what to do with those 3 accepted "orthographic" names: Cassine congonha A.St.-Hil.; Aspidosperma clerceanum Iljin & Krasch.; Croton benzoe L.
Unplaced (47,165; 47,139 of them have no parent accepted name) = (!) all bare names
Illegitimate (32,277; 43 of them have no parent accepted name) = synonyms (except those 43 names = (!) bare names)
Invalid (22,792; 69 of them have no parent accepted name) = synonyms (except those 69 names = (!) bare names)
Local Biotype (100; all with parent accepted name) = synonyms
Artificial Hybrid (1,395; 318 of them have rank "genus", 1,072 "species", 2 "variety", 3 blank)
The coldp generator code for WCVP currently maps the following, keeping all others as they are:
status: Unplaced -> bare name for all cases
Illegitimate, Invalid or Orthographic will become the nomenclatural status value and the taxonomic status is set to synonym
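The mapping described above can be sketched roughly as follows. This is only an illustration, not the actual coldp generator code: the function name `map_wcvp_status` and the tuple it returns are assumptions made for this example.

```python
from typing import Optional, Tuple

# WCVP taxon_status values that are really nomenclatural statuses
# (as listed in the description above).
NOMENCLATURAL = {"Illegitimate", "Invalid", "Orthographic"}


def map_wcvp_status(taxon_status: str) -> Tuple[str, Optional[str]]:
    """Return (taxonomic_status, nomenclatural_status) for one WCVP value.

    Hypothetical helper: the name and return shape are assumptions.
    """
    value = taxon_status.strip()
    if value == "Unplaced":
        # Unplaced names become bare names in all cases.
        return ("bare name", None)
    if value in NOMENCLATURAL:
        # The original value moves to the nomenclatural status;
        # the taxonomic status is set to synonym.
        return ("synonym", value)
    # All other values are kept as they are.
    return (value, None)


for status in ("Unplaced", "Invalid", "Accepted"):
    print(status, "->", map_wcvp_status(status))
```

Because the original value is overwritten at this stage, it is no longer visible downstream, which is the search problem raised in this thread.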
Because this mapping is done in the coldp generator, the original values do not show up in the verbatim data in CLB, which is a problem for searching on them.
Maybe it is best to keep the data as verbatim as possible and do some of the mapping generically inside the importer? That might not be simple for nomenclatural statuses disguised as taxonomic ones.
WCVP of 2021-02-21, updated version, imported on PROD 2022-02-14.
[x] Imported:
spp 351,046 (= in 2021-12-03)
families ? (no data)
genera 16,363 (16,473 in 2021-12-03)
subsp 22,299 (= in 2021-12-03)
var 20,166 (= in 2021-12-03)
f 476 (= in 2021-12-03)
[ ] Metadata: see above - contact Rafael for metadata in CoL.
[ ] Classification: top rank - families; no orders. Still there are many genera in the root outside families.
In the source file:
Rank | Status | Family | Genus | My comment |
---|---|---|---|---|
Genus | Artificial Hybrid | Solanaceae | Eriochroma | No species in the source |
Genus | Artificial Hybrid | Solanaceae | Iozelia | No species in the source |
Genus | Artificial Hybrid | Orchidaceae | Aberconwayara | No species in the source |
Genus | Artificial Hybrid | Orchidaceae | Acampodorum | No species in the source |
Genus | Artificial Hybrid | Orchidaceae | Acampostylis | No species in the source |
Genus | Artificial Hybrid | Orchidaceae | Acapetalum | No species in the source |
Genus | Artificial Hybrid | Orchidaceae | Acemannia | No species in the source |
Genus | Artificial Hybrid | Orchidaceae | Aceraherminium | No species in the source |
> Local Biotype (100; all with parent accepted name) = synonyms
> Artificial Hybrid (1,395; 318 of them have rank "genus", 1,072 "species", 2 "variety", 3 blank)
I don't know what those 2 status values mean exactly. Maybe we can remove all of these records? That would be simple to do in the ColDP generator, or we could at least create only bare names for them. To handle this with editorial decisions, we would first need the verbatim status search feature implemented in both backend and frontend.
Don't worry. Names from the root are not going into the CoL, because they are outside the sectors (families in the case of WCVP). Artificial hybrids make no sense for the CoL.
Names with "Local Biotype" status, if they are synonyms, are fine for the CoL. However, if they are not present in CLB, that's also fine for me.
Sectors re-assembled, 2022-03-25:
Family | Previous GSD |
---|---|
Order Acorales | WCSP |
Family Acoraceae | WCSP |
Order Alismatales | WCSP |
Family Alismataceae | WCSP |
Family Aponogetonaceae | WCSP |
Family Araceae | WCSP |
Family Butomaceae | WCSP |
Family Cymodoceaceae | WCSP |
Family Hydrocharitaceae | WCSP |
Family Juncaginaceae | WCSP |
Family Maundiaceae | WCSP |
Family Posidoniaceae | WCSP |
Family Potamogetonaceae | WCSP |
Family Ruppiaceae | WCSP |
Family Scheuchzeriaceae | WCSP |
Family Tofieldiaceae | WCSP |
Family Zosteraceae | WCSP |
Order Arecales | WCSP |
Family Arecaceae | WCSP |
Family Dasypogonaceae | WCSP |
Order Asparagales | WCSP |
Family Amaryllidaceae | WCSP |
Family Asparagaceae | WCSP |
Family Asphodelaceae | WCSP |
Family Asteliaceae | WCSP |
Family Blandfordiaceae | WCSP |
Family Boryaceae | WCSP |
Family Doryanthaceae | WCSP |
Family Hypoxidaceae | WCSP |
Family Iridaceae | WCSP |
Family Ixioliriaceae | WCSP |
Family Lanariaceae | WCSP |
Family Orchidaceae | WCSP |
Family Tecophilaeaceae | WCSP |
Family Xeronemataceae | WCSP |
Order Commelinales | WCSP |
Family Commelinaceae | WCSP |
Family Haemodoraceae | WCSP |
Family Hanguanaceae | WCSP |
Family Philydraceae | WCSP |
Family Pontederiaceae | WCSP |
Sectors re-assembled, 2022-03-29:
Family | Previous GSD |
---|---|
Order Dioscoreales | WCSP |
Family Burmanniaceae | WCSP |
Family Dioscoreaceae | WCSP |
Family Nartheciaceae | WCSP |
Order Liliales | WCSP |
Family Alstroemeriaceae | WCSP |
Family Campynemataceae | WCSP |
Family Colchicaceae | WCSP |
Family Corsiaceae | WCSP |
Family Liliaceae | WCSP |
Family Melanthiaceae | WCSP |
Family Petermanniaceae | WCSP |
Family Philesiaceae | WCSP |
Family Ripogonaceae | WCSP |
Family Smilacaceae | WCSP |
Order Pandanales | WCSP |
Family Cyclanthaceae | WCSP |
Family Pandanaceae | WCSP |
Family Stemonaceae | WCSP |
Family Triuridaceae | WCSP |
Family Velloziaceae | WCSP |
Order Petrosaviales | WCSP |
Family Petrosaviaceae | WCSP |
Order Poales | WCSP |
Family Bromeliaceae | WCSP |
Family Cyperaceae | WCSP |
Family Ecdeiocoleaceae | WCSP |
Family Eriocaulaceae | WCSP |
Family Flagellariaceae | WCSP |
Family Joinvilleaceae | WCSP |
Family Mayacaceae | WCSP |
Family Poaceae | WCSP |
Family Rapateaceae | WCSP |
Family Restionaceae | WCSP |
Family Thurniaceae | WCSP |
Family Typhaceae | WCSP |
Family Xyridaceae | WCSP |
Order Zingiberales | WCSP |
Family Cannaceae | WCSP |
Family Costaceae | WCSP |
Family Heliconiaceae | WCSP |
Family Lowiaceae | WCSP |
Family Marantaceae | WCSP |
Family Musaceae | WCSP |
Family Strelitziaceae | WCSP |
Family Zingiberaceae | WCSP |
Sectors of 2022-03-25&29 synced
Sectors re-assembled, 2022-03-30:
Family | Previous GSD |
---|---|
Order Apiales | |
Family Araliaceae | WCSP |
Family Myodocarpaceae | WCSP |
Order Cardiopteridales, newly established | |
Family Cardiopteridaceae | WCSP |
Family Stemonuraceae | World Plants |
Order Asterales | |
Family Campanulaceae | WCSP |
Order Cornales | |
Family Cornaceae | WCSP |
Family Nyssaceae | WCSP |
Order Ericales | |
Family Clethraceae | WCSP |
Family Ebenaceae | WCSP |
Family Lecythidaceae | WCSP |
Family Sapotaceae | WCSP |
Order Fagales | |
Family Betulaceae | WCSP |
Family Fagaceae | WCSP |
Family Nothofagaceae | WCSP |
Family Ticodendraceae | WCSP |
Order Garryales | WCSP |
Family Eucommiaceae | WCSP |
Family Garryaceae | WCSP |
Order Gentianales | |
Family Rubiaceae | WCSP |
Order Huerteales | |
Family Petenaeaceae | WCSP |
Order Lamiales | |
Family Lamiaceae | WCSP |
Family Oleaceae | WCSP |
Family Schlegeliaceae | WCSP |
Family Verbenaceae | WCSP |
Order Magnoliales | |
Family Magnoliaceae | WCSP |
Order Malpighiales | |
Family Caryocaraceae | WCSP |
Family Centroplacaceae | WCSP |
Family Euphorbiaceae | WCSP |
Family Pandaceae | WCSP |
Family Peraceae | WCSP |
Family Phyllanthaceae | WCSP |
Family Picrodendraceae | WCSP |
Family Putranjivaceae | WCSP |
Order Malvales | |
Family Sarcolaenaceae | WCSP |
Family Sphaerosepalaceae | WCSP |
Order Santalales | |
Family Opiliaceae | WCSP |
Sectors of 2022-03-30 synced
Checks in CoL-Preview 2022-04-01:
FIXED for all cases.
(Choosing the first or second item resulted in Error code 500, message "There was an error processing your request. It has been logged (ID 1401e196cbc64eb9).")
Strange appearance of children subspecies in the source data:
https://www.checklistbank.org/catalogue/3/assembly?datasetKey=2232&sourceTaxonKey=2295-wcs
https://github.com/CatalogueOfLife/testing/issues/189#issuecomment-1089344847
Sync all WCVP sectors 2022-04-05
Tests of 2022-04-07:
ISSUES assessed 2022-04-07
Partially Parsable Name, 62. = few blocked
Examples:
Eriochroma × J.M.H.Shaw
Rhaphiobotrya × Coombes
Aesculus sp. &dallimorei Sealy
Cytisus sp. &sordidus K.Koch
Hieracium subsp. subsp.allochroum Norrl.
Gouldia var. var.typica Fosberg
Solanum phureja var. janck 'o-phureja Ochoa
× Dactylodenia nothosubsp.sourekii (F.Proch.) Eccarius
Doubtful Name, 65. = no action
Examples:
× Gymnorchis Osva?.
Alpinia vietnamica H.Ð.Tr?n, Luu & Skornick.
Cephalaria tuteliana Ku? & Göktürk
Ambrosia artemisiifolia f. gracilissima D.Cîr?u & M.Cîr?u
Authorship Contains Nomenclatural Note, 255. Case of "ined." = 248 accepted, FIXED as Prov. Acc.
Example:
Anthoxanthum novae-zelandiae (Gand.) ined.
Indetermined, 259. Names with "sp." blocked.
Examples:
Cibotium sp. Krajina
Arabis divaricarpa var. B.Boivin
(few names blocked)
Names with "convar." are OK: Brassica oleracea convar. caulorapa (DC.) Alef.
Taxonomic Status Doubtful, 1074. Names with GSD status "Misapplied". = OK, no action. Example: misapplied | Acer barbatum Hook.
Unparsable Authorship, 1847. = encoding & parsing problems, no action
Examples:
Protoedraianthus (Laku ic) Laku ic
Acinos hungaricus (Simonk.) ilic
Acrostichum glutinosum Spruce ap.Christ
Alyssum gallaecicum (S.Ortiz) paniel, Marhold & Lihová
Andryala oestivalis not_stated
Astracantha alexandri (irj.) Podlech
Holubiella lunarioides (Michx.) koda
Subspecies Assigned, 2623. Trinomials (syn) without a rank marker. If the rank "subspecies" is assigned by CLB, it's wrong; it should be "var." No action. Example: Agave horrida macrodontha Van Geert
Accepted Id Invalid, 4847 & Accepted Name Missing, 6177. All are "bare names" in CLB. OK, no action.
Parent Species Missing, 42961. Accepted trinomials. No action; let's see what happens in the final product.
TASKS 2022-04-07 NB! It's the whole dataset.
2022-04-07 ACC-ACC diff. auth. Only 3 names from WCVP sectors resolved.
2022-04-08, not completed yet:
2022-04-20 resolved as:
SYN-SYN species & infra (same acc, same auth) remain unresolved, because the comment portions (nom. nud., nom. illegit., etc.) were removed from the author strings.
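One way to keep such duplicates distinguishable would be to split the note from the authors instead of dropping it. The sketch below is an illustration only and is not how CLB parses authorship: the function name `split_authorship` and the list of note markers in the regular expression are assumptions, and the marker list is certainly not exhaustive.

```python
import re
from typing import Optional, Tuple

# Hypothetical, non-exhaustive list of nomenclatural note markers.
_NOTE = re.compile(
    r"[,;]?\s*\b(nom\.\s*(?:nud|illegit|inval|superfl|cons|rej)\.|ined\.).*$"
)


def split_authorship(authorship: str) -> Tuple[str, Optional[str]]:
    """Split an author string into (authors, note); note is None if absent."""
    match = _NOTE.search(authorship)
    if match is None:
        return authorship.strip(), None
    authors = authorship[: match.start()].strip()
    note = authorship[match.start():].lstrip(",; ").strip()
    return authors, note


print(split_authorship("Hook., nom. illegit."))
print(split_authorship("(Gand.) ined."))
print(split_authorship("Syme"))
```

With the note kept as a separate value, two synonyms with the same accepted name and the same authors could still be told apart when one of them carries a note.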
Synced 2022-04-20
Tests in CoL of 2022-04-26 on PREVIEW
Some names from those 901 (from WCVP families taken in CoL) appear in production. They are present in the search box, but give an error when chosen.
Example: Pristidia divaricata (Rubiaceae)
Search:
Result:
Aralia lyallii var. robusta (Araliaceae)
Search:
Result:
Aralia polaris (Araliaceae)
Search:
Result:
WCVP of 2022-10-27
Family | Sector status 2022-11-21 | 2022-11-28 (after Markus' re-match) |
---|---|---|
Order Acorales | ok | |
Family Acoraceae | + | |
Order Alismatales | ok | |
Family Alismataceae | + | |
Family Aponogetonaceae | + | |
Family Araceae | + | |
Family Butomaceae | + | |
Family Cymodoceaceae | + | |
Family Hydrocharitaceae | + | |
Family Juncaginaceae | + | |
Family Maundiaceae | + | |
Family Posidoniaceae | + | |
Family Potamogetonaceae | + | |
Family Ruppiaceae | + | |
Family Scheuchzeriaceae | + | |
Family Tofieldiaceae | + | |
Family Zosteraceae | + | |
Order Arecales | ok | |
Family Arecaceae | + | |
Family Dasypogonaceae | + | |
Order Asparagales | fixed | |
Family Amaryllidaceae | + | |
Family Asparagaceae | + | |
Family Asphodelaceae | + | |
Family Asteliaceae | + | |
Family Blandfordiaceae | + | |
Family Boryaceae | + | |
Family Doryanthaceae | + | |
Family Hypoxidaceae | + | |
Family Iridaceae | + | |
Family Ixioliriaceae | + | |
Family Lanariaceae | + | |
Family Orchidaceae | + | |
Family Tecophilaeaceae | + | |
Family Xeronemataceae | | |
Order Commelinales | ok | |
Family Commelinaceae | + | |
Family Haemodoraceae | + | |
Family Hanguanaceae | + | |
Family Philydraceae | + | |
Family Pontederiaceae | + | |
Order Dioscoreales | ok | |
Family Burmanniaceae | + | |
Family Dioscoreaceae | + | |
Family Nartheciaceae | + | |
Order Liliales | ok | |
Family Alstroemeriaceae | + | |
Family Campynemataceae | + | |
Family Colchicaceae | + | |
Family Corsiaceae | + | |
Family Liliaceae | + | |
Family Melanthiaceae | + | |
Family Petermanniaceae | + | |
Family Philesiaceae | + | |
Family Ripogonaceae | + | |
Family Smilacaceae | + | |
Order Pandanales | ok | |
Family Cyclanthaceae | + | |
Family Pandanaceae | + | |
Family Stemonaceae | + | |
Family Triuridaceae | + | |
Family Velloziaceae | + | |
Order Petrosaviales | ok | |
Family Petrosaviaceae | + | |
Order Poales | ok | |
Family Bromeliaceae | + | |
Family Cyperaceae | + | |
Family Ecdeiocoleaceae | + | |
Family Eriocaulaceae | + | |
Family Flagellariaceae | + | |
Family Joinvilleaceae | + | |
Family Mayacaceae | + | |
Family Poaceae | + | |
Family Rapateaceae | + | |
Family Restionaceae | + | |
Family Thurniaceae | + | |
Family Typhaceae | + | |
Family Xyridaceae | + | |
Order Zingiberales | ok | |
Family Cannaceae | + | |
Family Costaceae | + | |
Family Heliconiaceae | + | |
Family Lowiaceae | + | |
Family Marantaceae | + | |
Family Musaceae | + | |
Family Strelitziaceae | + | |
Family Zingiberaceae | + | |
Order Apiales | ok | |
Family Araliaceae | + | |
Family Myodocarpaceae | + | |
Order Cardiopteridales | ok | |
Family Cardiopteridaceae | + | |
Family Stemonuraceae | + | |
Order Asterales | ok | |
Family Campanulaceae | + | |
Order Cornales | fixed | |
Family Cornaceae | + | |
Family Nyssaceae | missing | |
Order Ericales | ok | |
Family Clethraceae | + | |
Family Ebenaceae | + | |
Family Lecythidaceae | + | |
Family Sapotaceae | + | |
Order Fagales | ok | |
Family Betulaceae | + | |
Family Fagaceae | + | |
Family Nothofagaceae | + | |
Family Ticodendraceae | + | |
Order Garryales | ok | |
Family Eucommiaceae | + | |
Family Garryaceae | + | |
Order Gentianales | ok | |
Family Rubiaceae | + | |
Order Huerteales | ok | |
Family Petenaeaceae | + | |
Order Lamiales | ok | |
Family Lamiaceae | + | |
Family Oleaceae | + | |
Family Schlegeliaceae | + | |
Family Verbenaceae | + | |
Order Magnoliales | ok | |
Family Magnoliaceae | + | |
Order Malpighiales | ok | |
Family Caryocaraceae | + | |
Family Centroplacaceae | + | |
Family Euphorbiaceae | + | |
Family Pandaceae | + | |
Family Peraceae | + | |
Family Phyllanthaceae | + | |
Family Picrodendraceae | + | |
Family Putranjivaceae | + | |
Order Malvales | ok | |
Family Sarcolaenaceae | + | |
Family Sphaerosepalaceae | + | |
Order Santalales | ok | |
Family Opiliaceae | + |
Synced with resolved sectors and un-resolved Issues & Tasks 2022-11-28
WCVP of 2022-10-27 (continue)
TASKS
I assume they have entirely different ids? Looking at the first WCVP decision about Pyrus communis f. briggsii:
"subject": {
"id": "1010083-az",
"name": "Pyrus communis f. briggsii",
"authorship": "Syme",
"rank": "form",
"status": "synonym",
"parent": "Pyrus",
"broken": true,
"label": "Pyrus communis f. briggsii Syme",
"labelHtml": "<i>Pyrus communis</i> f. <i>briggsii</i> Syme"
},
The parent=Pyrus property seems wrong. For this synonym I suppose this should have been one of the accepted names. @thomasstjerne, do you know how parent in a decision gets populated?
The ids have definitely changed. The above is either 2963149 or 3010083 now: https://www.checklistbank.org/catalogue/3/decision?limit=100&offset=0&subjectDatasetKey=2232
Rematching fails. I can't say for sure, but I suspect it is because the parent is wrongly set to the genus.
WCVP of 2022-10-27 (continue)
ISSUES
TASKS
[x] 60,347 broken decisions. = DELETED all 2022-12-14
ACC-ACC sp DiffAuth: 2022-11-29 resolved spp from CoL families; families Euphorbiaceae, Rubiaceae, Sapotaceae, Lamiaceae re-synced 2022-11-29
ACC-ACC species (same authors) & ACC-ACC infraspecies and infraspecies marker (different authors) = no WCVP families in CoL
SYN-SYN species (different accepted, different authors), 22926 duplicates. 12m30s for a page (setting 990 duplicates/page).
Identical genus
Genus | Author | Family | Decision |
---|---|---|---|
Bessera | | Amaryllidaceae | prov acc |
Bessera | Schult.f. | Asparagaceae | |
Colobogynium | | Costaceae | prov acc |
Colobogynium | Schott | Araceae | |
Disporum | | Asparagaceae | prov acc |
Disporum | Salisb. | Colchicaceae | |
Ipomoea | | Poaceae | prov acc |
Ipomoea | L. | Convolvulaceae | |
Tricyrtis | | Asparagaceae | prov acc |
Tricyrtis | Wall. | Liliaceae | |
x Hueylihara | Anon. | Orchidaceae | |
x Hueylihara | Garay | Orchidaceae | prov acc |
x Jesupara | Glic. | Orchidaceae | prov acc: Orchid Rev. 126(1322, Suppl.): 43 (2018) |
x Jesupara | Glic. | Orchidaceae | |
x Nakamotoara | Anon. | Orchidaceae | |
x Nakamotoara | Garay | Orchidaceae | prov acc |
x Rumrillara | Garay | Orchidaceae | prov acc |
x Rumrillara | Anon. | Orchidaceae | |
x Scullyara | Anon. | Orchidaceae | |
x Scullyara | Garay & H.R.Sweet | Orchidaceae | prov acc |
x Smithara | Garay | Orchidaceae | prov acc |
x Smithara | Garay & H.R.Sweet | Orchidaceae | |
x Trichopsis | Anon. | Orchidaceae | |
x Trichopsis | Y.Itô | Cactaceae | prov acc |
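A list like the one above can be produced by grouping genus records by name. The following is a minimal sketch under the assumption that the records are available as (genus, author, family) tuples; the function name `identical_genera` is hypothetical and not part of any CLB tool.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, str, str]  # (genus, author, family)


def identical_genera(records: Iterable[Record]) -> Dict[str, List[Tuple[str, str]]]:
    """Return genus names that occur more than once, with (author, family) pairs."""
    by_name: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for genus, author, family in records:
        by_name[genus.strip()].append((author.strip(), family.strip()))
    # Keep only names with at least two records.
    return {name: rows for name, rows in by_name.items() if len(rows) > 1}


sample = [
    ("Bessera", "", "Amaryllidaceae"),
    ("Bessera", "Schult.f.", "Asparagaceae"),
    ("Tricyrtis", "Wall.", "Liliaceae"),
]
print(identical_genera(sample))
```

The sketch only finds the candidates; which record becomes provisionally accepted remains an editorial decision.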
Synced 2022-12-15
WCVP 10.0 / 2022-10-27 (the same as above)
TASKS resolved 2024-06-07:
WCVP ver. 13.0 / 2024-05-16; imported 2024-06-08
Metrics
TASKS
Resolved 2024-06-11:
Synced 2024-06-11
The EML metadata file has wrongly encoded characters. There is nothing I can do; I informed Kew about it and cc'ed you.
I have resolved Tasks. @mdoering, May I re-sync all WCVP sectors again now?
sure!
WCVP, The World Checklist of Vascular Plants of 2021-02-21 was imported on PROD 2021-12-03 by @mdoering.
(see also report on DEV data: https://github.com/CatalogueOfLife/testing/issues/175)
[x] Imported:
spp 351,046
families ? (no data)
genera 16,473
subsp 22,299
var 20,166
f 476
[ ] Metadata: no version; alias included in full name; no editor, creator "The Royal Botanic Gardens, KewLondon, GB" - contact Rafael for metadata in CoL.
[ ] Classification: top rank - families; no orders. Many genera are in the root (outside families). They are mainly genera with a hybrid symbol - are all hybrid genera outside families? Actually, the same problem occurs with hybrid taxa of all ranks - they are not placed in their parent taxa (see below)![image](https://user-images.githubusercontent.com/28574252/144867133-2ce94a82-1fb5-4e56-9edd-8e5f49ec5e29.png)
[ ] It's not clear to me how to clean up "(!) unwanted" names in the final product. All necessary information is in the taxon_status field in the source file, but these data are not available in the clearinghouse.
Interpretation of WCVP statuses into CoL statuses.
Total: 1,048,575 names in my Excel spreadsheet (the list seems incomplete: the import into Excel was cut off at its row limit)
Field taxon_status:
Accepted (308,084) = accepted
Synonym (634,237; 901 of them have no parent accepted name, i.e. empty accepted_plant_name_id field) = synonym (except those 901 names = bare names)
Misapplied (947; all with parent accepted name) = misapplied name
Orthographic (1,574; 3 of them have no parent accepted name) = synonym. It is not clear what to do with those 3 accepted "orthographic" names: Cassine congonha A.St.-Hil.; Aspidosperma clerceanum Iljin & Krasch.; Croton benzoe L.
Unplaced (47,165; 47,139 of them have no parent accepted name) = (!) all bare names
Illegitimate (32,277; 43 of them have no parent accepted name) = synonyms (except those 43 names = (!) bare names)
Invalid (22,792; 69 of them have no parent accepted name) = synonyms (except those 69 names = (!) bare names)
Local Biotype (100; all with parent accepted name) = synonyms
Artificial Hybrid (1,395; 318 of them have rank "genus", 1,072 "species", 2 "variety", 3 blank)
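Since the spreadsheet import appears to have been cut off, counts like the ones above could instead be taken from the source file directly by streaming it. The sketch below assumes a delimited text file with a header row containing the taxon_status and accepted_plant_name_id columns mentioned above; the `|` delimiter, the plant_name_id column in the sample and the function name `tally_statuses` are assumptions for illustration.

```python
import csv
import io
from collections import Counter
from typing import Iterable, Tuple


def tally_statuses(lines: Iterable[str], delimiter: str = "|") -> Tuple[Counter, Counter]:
    """Count names per taxon_status, and those with an empty accepted_plant_name_id."""
    total: Counter = Counter()
    no_parent: Counter = Counter()
    for row in csv.DictReader(lines, delimiter=delimiter):
        status = (row.get("taxon_status") or "").strip()
        total[status] += 1
        if not (row.get("accepted_plant_name_id") or "").strip():
            no_parent[status] += 1
    return total, no_parent


# Tiny made-up sample; a real run would pass an open file handle instead.
sample = io.StringIO(
    "plant_name_id|taxon_status|accepted_plant_name_id\n"
    "1|Accepted|1\n"
    "2|Synonym|1\n"
    "3|Synonym|\n"
    "4|Unplaced|\n"
)
total, no_parent = tally_statuses(sample)
print(total)
print(no_parent)
```

Reading the file row by row avoids any spreadsheet row limit, so the totals cover the whole dataset.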