CatalogueOfLife / testing

Editorial tests and discussion to prepare for COL releases

WCVP (id 2232): test report #177

Open yroskov opened 2 years ago

yroskov commented 2 years ago

WCVP, The World Checklist of Vascular Plants of 2021-02-21 was imported on PROD 2021-12-03 by @mdoering.

(see also report on DEV data: https://github.com/CatalogueOfLife/testing/issues/175)

Interpretation of WCVP statuses into CoL statuses.

Total: 1,048,575 names in my Excel spreadsheet (the list appears to be incomplete, truncated at Excel's row limit)
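Since the spreadsheet count sits right at Excel's limit of 1,048,576 rows, the totals can be cross-checked outside Excel. A minimal sketch, assuming the WCVP names file is delimited text with a header row and the columns taxon_status and accepted_plant_name_id; the delimiter, the plant_name_id column and the sample rows are illustrative assumptions, not values taken from the actual file:

```python
import csv
import io
from collections import Counter


def tally_statuses(lines, delimiter="|"):
    """Count rows per taxon_status and how many of them lack an accepted name id."""
    totals = Counter()
    without_parent = Counter()
    for row in csv.DictReader(lines, delimiter=delimiter):
        status = (row.get("taxon_status") or "").strip()
        totals[status] += 1
        if not (row.get("accepted_plant_name_id") or "").strip():
            without_parent[status] += 1
    return totals, without_parent


# Hypothetical sample rows, only to show the call; real input would be an open file.
sample = io.StringIO(
    "plant_name_id|taxon_status|accepted_plant_name_id\n"
    "1|Accepted|1\n"
    "2|Synonym|1\n"
    "3|Synonym|\n"
    "4|Unplaced|\n"
)
totals, without_parent = tally_statuses(sample)
for status, count in totals.items():
    print(status, count, "without parent:", without_parent[status])
```

Reading the file row by row avoids any row limit, so the per-status totals would cover the whole export.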

Field taxon_status:

Accepted (308,084) = accepted
Synonym (634,237; 901 of them have no parent accepted name, i.e. empty accepted_plant_name_id field) = synonym (except those 901 names = bare names)
Misapplied (947; all with parent accepted name) = misapplied name
Orthographic (1,574; 3 of them have no parent accepted name) = synonym. It is not clear what to do with those 3 accepted "orthographic" names: Cassine congonha A.St.-Hil.; Aspidosperma clerceanum Iljin & Krasch.; Croton benzoe L.

Unplaced (47,165; 47,139 of them have no parent accepted name) = (!) all bare names
Illegitimate (32,277; 43 of them have no parent accepted name) = synonyms (except those 43 names = (!) bare names)
Invalid (22,792; 69 of them have no parent accepted name) = synonyms (except those 69 names = (!) bare names)
Local Biotype (100; all with parent accepted name) = synonyms
Artificial Hybrid (1,395; 318 of them have rank "genus", 1,072 "species", 2 "variety", 3 blank)
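The interpretation proposed above can be written down as a small rule function. This is only a sketch of the rules as listed in this comment, not the actual ColDP generator code; the function name and the returned labels are hypothetical, and cases for which the list states no rule return None:

```python
from typing import Optional

# Statuses that become synonyms when a parent accepted name exists
# and bare names when it does not.
SYNONYM_OR_BARE = {"Synonym", "Illegitimate", "Invalid"}


def interpret_status(taxon_status: str, accepted_plant_name_id: Optional[str]) -> Optional[str]:
    """Return the proposed CoL status, or None where the list gives no rule."""
    has_parent = bool(accepted_plant_name_id and accepted_plant_name_id.strip())
    if taxon_status == "Accepted":
        return "accepted"
    if taxon_status in SYNONYM_OR_BARE:
        return "synonym" if has_parent else "bare name"
    if taxon_status == "Misapplied":
        return "misapplied name" if has_parent else None
    if taxon_status == "Orthographic":
        # The three records without a parent are left open in the list.
        return "synonym" if has_parent else None
    if taxon_status == "Unplaced":
        # All unplaced names are treated as bare names, with or without a parent.
        return "bare name"
    if taxon_status == "Local Biotype":
        return "synonym" if has_parent else None
    # Artificial Hybrid and any unknown value: no rule stated.
    return None


print(interpret_status("Synonym", ""))
print(interpret_status("Illegitimate", "12345"))
```

Returning None for the undecided cases keeps them visible for editorial review instead of silently assigning a status.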

yroskov commented 2 years ago

For attention of @mdoering:

(Unfortunately, we cannot go ahead with WCVP until this problem is fixed.)

yroskov commented 2 years ago

ASSEMBLY

2021-12-22 Sector Acorales - Acoraceae established to make the Workbench tool accessible (it is available only inside the Project).

mdoering commented 2 years ago

Interpretation of WCVP statuses into CoL statuses.

Total: 1,048,575 names in my Excel spreadsheet (the list appears to be incomplete, truncated at Excel's row limit)

Accepted (308,084) = accepted
Synonym (634,237; 901 of them have no parent accepted name) = synonym (except those 901 names = bare names)
Misapplied (947; all with parent accepted name) = synonyms
Orthographic (1,574; 3 of them have no parent accepted name) = synonym. It is not clear what to do with those 3 accepted "orthographic" names: Cassine congonha A.St.-Hil.; Aspidosperma clerceanum Iljin & Krasch.; Croton benzoe L.

Unplaced (47,165; 47,139 of them have no parent accepted name) = (!) all bare names
Illegitimate (32,277; 43 of them have no parent accepted name) = synonyms (except those 43 names = (!) bare names)
Invalid (22,792; 69 of them have no parent accepted name) = synonyms (except those 69 names = (!) bare names)
Local Biotype (100; all with parent accepted name) = synonyms
Artificial Hybrid (1,395; 318 of them have rank "genus", 1,072 "species", 2 "variety", 3 blank)

The coldp generator code for WCVP currently maps the following, keeping all others as they are:

Because this mapping is done in the ColDP generator, the original values do not show up in the verbatim data in CLB, which makes it impossible to search on them.

Maybe it is best to keep the data as verbatim as it was and do some of the mapping generically inside the importer? Might not be simple in case of nomenclatural statuses disguised as taxonomic ones.
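One way to picture this suggestion: keep the source value untouched and store the interpreted value next to it, so that both remain searchable. A minimal sketch; the NameRecord class, its field names and the tiny mapping table are hypothetical and do not reflect the actual ColDP generator or CLB importer:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Illustrative mapping only; the real generator mapping is not reproduced here.
EXAMPLE_MAPPING: Dict[str, str] = {"Accepted": "accepted", "Synonym": "synonym"}


@dataclass
class NameRecord:
    name: str
    verbatim_status: str          # value exactly as delivered by the source
    status: Optional[str] = None  # interpreted value, filled in at import time


def interpret(records: List[NameRecord], mapping: Dict[str, str]) -> None:
    """Fill the interpreted status without overwriting the verbatim one."""
    for rec in records:
        rec.status = mapping.get(rec.verbatim_status)


def search_verbatim(records: List[NameRecord], value: str) -> List[NameRecord]:
    """Find records by the status value used in the source."""
    return [rec for rec in records if rec.verbatim_status == value]


records = [
    NameRecord("Cassine congonha", "Orthographic"),
    NameRecord("Croton benzoe", "Orthographic"),
    NameRecord("Hypothetical accepted name", "Accepted"),
]
interpret(records, EXAMPLE_MAPPING)
for rec in search_verbatim(records, "Orthographic"):
    print(rec.name, rec.verbatim_status, rec.status)
```

With the verbatim value preserved, statuses that have no generic mapping yet stay findable for later editorial decisions.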

yroskov commented 2 years ago

WCVP of 2021-02-21, updated version, imported on PROD 2022-02-14.

image

In the source file:

Rank Status Family Genus My comment
Genus Artificial Hybrid Solanaceae Eriochroma No species in the source
Genus Artificial Hybrid Solanaceae Iozelia No species in the source
Genus Artificial Hybrid Orchidaceae Aberconwayara No species in the source
Genus Artificial Hybrid Orchidaceae Acampodorum No species in the source
Genus Artificial Hybrid Orchidaceae Acampostylis No species in the source
Genus Artificial Hybrid Orchidaceae Acapetalum No species in the source
Genus Artificial Hybrid Orchidaceae Acemannia No species in the source
Genus Artificial Hybrid Orchidaceae Aceraherminium No species in the source
mdoering commented 2 years ago

Local Biotype (100; all with parent accepted name) = synonyms Artificial Hybrid (1,395; 318 of them have rank "genus", 1,072 "species", 2 "variety", 3 blank)

I don't know exactly what those 2 status values mean. Maybe we can remove all of these records? That would be simple to do in the ColDP generator, or we could at least create only bare names for them. To handle them with editorial decisions instead, we would first need the verbatim status search feature implemented in both backend and frontend.

yroskov commented 2 years ago

Don't worry. Names at the root will not go into the CoL, because they fall outside the sectors (families, in the case of WCVP). Artificial hybrids make no sense for the CoL.

Names with "Local Biotype" status, if they are synonyms, are fine for the CoL. However, if they are not present in CLB, that is also fine with me.

yroskov commented 2 years ago

Sectors re-assembled, 2022-03-25:

Family Previous GSD
Order Acorales WCSP
Family Acoraceae WCSP
   
Order Alismatales WCSP
Family Alismataceae WCSP
Family Aponogetonaceae WCSP
Family Araceae WCSP
Family Butomaceae WCSP
Family Cymodoceaceae WCSP
Family Hydrocharitaceae WCSP
Family Juncaginaceae WCSP
Family Maundiaceae WCSP
Family Posidoniaceae WCSP
Family Potamogetonaceae WCSP
Family Ruppiaceae WCSP
Family Scheuchzeriaceae WCSP
Family Tofieldiaceae WCSP
Family Zosteraceae WCSP
   
Order Arecales WCSP
Family Arecaceae WCSP
Family Dasypogonaceae WCSP
   
Order Asparagales WCSP
Family Amaryllidaceae WCSP
Family Asparagaceae WCSP
Family Asphodelaceae WCSP
Family Asteliaceae WCSP
Family Blandfordiaceae WCSP
Family Boryaceae WCSP
Family Doryanthaceae WCSP
Family Hypoxidaceae WCSP
Family Iridaceae WCSP
Family Ixioliriaceae WCSP
Family Lanariaceae WCSP
Family Orchidaceae WCSP
Family Tecophilaeaceae WCSP
Family Xeronemataceae WCSP
   
Order Commelinales WCSP
Family Commelinaceae WCSP
Family Haemodoraceae WCSP
Family Hanguanaceae WCSP
Family Philydraceae WCSP
Family Pontederiaceae WCSP
yroskov commented 2 years ago

Sectors re-assembled, 2022-03-29:

Family Previous GSD
Order Dioscoreales WCSP
Family Burmanniaceae WCSP
Family Dioscoreaceae WCSP
Family Nartheciaceae WCSP
   
Order Liliales WCSP
Family Alstroemeriaceae WCSP
Family Campynemataceae WCSP
Family Colchicaceae WCSP
Family Corsiaceae WCSP
Family Liliaceae WCSP
Family Melanthiaceae WCSP
Family Petermanniaceae WCSP
Family Philesiaceae WCSP
Family Ripogonaceae WCSP
Family Smilacaceae WCSP
   
Order Pandanales WCSP
Family Cyclanthaceae WCSP
Family Pandanaceae WCSP
Family Stemonaceae WCSP
Family Triuridaceae WCSP
Family Velloziaceae WCSP
   
Order Petrosaviales WCSP
Family Petrosaviaceae WCSP
   
Order Poales WCSP
Family Bromeliaceae WCSP
Family Cyperaceae WCSP
Family Ecdeiocoleaceae WCSP
Family Eriocaulaceae WCSP
Family Flagellariaceae WCSP
Family Joinvilleaceae WCSP
Family Mayacaceae WCSP
Family Poaceae WCSP
Family Rapateaceae WCSP
Family Restionaceae WCSP
Family Thurniaceae WCSP
Family Typhaceae WCSP
Family Xyridaceae WCSP
   
Order Zingiberales WCSP
Family Cannaceae WCSP
Family Costaceae WCSP
Family Heliconiaceae WCSP
Family Lowiaceae WCSP
Family Marantaceae WCSP
Family Musaceae WCSP
Family Strelitziaceae WCSP
Family Zingiberaceae WCSP

Sectors of 2022-03-25&29 synced

yroskov commented 2 years ago

Sectors re-assembled, 2022-03-30:

Family Previous GSD
Order Apiales  
Family Araliaceae WCSP
Family Myodocarpaceae WCSP
   
Order Cardiopteridales, newly established  
Family Cardiopteridaceae WCSP
Family Stemonuraceae World Plants
   
Order Asterales  
Family Campanulaceae WCSP
   
Order Cornales  
Family Cornaceae WCSP
Family Nyssaceae WCSP
   
Order Ericales  
Family Clethraceae WCSP
Family Ebenaceae WCSP
Family Lecythidaceae WCSP
Family Sapotaceae WCSP
   
Order Fagales  
Family Betulaceae WCSP
Family Fagaceae WCSP
Family Nothofagaceae WCSP
Family Ticodendraceae WCSP
   
Order Garryales WCSP
Family Eucommiaceae WCSP
Family Garryaceae WCSP
   
Order Gentianales  
Family Rubiaceae WCSP
   
Order Huerteales  
Family Petenaeaceae WCSP
   
Order Lamiales  
Family Lamiaceae WCSP
Family Oleaceae WCSP
Family Schlegeliaceae WCSP
Family Verbenaceae WCSP
   
Order Magnoliales  
Family Magnoliaceae WCSP
   
Order Malpighiales  
Family Caryocaraceae WCSP
Family Centroplacaceae WCSP
Family Euphorbiaceae WCSP
Family Pandaceae WCSP
Family Peraceae WCSP
Family Phyllanthaceae WCSP
Family Picrodendraceae WCSP
Family Putranjivaceae WCSP
   
Order Malvales  
Family Sarcolaenaceae WCSP
Family Sphaerosepalaceae WCSP
   
Order Santalales  
Family Opiliaceae WCSP

Sectors of 2022-03-30 synced

yroskov commented 2 years ago

Checks in CoL-Preview 2022-04-01:

FIXED for all cases.

(Choosing the first or second item resulted in error code 500, message "There was an error processing your request. It has been logged (ID 1401e196cbc64eb9).")

image

In CLB: https://www.checklistbank.org/dataset/2232/names?facet=rank&facet=issue&facet=status&facet=nomStatus&facet=nameType&facet=field&facet=authorship&facet=authorshipYear&facet=extinct&facet=environment&facet=origin&limit=50&offset=0&q=acorus%20calamus&sortBy=taxonomic&status=accepted image

Strange appearance of child subspecies in the source data: https://www.checklistbank.org/catalogue/3/assembly?datasetKey=2232&sourceTaxonKey=2295-wcs image

yroskov commented 2 years ago

https://github.com/CatalogueOfLife/testing/issues/189#issuecomment-1089344847

Sync all WCVP sectors 2022-04-05

yroskov commented 2 years ago

Tests of 2022-04-07:

ISSUES assessed 2022-04-07 image

yroskov commented 2 years ago

TASKS 2022-04-07 NB! This covers the whole dataset.

image

2022-04-07 ACC-ACC diff. auth. Only 3 names from WCVP sectors resolved.

2022-04-08, not completed yet: image

2022-04-20 resolved as: image

SYN-SYN species & infraspecific names (same accepted name, same authorship) remain unresolved, because the comment portions (nom. nud., nom. illegit., etc.) were removed from the author strings.
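The effect can be illustrated with a small sketch. Assuming, hypothetically, that the nomenclatural notes are stripped by a pattern like the one below, two synonyms of the same accepted name that differed only in such a note end up with identical keys and can no longer be told apart. The names, the regular expression and the key layout are illustrative assumptions, not the actual CLB logic:

```python
import re

# Hypothetical pattern for trailing nomenclatural notes in an author string.
NOTE = re.compile(r",?\s*nom\.\s*(?:nud|illegit|inval|superfl)\.", re.IGNORECASE)


def strip_notes(authorship: str) -> str:
    """Remove notes such as 'nom. nud.' or 'nom. illegit.' from an author string."""
    return NOTE.sub("", authorship).strip()


def duplicate_key(name: str, authorship: str, accepted: str) -> tuple:
    """Key used to spot duplicates: name, cleaned authorship, accepted name."""
    return (name, strip_notes(authorship), accepted)


# Made-up placeholder names, not taken from WCVP.
first = duplicate_key("Aus bus", "Smith, nom. illegit.", "Aus cus")
second = duplicate_key("Aus bus", "Smith", "Aus cus")
print(first == second)
```

Once the note is gone, nothing in the key distinguishes the two records, which is why such pairs cannot be resolved automatically.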

yroskov commented 2 years ago

Synced 2022-04-20

yroskov commented 2 years ago

Tests in CoL of 2022-04-26 on PREVIEW

Some of those 901 names (from WCVP families used in CoL) appear in production. They show up in the search box, but give an error when chosen.

Example: Pristidia divaricata (Rubiaceae)

Search: image

Result: image

Aralia lyallii var. robusta (Araliaceae)

Search: image

Result: image

Aralia polaris (Araliaceae)

Search: image

Result: image

yroskov commented 1 year ago

WCVP of 2022-10-27

Family Sector status 2022-11-21 2022-11-28 (after Markus' re-match)
Order Acorales ok
Family Acoraceae +
   
Order Alismatales ok
Family Alismataceae +
Family Aponogetonaceae +
Family Araceae +
Family Butomaceae +
Family Cymodoceaceae +
Family Hydrocharitaceae +
Family Juncaginaceae +
Family Maundiaceae +
Family Posidoniaceae +
Family Potamogetonaceae +
Family Ruppiaceae +
Family Scheuchzeriaceae +
Family Tofieldiaceae +
Family Zosteraceae +
   
Order Arecales ok
Family Arecaceae +
Family Dasypogonaceae +
   
Order Asparagales fixed
Family Amaryllidaceae +
Family Asparagaceae +
Family Asphodelaceae +
Family Asteliaceae +
Family Blandfordiaceae +
Family Boryaceae +
Family Doryanthaceae +
Family Hypoxidaceae +
Family Iridaceae +
Family Ixioliriaceae +
Family Lanariaceae +
Family Orchidaceae +
Family Tecophilaeaceae +
Family Xeronemataceae image
   
Order Commelinales ok
Family Commelinaceae +
Family Haemodoraceae +
Family Hanguanaceae +
Family Philydraceae +
Family Pontederiaceae +
Order Dioscoreales ok
Family Burmanniaceae +
Family Dioscoreaceae +
Family Nartheciaceae +
   
Order Liliales ok
Family Alstroemeriaceae +
Family Campynemataceae +
Family Colchicaceae +
Family Corsiaceae +
Family Liliaceae +
Family Melanthiaceae +
Family Petermanniaceae +
Family Philesiaceae +
Family Ripogonaceae +
Family Smilacaceae +
   
Order Pandanales ok
Family Cyclanthaceae +
Family Pandanaceae +
Family Stemonaceae +
Family Triuridaceae +
Family Velloziaceae +
Order Petrosaviales ok
Family Petrosaviaceae +
Order Poales ok
Family Bromeliaceae +
Family Cyperaceae +
Family Ecdeiocoleaceae +
Family Eriocaulaceae +
Family Flagellariaceae +
Family Joinvilleaceae +
Family Mayacaceae +
Family Poaceae +
Family Rapateaceae +
Family Restionaceae +
Family Thurniaceae +
Family Typhaceae +
Family Xyridaceae +
Order Zingiberales ok
Family Cannaceae +
Family Costaceae +
Family Heliconiaceae +
Family Lowiaceae +
Family Marantaceae +
Family Musaceae +
Family Strelitziaceae +
Family Zingiberaceae +
Order Apiales ok
Family Araliaceae +
Family Myodocarpaceae +
Order Cardiopteridales ok
Family Cardiopteridaceae +
Family Stemonuraceae +
Order Asterales ok
Family Campanulaceae +
Order Cornales fixed
Family Cornaceae +
Family Nyssaceae missing
Order Ericales ok
Family Clethraceae +
Family Ebenaceae +
Family Lecythidaceae +
Family Sapotaceae +
Order Fagales ok
Family Betulaceae +
Family Fagaceae +
Family Nothofagaceae +
Family Ticodendraceae +
Order Garryales ok
Family Eucommiaceae +
Family Garryaceae +
Order Gentianales ok
Family Rubiaceae +
Order Huerteales ok
Family Petenaeaceae +
Order Lamiales ok
Family Lamiaceae +
Family Oleaceae +
Family Schlegeliaceae +
Family Verbenaceae +
Order Magnoliales ok
Family Magnoliaceae +
Order Malpighiales ok
Family Caryocaraceae +
Family Centroplacaceae +
Family Euphorbiaceae +
Family Pandaceae +
Family Peraceae +
Family Phyllanthaceae +
Family Picrodendraceae +
Family Putranjivaceae +
Order Malvales ok
Family Sarcolaenaceae +
Family Sphaerosepalaceae +
Order Santalales ok
Family Opiliaceae +

Synced with resolved sectors and un-resolved Issues & Tasks 2022-11-28

yroskov commented 1 year ago

WCVP of 2022-10-27 (continued)

TASKS

image

mdoering commented 1 year ago

I assume they have entirely different ids? Looking at the first WCVP decision about Pyrus communis f. briggsii:

"subject": {
        "id": "1010083-az",
        "name": "Pyrus communis f. briggsii",
        "authorship": "Syme",
        "rank": "form",
        "status": "synonym",
        "parent": "Pyrus",
        "broken": true,
        "label": "Pyrus communis f. briggsii Syme",
        "labelHtml": "<i>Pyrus communis</i> f. <i>briggsii</i> Syme"
    },

The parent=Pyrus property seems wrong. For this synonym I suppose it should have been one of the accepted names. @thomasstjerne, do you know how parent in a decision gets populated?

The ids have definitely changed. The above is either 2963149 or 3010083 now: https://www.checklistbank.org/catalogue/3/decision?limit=100&offset=0&subjectDatasetKey=2232

Rematching fails. I can't say with 100% certainty, but I suspect it is because the parent is wrongly set to the genus.
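A quick way to look for other decisions with the same symptom would be to flag subjects that are synonyms below genus rank whose parent is a single-word (genus-like) name. A minimal sketch, assuming the decisions are available as parsed JSON with a subject object shaped like the excerpt above; the helper name and the single-word heuristic are assumptions, not part of the CLB API:

```python
from typing import Any, Dict, Iterable, List


def suspicious_subjects(decisions: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return subjects that are multi-word synonyms with a one-word parent."""
    flagged = []
    for decision in decisions:
        subject = decision.get("subject") or {}
        name = subject.get("name") or ""
        parent = subject.get("parent") or ""
        is_synonym = subject.get("status") == "synonym"
        below_genus = len(name.split()) > 1
        parent_is_single_word = len(parent.split()) == 1
        if is_synonym and below_genus and parent_is_single_word:
            flagged.append(subject)
    return flagged


# The first entry mirrors the excerpt above; the second is a made-up control.
example = [
    {"subject": {"id": "1010083-az", "name": "Pyrus communis f. briggsii",
                 "rank": "form", "status": "synonym", "parent": "Pyrus"}},
    {"subject": {"id": "hypothetical-1", "name": "Aus bus",
                 "rank": "species", "status": "accepted", "parent": "Aus"}},
]
for subject in suspicious_subjects(example):
    print(subject["id"], subject["name"], "parent:", subject["parent"])
```

The heuristic is deliberately crude and would only narrow down candidates for manual checking.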

yroskov commented 1 year ago

WCVP of 2022-10-27 (continued)

ISSUES

image

yroskov commented 1 year ago

TASKS image

Genus Author Family Decision
Bessera   Amaryllidaceae prov acc
Bessera Schult.f. Asparagaceae  
Colobogynium   Costaceae prov acc
Colobogynium Schott Araceae  
Disporum   Asparagaceae prov acc
Disporum Salisb. Colchicaceae  
Ipomoea   Poaceae prov acc
Ipomoea L. Convolvulaceae  
Tricyrtis   Asparagaceae prov acc
Tricyrtis Wall. Liliaceae  
x Hueylihara Anon. Orchidaceae  
x Hueylihara Garay Orchidaceae prov acc
x Jesupara Glic. Orchidaceae prov acc: Orchid Rev. 126(1322, Suppl.): 43 (2018)
x Jesupara Glic. Orchidaceae  
x Nakamotoara Anon. Orchidaceae  
x Nakamotoara Garay Orchidaceae prov acc
x Rumrillara Garay Orchidaceae prov acc
x Rumrillara Anon. Orchidaceae  
x Scullyara Anon. Orchidaceae  
x Scullyara Garay & H.R.Sweet Orchidaceae prov acc
x Smithara Garay Orchidaceae prov acc
x Smithara Garay & H.R.Sweet Orchidaceae  
x Trichopsis Anon. Orchidaceae  
x Trichopsis Y.Itô Cactaceae prov acc

image

Synced 2022-12-15

yroskov commented 3 weeks ago

WCVP 10.0 / 2022-10-27 (the same as above)

TASKS resolved 2024-06-07:

image

yroskov commented 3 weeks ago

WCVP ver. 13.0 / 2024-05-16; imported 2024-06-08

Metrics

image

TASKS

image

Resolved 2024-06-11:

image

Synced 2024-06-11

mdoering commented 3 weeks ago

The EML metadata file has wrongly encoded characters. There is nothing I can do; I informed Kew about it and cc'ed you.

yroskov commented 3 weeks ago

I have resolved the Tasks. @mdoering, may I re-sync all WCVP sectors again now?

mdoering commented 3 weeks ago

sure!