CatalogueOfLife / testing

Editorial tests and discussion to prepare for COL releases

Towards 2023 Annual Checklist #217

Open yroskov opened 1 year ago

yroskov commented 1 year ago

Delivery date: June 2023

@olafbanki, Ed, @dhobern, you are very welcome to add milestones & targets for the 2023 Annual Checklist (i.e. "must-do" GSDs (both new & old), changes in the classification, etc.)

yroskov commented 1 year ago

TASKS 2023-05-01

image

yroskov commented 1 year ago

ACC-ACC sp, same auth, 16 duplicates

| scientificName | Authorship | Status | GSD | Results |
| --- | --- | --- | --- | --- |
| †Alekhosara reticulata (paleo) | Aristov, 2008 | acc sp | SF Orthoptera | |
| †Alekhosara reticulata (paleo) | Aristov, 2008 | acc sp | SF Grylloblattodea | |
| Clonia (Clonia) zernyi | Kaltenbach, 1971 | acc sp | SF Orthoptera | |
| Clonia (Clonia) zernyi | Kaltenbach, 1971 | acc sp | SF Orthoptera | blocked |
| †Dermomurex kilikiensis | Landau, Harzhauser, İslamoğlu & Silva, 2013 | acc sp | WoRMS Mollusca | |
| †Dermomurex kilikiensis | Landau, Harzhauser, İslamoğlu & Silva, 2013 | acc sp | WoRMS Mollusca | |
| intermedia | | acc sp | WoRMS Ostracoda | Beyrichioidea > Beyrichiidae > Beyrichia |
| intermedia | | acc sp | WoRMS Ostracoda | Platycopida > Kloedenellidae > Kloedenella |
| koslowi | | acc sp | WoRMS Turbellarians | Planariidae > Polycelis > Polycelis |
| koslowi | | acc sp | WoRMS Turbellarians | Planariidae > Polycelis > Polycelidia |
| Lipsothrix jiri | Podenas, 2020 | acc sp | Systema Dipterorum | |
| Lipsothrix jiri | Podenas, 2020 | acc sp | CCW | |
| †Necrocarcinus rathbunae | Roberts, 1962 | acc sp | WoRMS Brachyura | broken import |
| †Necrocarcinus rathbunae | Roberts, 1962 | acc sp | WoRMS Brachyura | broken import |
| Ovipoculum album | Zhu L. Yang & R. Kirschner | acc sp | Species Fungorum Plus - Oomycota | old SF+; unfixable |
| Ovipoculum album | Zhu L. Yang & R. Kirschner | acc sp | Species Fungorum Plus - Basidiomycota | current SF+; unfixable |
| †Palaeomesorthopteron pullus | Aristov, Grauvogel-Stamm & Marchal-Papier, 2011 | acc sp | SF Embioptera | |
| †Palaeomesorthopteron pullus | Aristov, Grauvogel-Stamm & Marchal-Papier, 2011 | acc sp | SF Grylloblattodea | |
| †Paranecrocarcinus libanoticus | Förster, 1968 | acc sp | WoRMS Brachyura | broken import |
| †Paranecrocarcinus libanoticus | Förster, 1968 | acc sp | WoRMS Brachyura | broken import |
| †Propontocypris dromas | Aiello, Barra & Bonaduce, 2000 | acc sp | WoRMS Ostracoda | |
| †Propontocypris dromas | Aiello, Barra & Bonaduce, 2000 | acc sp | WoRMS Ostracoda | |
| †Protonecrocarcinus ovalis | (Stenzel, 1945) | acc sp | WoRMS Brachyura | broken import |
| †Protonecrocarcinus ovalis | (Stenzel, 1945) | acc sp | WoRMS Brachyura | broken import |
| †Pseudonecrocarcinus gamma | (Roberts, 1962) | acc sp | WoRMS Brachyura | broken import |
| †Pseudonecrocarcinus gamma | (Roberts, 1962) | acc sp | WoRMS Brachyura | broken import |
| †Pseudonecrocarcinus quadriscissus | (Noetling, 1881) | acc sp | WoRMS Brachyura | broken import |
| †Pseudonecrocarcinus quadriscissus | (Noetling, 1881) | acc sp | WoRMS Brachyura | broken import |
| Shirakiacris yunkweiensis | (Chang, 1937) | acc sp | SF Orthoptera | |
| Shirakiacris yunkweiensis | (Chang, 1937) | acc sp | SF Orthoptera | blocked |
| Steinernema australe | Edgington, Buddie, Tymo, Hunt, Nguyen, France, Merino & Moore, 2009 | acc sp | WoRMS Nematoda | |
| Steinernema australe | Edgington, Buddie, Tymo, Hunt, Nguyen, France, Merino & Moore, 2009 | acc sp | WoRMS Nematoda | |

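
The kind of check behind this table (accepted species sharing the same name and authorship) can be sketched as below. This is only an illustration, not the ChecklistBank implementation: the record layout and the function name `find_acc_acc_duplicates` are hypothetical, and the sample rows are copied from the table above.

```python
from collections import defaultdict

# Hypothetical record layout: (scientificName, authorship, status, GSD).
# Sample rows are copied from the duplicates table in this comment.
RECORDS = [
    ("Lipsothrix jiri", "Podenas, 2020", "acc sp", "Systema Dipterorum"),
    ("Lipsothrix jiri", "Podenas, 2020", "acc sp", "CCW"),
    ("Ovipoculum album", "Zhu L. Yang & R. Kirschner", "acc sp",
     "Species Fungorum Plus - Oomycota"),
    ("Ovipoculum album", "Zhu L. Yang & R. Kirschner", "acc sp",
     "Species Fungorum Plus - Basidiomycota"),
    # Only one row is included here, so this name must not be reported.
    ("Shirakiacris yunkweiensis", "(Chang, 1937)", "acc sp", "SF Orthoptera"),
]


def find_acc_acc_duplicates(records):
    """Group accepted species by (name, authorship); keep groups of 2 or more."""
    groups = defaultdict(list)
    for name, authorship, status, gsd in records:
        if status == "acc sp":
            groups[(name, authorship)].append(gsd)
    return {key: gsds for key, gsds in groups.items() if len(gsds) >= 2}


duplicates = find_acc_acc_duplicates(RECORDS)
for (name, authorship), gsds in duplicates.items():
    # Listing the source GSDs shows which sectors clash.
    print(f"{name} {authorship}: {', '.join(gsds)}")
```

Keeping the GSD next to each hit is the point of the sketch: it shows whether a duplicate sits inside one source or across two.
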
yroskov commented 1 year ago

https://github.com/CatalogueOfLife/data/milestone/4

yroskov commented 1 year ago

https://www.checklistbank.org/catalogue/3/duplicates?catalogueKey=3&category=uninomial&limit=100&minSize=2&mode=STRICT&offset=0&rank=family&rank=class&rank=order&rank=phylum&rank=suborder&rank=infraorder&rank=superfamily&rank=subfamily&rank=suprageneric%20name&rank=superorder&rank=subclass&rank=superclass&rank=subphylum&rankDifferent=false&status=accepted&withDecision=false

There is no CLB tool to resolve these issues! = Now we have it! (Don't forget to sync the GSDs where decisions were made.)
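
For reference, the duplicates query above can be rebuilt from its parts, which makes the rank filter easier to read and change. This is a minimal sketch using only the parameters visible in that URL; the function name `duplicates_url` is hypothetical, and nothing here calls the ChecklistBank API.

```python
from urllib.parse import quote, urlencode

BASE = "https://www.checklistbank.org/catalogue/3/duplicates"

# Ranks exactly as they appear in the URL above, in the same order.
RANKS = [
    "family", "class", "order", "phylum", "suborder", "infraorder",
    "superfamily", "subfamily", "suprageneric name", "superorder",
    "subclass", "superclass", "subphylum",
]


def duplicates_url(ranks):
    """Assemble the duplicates query; 'rank' is repeated once per rank."""
    params = [
        ("catalogueKey", "3"),
        ("category", "uninomial"),
        ("limit", "100"),
        ("minSize", "2"),
        ("mode", "STRICT"),
        ("offset", "0"),
    ]
    params += [("rank", rank) for rank in ranks]
    params += [
        ("rankDifferent", "false"),
        ("status", "accepted"),
        ("withDecision", "false"),
    ]
    # quote_via=quote encodes spaces as %20 instead of '+'.
    return BASE + "?" + urlencode(params, quote_via=quote)


url = duplicates_url(RANKS)
print(url)
```
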

dhobern commented 1 year ago

@yroskov - do you have any suggestion what such a tool would look like? From a quick scan, I can see that a few relate to different treatment in two datasets, but others (like the Systema Dipterorum ones) seem to be the result of trying to normalise denormalised datasets which may not have used their columns in exactly the way we expect. See the Ceratopogoninae example. It would be valuable to review these and communicate with the dataset holders or with whoever wrote the script to scrape them.

yroskov commented 1 year ago

> do you have any suggestion what such a tool would look like?

Yes, I have. CoL@CLB needs two tools, both similar to the TASKS manager with minor improvements: (1) TASKS for a GSD vs. the whole CoL (i.e. all CoL sectors); (2) TASKS for the project (i.e. reports on duplicates for the whole CoL). TASKS reports on duplicates across GSDs should include the GSD name and the full classification (preferably with ranks). A mock-up of this tool is already available inside the project menu, but it is not functional: decisions cannot be applied, and the exported CSV file does not indicate the source GSD.

yroskov commented 1 year ago

> others (like the Systema Dipterorum ones) seem to be the result of trying to normalise denormalised datasets which may not have used their columns in exactly the way we expect. See the Ceratopogoninae example. It would be valuable to review these and communicate with the dataset holders or with whoever wrote the script to scrape

I can see only one way forward for Systema Dipterorum: it needs better software for data management and an international team of proactive editors who can take curatorial responsibility for the data. Plus, it would be nice to keep taxonomically unresolved data out of the taxonomic checklist released to the public.

@gdower, would you please introduce @dhobern to the Systema Dipterorum export file and our interactions with SD? (We have discussed our vision of the best opportunities for SD many times and have reached a consensus.)

And a general comment: attempts to fix "internal data integrity" problems on the CoL side cannot be very successful if we do regular updates. The problems need to be cleaned up on the GSD side.

yroskov commented 1 year ago

GSD updates in June:

DH, 2023-05-18 > I've updated Global Lepidoptera Index and Pterophoroidea this month

DH, 2023-05-18 > The Sesiidae dataset is now public and approved for use to replace the corresponding part of the Global Lepidoptera Index dataset

Re-synced: 2023-05-19 - Tortricid.net, ver. 4.0 of 2018-12-31

Adriano, 2023-05-24 > In May we made lots of important and numerous changes in our database. ...it would be best if you could produce another version soon enough.

yroskov commented 1 year ago

Release Alias Template changed to COL{date,yy} (see https://github.com/CatalogueOfLife/testing/issues/201#issuecomment-1209435378)
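
As an illustration of what this alias template produces, the sketch below expands `{date,yy}` into a two-digit year. The expansion rule is an assumption inferred from the resulting alias COL23 reported in this thread; it is not taken from the ChecklistBank code, and `expand_alias` is a hypothetical helper.

```python
import re
from datetime import date


def expand_alias(template, release_date):
    """Replace each {date,yy} placeholder with the two-digit year.

    Assumed semantics: 'yy' means the last two digits of the release year.
    """
    two_digit_year = release_date.strftime("%y")
    return re.sub(r"\{date,yy\}", two_digit_year, template)


alias = expand_alias("COL{date,yy}", date(2023, 5, 31))
print(alias)  # COL23
```
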

PREVIEW release started 2023-05-31, 3:17 pm (server time)
Finished as COL23, 2023-05-31, id 9895, 4:40 pm
Deployed to the preview website 2023-05-31.

CHECKS:

Release Version Template CHANGED to Annual Checklist 2023

image

yroskov commented 1 year ago

PREVIEW release started 2023-05-31, 5:51 pm (server time)
Finished as COL23, Catalogue of Life Checklist, Annual Checklist 2023, id 9896, 7:15 pm
Deployed to the preview website 2023-05-31.

yroskov commented 1 year ago

IRMNG = CoL uses version Mar 2018 / 2018-03-20, not version 2023-05-19 / 2023-05-19, which is now in CLB
PaleoBioDB = CoL uses version Feb 2018 / 2018-02-16, not version 2022-03-01 / 2022-03-01, which is now in CLB
Species Fungorum Plus = CoL uses version Jan 2023 / 2023-01-17, where taxa Bigyra, Cercozoa & Oomycota are not present (they were preserved from version Feb 2020 / 2020-02-14)

yroskov commented 1 year ago

@gdower, taking into account what @mdoering says about sector management in CLB here, how do the WoRMS pipelines deal with broken sectors?

Did we have so many broken sectors in the imports of previous months? (As I recall, no.)

2023-06-06, there are 2 broken sectors in WoRMS today: WoRMS Tantulocarida & WoRMS Brachypoda (https://github.com/CatalogueOfLife/testing/issues/227#issuecomment-1577372794)

= FIXED 2023-06-06 (WoRMS Tantulocarida synced 2023-06-06)

yroskov commented 1 year ago

A new problem (2023-06-05): incorrect behaviour of decision maker buttons in ACC-ACC species (https://github.com/CatalogueOfLife/checklistbank/issues/1243).

yroskov commented 1 year ago

PREVIEW release started 2023-06-07, 5:26 pm (server time)
Finished as COL23, Catalogue of Life Checklist, Annual Checklist 2023, id 9898
Deployed to the preview website 2023-06-08

CHECKS

Problem with sectors: image

Sectors report: image

yroskov commented 1 year ago

PREVIEW release started 2023-06-08, 7:11 pm (server time)
Finished as COL23, Catalogue of Life Checklist, Annual Checklist 2023, id 9899
Deployed to the preview website 2023-06-08

yroskov commented 1 year ago

PREVIEW release started 2023-06-12, 8:26 pm (server time)
Finished as COL23, Catalogue of Life Checklist, Annual Checklist 2023, id 9900, 9:51 pm

yroskov commented 1 year ago

See also https://github.com/CatalogueOfLife/data/issues/527

yroskov commented 1 year ago

PREVIEW release started 2023-06-12, 10:54 pm (server time)
Finished as COL23, Catalogue of Life Checklist, Annual Checklist 2023, id 9901, 2023-06-13 12:13 am
Deployed to the preview website 2023-06-13

yroskov commented 1 year ago

https://github.com/CatalogueOfLife/testing/issues/8#issuecomment-1587804117

yroskov commented 1 year ago

PREVIEW release started 2023-06-13, 8:16 pm (server time)
Finished as COL23, Catalogue of Life Checklist, Annual Checklist 2023, id 9902, 2023-06-13 9:36 pm
Deployed to the preview website 2023-06-13

yroskov commented 1 year ago

PREVIEW release started 2023-06-14, 7:36 pm (server time)
Finished as COL23, Catalogue of Life Checklist, Annual Checklist 2023, id 9903, 2023-06-14 8:56 pm
Deployed to the preview website 2023-06-14

yroskov commented 1 year ago

PREVIEW release started 2023-06-15, 2:47 pm (server time)
Finished as COL23, Catalogue of Life Checklist, Annual Checklist 2023, id 9904, 2023-06-15 4:07 pm
Deployed to the preview website 2023-06-15

yroskov commented 1 year ago

PREVIEW release started 2023-06-17, 11:03 pm (server time)
Finished as COL23, Catalogue of Life Checklist, Annual Checklist 2023, id 9905, 2023-06-18
Deployed to the preview website 2023-06-22

yroskov commented 1 year ago

@olafbanki, do you have a final decision on the replacement of Siphonaptera and on continuing with all SF checklists in AC23? Have you updated the AC23 metadata as you wish?

We have two days this week to finalize the remaining issues.

If all is fine from your point of view, the checklist with id 9905 of June 18th at https://www.checklistbank.org/dataset/9905 is ready as the 2023 Annual Checklist.

yroskov commented 1 year ago

@olafbanki, 2023-06-22:

On SF, it is clear that at least for now SF Cockroach cannot be used by COL because of a license that is not supported. It needs to be removed from the COL Checklist, but can live in ChecklistBank for now.

PREVIEW release started 2023-06-22, 5:20 pm (server time)
Finished as COL23, Catalogue of Life Checklist, Annual Checklist 2023, id 9905, 2023-06-22, 6:41 pm

yroskov commented 1 year ago

@olafbanki, 2023-06-22:

I am still waiting for Matt's guidance on SF Coreoidea and SF Mantodea. Phoronida, Emig, might request a license that is more restrictive than what seems to be common scientific practice. Discussion is still ongoing, but possibly it has to be removed. For next months we might consider a replacement from WoRMS.

It looks like we are heading toward publication of AC 2023 next Tuesday or Wednesday [2023-06-27/28].

yroskov commented 1 year ago

PREVIEW release started 2023-06-22, 7:38 pm (server time)
Finished as COL23, Catalogue of Life Checklist, Annual Checklist 2023, id 9907, 2023-06-22, 8:58 pm
Deployed to the preview website 2023-06-23

yroskov commented 1 year ago

PREVIEW release started 2023-06-23, 5:49 pm (server time)
Finished as COL23, Catalogue of Life Checklist, Annual Checklist 2023, id 9908, 2023-06-23

yroskov commented 1 year ago

PREVIEW release started 2023-06-26, 9:45 pm (server time) = 23.45 Leiden
Finished as COL23, Catalogue of Life Checklist, Annual Checklist 2023, id 9909, 2023-06-26, 11:04 pm
Deployed to the preview website 2023-06-27
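
The server-time to Leiden conversion noted here can be reproduced as below. The sketch assumes the server clock runs on UTC and that Leiden is on summer time (UTC+2) in June; both are assumptions consistent with the two-hour difference quoted in this comment, and `server_to_leiden` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

SERVER_TZ = timezone.utc                          # assumption: server clock is UTC
LEIDEN_SUMMER_TZ = timezone(timedelta(hours=2))   # assumption: CEST, fixed UTC+2


def server_to_leiden(stamp):
    """Convert a 'YYYY-MM-DD, H:MM pm' server timestamp to Leiden summer time."""
    server_time = datetime.strptime(stamp, "%Y-%m-%d, %I:%M %p")
    return server_time.replace(tzinfo=SERVER_TZ).astimezone(LEIDEN_SUMMER_TZ)


local = server_to_leiden("2023-06-26, 9:45 pm")
print(local.strftime("%Y-%m-%d %H.%M"))  # 2023-06-26 23.45
```

A fixed offset is used only to keep the sketch dependency-free; it would be wrong for releases made outside summer time.
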

yroskov commented 1 year ago

PREVIEW release started 2023-06-27, 8:38 pm (server time) = 22.38 Leiden

olafbanki commented 1 year ago

Licenses changed to CC-BY for the following GSDs:

Cilcat https://www.checklistbank.org/dataset/1113/about
Fada Rotifera https://www.checklistbank.org/dataset/1047/about
Globis https://www.checklistbank.org/dataset/1046/about
Mites GSD Phytoseiidae https://www.checklistbank.org/dataset/1070/about
Mites GSD Tenuipalpidae https://www.checklistbank.org/dataset/1078/about
PBI Plant Bug https://www.checklistbank.org/dataset/1171/about
Phoronida https://www.checklistbank.org/dataset/1104/about
Taxapad https://www.checklistbank.org/dataset/1068/about
Tineidae https://www.checklistbank.org/dataset/1031/about
Zoological-Botanical Database (Vespoidea) https://www.checklistbank.org/dataset/1037/about

Licenses changed in the UI of CLB, and metadata locked

olafbanki commented 1 year ago

CC-BY license added to the COL Checklist

olafbanki commented 1 year ago

Metadata of creators of the COL Checklist changed to also include: Diana Hernández, Camila Plata, Thomas Jeppesen, and Ari Örn; Dave Remsen removed.

COL Checklist 2023 dedicated to the memory of David Remsen

yroskov commented 1 year ago

PREVIEW release started 2023-06-27, 8:38 pm (server time) = 22.38 Leiden
Finished as COL23, Catalogue of Life Checklist, Annual Checklist 2023, id 9910, 2023-06-27, 9:58 pm
Deployed to the preview website 2023-06-27

@olafbanki, I have checked the results of this release at https://preview.catalogueoflife.org. The checklist content is fine and consistent with the last few drafts. Adjustments to the AC23 metadata are in place. AC23 is ready to be published on the portal.

olafbanki commented 1 year ago

@yroskov I have made an accompanying blog post and that also looks good in preview. I will now proceed with the publication of AC 2023.

olafbanki commented 1 year ago

The AC2023 is published.