Open yroskov opened 1 year ago
TASKS 2023-05-01
ACC-ACC sp, same auth, 16 duplicates
scientificName | Authorship | Status | GSD | Results |
---|---|---|---|---|
†Alekhosara reticulata (paleo) | Aristov, 2008 | acc sp | SF Orthoptera | |
†Alekhosara reticulata (paleo) | Aristov, 2008 | acc sp | SF Grylloblattodea | |
Clonia (Clonia) zernyi | Kaltenbach, 1971 | acc sp | SF Orthoptera | |
Clonia (Clonia) zernyi | Kaltenbach, 1971 | acc sp | SF Orthoptera | blocked |
†Dermomurex kilikiensis | Landau, Harzhauser, İslamoğlu & Silva, 2013 | acc sp | WoRMS Mollusca | |
†Dermomurex kilikiensis | Landau, Harzhauser, İslamoğlu & Silva, 2013 | acc sp | WoRMS Mollusca | |
intermedia | | acc sp | WoRMS Ostracoda Beyrichioidea > Beyrichiidae > Beyrichia | |
intermedia | | acc sp | WoRMS Ostracoda Platycopida > Kloedenellidae > Kloedenella | |
koslowi | | acc sp | WoRMS Turbellarians Planariidae > Polycelis > Polycelis | |
koslowi | | acc sp | WoRMS Turbellarians Planariidae > Polycelis > Polycelidia | |
Lipsothrix jiri | Podenas, 2020 | acc sp | Systema Dipterorum | |
Lipsothrix jiri | Podenas, 2020 | acc sp | CCW | |
†Necrocarcinus rathbunae | Roberts, 1962 | acc sp | WoRMS Brachyura | broken import |
†Necrocarcinus rathbunae | Roberts, 1962 | acc sp | WoRMS Brachyura | broken import |
Ovipoculum album | Zhu L. Yang & R. Kirschner | acc sp | Species Fungorum Plus - Oomycota | old SF+; unfixable |
Ovipoculum album | Zhu L. Yang & R. Kirschner | acc sp | Species Fungorum Plus - Basidiomycota | current SF+; unfixable |
†Palaeomesorthopteron pullus | Aristov, Grauvogel-Stamm & Marchal-Papier, 2011 | acc sp | SF Embioptera | |
†Palaeomesorthopteron pullus | Aristov, Grauvogel-Stamm & Marchal-Papier, 2011 | acc sp | SF Grylloblattodea | |
†Paranecrocarcinus libanoticus | Förster, 1968 | acc sp | WoRMS Brachyura | broken import |
†Paranecrocarcinus libanoticus | Förster, 1968 | acc sp | WoRMS Brachyura | broken import |
†Propontocypris dromas | Aiello, Barra & Bonaduce, 2000 | acc sp | WoRMS Ostracoda | |
†Propontocypris dromas | Aiello, Barra & Bonaduce, 2000 | acc sp | WoRMS Ostracoda | |
†Protonecrocarcinus ovalis | (Stenzel, 1945) | acc sp | WoRMS Brachyura | broken import |
†Protonecrocarcinus ovalis | (Stenzel, 1945) | acc sp | WoRMS Brachyura | broken import |
†Pseudonecrocarcinus gamma | (Roberts, 1962) | acc sp | WoRMS Brachyura | broken import |
†Pseudonecrocarcinus gamma | (Roberts, 1962) | acc sp | WoRMS Brachyura | broken import |
†Pseudonecrocarcinus quadriscissus | (Noetling, 1881) | acc sp | WoRMS Brachyura | broken import |
†Pseudonecrocarcinus quadriscissus | (Noetling, 1881) | acc sp | WoRMS Brachyura | broken import |
Shirakiacris yunkweiensis | (Chang, 1937) | acc sp | SF Orthoptera | |
Shirakiacris yunkweiensis | (Chang, 1937) | acc sp | SF Orthoptera | blocked |
Steinernema australe | Edgington, Buddie, Tymo, Hunt, Nguyen, France, Merino & Moore, 2009 | acc sp | WoRMS Nematoda | |
Steinernema australe | Edgington, Buddie, Tymo, Hunt, Nguyen, France, Merino & Moore, 2009 | acc sp | WoRMS Nematoda | |
There is no CLB tool to resolve these issues! = Now we have it! (Don't forget to sync the GSDs where decisions were made.)
Number of remaining unresolved duplicates dropped to 98, 2023-05-12 (mainly internal duplicates in Systema Dipterorum). CilCat Lagynidae (prov. acc.) vs WoRMS Foraminifera Lagynidae Schultze, 1854 = I am afraid to sync CilCat of ac19, because there is no guarantee that the data will not be corrupted.
Number of remaining unresolved duplicates increased to 152, 2023-06-06, after updates in WoRMS (the majority are internal duplicates in Systema Dipterorum, WoRMS Ostracoda (!), Bryozoa (6), Trematoda (1), Turbellarians (1)); also need to sync MilliBase, Holothuroidea = synced all WoRMS datasets listed here 2023-06-06.
Number of remaining unresolved duplicates dropped to 98, 2023-06-06, after internal duplicates in WoRMS were resolved.
Number of remaining unresolved duplicates dropped to 6 (3 pairs), 2023-06-12, after duplicates in Systema Dipterorum 4.2.2 were resolved.
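The notes above distinguish internal duplicates (both records in one GSD) from duplicates across GSDs. A minimal Python sketch of that split, using a few rows copied from the table; the function name and the labels "internal" and "cross-GSD" are made up for illustration and are not CLB terminology:

```python
from collections import defaultdict

# A few rows copied from the table above: (scientificName, authorship, GSD).
ROWS = [
    ("Clonia (Clonia) zernyi", "Kaltenbach, 1971", "SF Orthoptera"),
    ("Clonia (Clonia) zernyi", "Kaltenbach, 1971", "SF Orthoptera"),
    ("Lipsothrix jiri", "Podenas, 2020", "Systema Dipterorum"),
    ("Lipsothrix jiri", "Podenas, 2020", "CCW"),
    ("†Alekhosara reticulata (paleo)", "Aristov, 2008", "SF Orthoptera"),
    ("†Alekhosara reticulata (paleo)", "Aristov, 2008", "SF Grylloblattodea"),
]


def classify_duplicates(rows):
    """Group rows by (name, authorship) and label each duplicate group.

    'internal'  : every record comes from the same GSD
    'cross-GSD' : at least two different GSDs supply the same name
    """
    groups = defaultdict(list)
    for name, authorship, gsd in rows:
        groups[(name, authorship)].append(gsd)

    labels = {}
    for key, gsds in groups.items():
        if len(gsds) < 2:
            continue  # a single record is not a duplicate
        labels[key] = "internal" if len(set(gsds)) == 1 else "cross-GSD"
    return labels


for (name, authorship), kind in classify_duplicates(ROWS).items():
    print(f"{name} | {authorship} | {kind}")
```

Internal duplicates can only be cleaned up by the GSD itself, while cross-GSD ones need a decision on the CoL side, so keeping the two apart helps route each pair to the right place.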
@yroskov - do you have any suggestion what such a tool would look like? From a quick scan, I can see that a few relate to different treatment in two datasets, but others (like the Systema Dipterorum ones) seem to be the result of trying to normalise denormalised datasets which may not have used their columns in exactly the way we expect. See the Ceratopogoninae example. It would be valuable to review these and communicate with the dataset holders or with whoever wrote the script to scrape them.
> do you have any suggestion what such a tool would look like?
Yes, I have. CoL@CLB needs two tools (both are similar to the TASKS manager with minor improvements; TASKS reports on duplicates across GSDs should include the GSD name and the full classification, preferably with ranks): (1) TASKS for a GSD vs the whole CoL (i.e. all CoL sectors); (2) TASKS for the project (i.e. reports on duplicates for the whole CoL). A mock-up of this tool is already available inside the project menu, but it is not functional: decisions cannot be applied, and the exported CSV file does not indicate the source GSD.
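The reporting requirement described above (each duplicate row carries its source GSD and full classification with ranks) could look roughly like the Python sketch below. The record layout, column names and the rank labels are assumptions for illustration only; this is not the CLB export format.

```python
import csv
import io

# Hypothetical record layout. The names come from the table above;
# the rank labels are assumed, since the table lists only the names.
RECORDS = [
    {
        "name": "koslowi",
        "authorship": "",
        "status": "acc sp",
        "gsd": "WoRMS Turbellarians",
        "lineage": [("family", "Planariidae"), ("genus", "Polycelis"),
                    ("subgenus", "Polycelis")],
    },
    {
        "name": "koslowi",
        "authorship": "",
        "status": "acc sp",
        "gsd": "WoRMS Turbellarians",
        "lineage": [("family", "Planariidae"), ("genus", "Polycelis"),
                    ("subgenus", "Polycelidia")],
    },
]


def format_classification(lineage):
    """Render a lineage as 'rank: name > rank: name > ...'."""
    return " > ".join(f"{rank}: {name}" for rank, name in lineage)


def write_report(records, out):
    """Write one CSV row per record, including source GSD and classification."""
    writer = csv.writer(out)
    writer.writerow(["scientificName", "authorship", "status", "gsd", "classification"])
    for rec in records:
        writer.writerow([
            rec["name"],
            rec["authorship"],
            rec["status"],
            rec["gsd"],
            format_classification(rec["lineage"]),
        ])


buffer = io.StringIO()
write_report(RECORDS, buffer)
print(buffer.getvalue())
```

With the classification in its own column, the two koslowi rows are immediately distinguishable even though name, status and GSD are identical.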
> others (like the Systema Dipterorum ones) seem to be the result of trying to normalise denormalised datasets which may not have used their columns in exactly the way we expect. See the Ceratopogoninae example. It would be valuable to review these and communicate with the dataset holders or with whoever wrote the script to scrape
I can see only one way forward for Systema Dipterorum: it needs better software for data management and an international team of proactive editors who can take curatorial responsibility for the data. It would also be good to keep taxonomically unresolved data out of the taxonomic checklist released to the public.
@gdower, would you please introduce @dhobern to the Systema Dipterorum export file and our interactions with SD? (We have discussed our vision of the best opportunities for SD many times and have a consensus.)
And a general comment: attempts to fix problems with "internal data integrity" on the CoL side cannot be very successful if we do regular updates. The problems need to be cleaned up on the GSD side.
GSD updates in June:
[x] WoRMS of 2023-06-01; imported 2023-06-02 (see progress https://github.com/CatalogueOfLife/testing/issues/227)
[x] ITIS of 2023-05-25 imported 2023-06-01; synced 2023-06-05
[x] Scarabs of 2023-05-30; imported 2023-05-30; synced 2023-05-31
DH, 2023-05-18 > I've updated Global Lepidoptera Index and Pterophoroidea this month
[x] GLI 0.32.3 / 2023-05-10; imported 2023-05-10; synced 2023-05-19
[x] Pterophoroidea 1.1.23.136 (16 May 2023); imported 2023-05-16; synced 2023-05-18; re-synced 2023-05-19; re-synced 2023-05-31; re-synced 2023-06-02
[x] Gelechiidae 1.1.23.140 (20 May 2023); imported 2023-05-20; synced 2023-05-22; re-synced 2023-06-02
DH, 2023-05-18 > The Sesiidae dataset is now public and approved for use to replace the corresponding part of the Global Lepidoptera Index dataset
Re-synced: 2023-05-19 - Tortricid.net, ver. 4.0 of 2018-12-31
Adriano, 2023-05-24 > In May we made lots of important and numerous changes in our database. ...it would be best if you could produce another version soon enough.
[x] Opiliones of 2023-05-31; imported 2023-05-31; synced 2023-05-31
[x] Tortricid.net; imported 2023-05-24; synced 2023-05-31
[x] Systema Dipterorum ver. 4.2.2, May 2023; imported 2023-05-30; synced 2023-06-12 === No sync before AC23 release!!!: https://github.com/CatalogueOfLife/testing/issues/127#issuecomment-1593467055
[x] LWS fleas of 2023-06-22; imported 2023-06-22; synced 2023-06-22; re-synced 2023-06-23
Release Alias Template changed to COL{date,yy} (see https://github.com/CatalogueOfLife/testing/issues/201#issuecomment-1209435378)
PREVIEW release started 2023-05-31, 3:17 pm (server time) Finished as COL23, 2023-05-31, id 9895, 4:40 pm Deployed to the preview website 2023-05-31.
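The alias template COL{date,yy} noted above produces COL23 for the 2023 release. A minimal Python sketch of how such a placeholder might expand, assuming that `yy` means a two-digit year and `yyyy` a four-digit year; the CLB template engine itself is not shown in this thread, so `expand_alias` and the pattern table are purely illustrative:

```python
import re
from datetime import date

# Assumed mapping from template pattern letters to strftime codes.
# Only "yy" appears in the thread; "yyyy" is added for illustration.
PATTERNS = {"yy": "%y", "yyyy": "%Y"}


def expand_alias(template, when):
    """Replace every {date,PATTERN} placeholder with the formatted date."""
    def substitute(match):
        return when.strftime(PATTERNS[match.group(1)])

    return re.sub(r"\{date,(\w+)\}", substitute, template)


print(expand_alias("COL{date,yy}", date(2023, 5, 31)))
```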
CHECKS:
Release Version Template CHANGED to Annual Checklist 2023
PREVIEW release started 2023-05-31, 5:51 pm (server time) Finished as COL23, Catalogue of Life Checklist, Annual Checklist 2023, id 9896, 7:15 pm Deployed to the preview website 2023-05-31.
[ ] 28 sectors are broken 2023-06-01 https://github.com/CatalogueOfLife/data/issues/535#issuecomment-1572513294
[x] Alucitoidea superfamily: Alucitoidea order: Lepidoptera = REMATCHED 2023-06-01 (done via rematch button)
[x] Brassicaceae family: Brassicaceae order: Brassicales = REMATCHED 2023-06-01 (done via rematch button) = Shown as broken again 2023-06-02
[x] Bryonames order: Incertae sedis subkingdom: Bryobiotina = REMATCHED 2023-06-01 (done via rematch button) = Shown as broken again 2023-06-02
[x] Gelechiidae family: Gelechiidae Gelechioidea = REMATCHED 2023-06-01 (done via rematch button); 2023-06-05 (done via rematch button)
[ ] IRMNG class: Bolidophyceae phylum: Ochrophyta
[ ] IRMNG class: Chrysomerophyceae phylum: Ochrophyta
[ ] IRMNG class: Chrysophyceae phylum: Ochrophyta
[ ] IRMNG class: Dictyochophyceae phylum: Ochrophyta
[ ] IRMNG class: Eustigmatophyceae phylum: Ochrophyta
[ ] IRMNG class: Phaeophyceae phylum: Ochrophyta
[ ] IRMNG class: Phaeothamniophyceae phylum: Ochrophyta
[ ] IRMNG class: Picophagophyceae phylum: Ochrophyta
[ ] IRMNG class: Pinguiophyceae phylum: Ochrophyta
[ ] IRMNG class: Raphidophyceae phylum: Ochrophyta
[ ] IRMNG class: Schizocladiophyceae phylum: Ochrophyta
[ ] IRMNG class: Xanthophyceae phylum: Ochrophyta
[ ] IRMNG order: Againococcidiida phylum: Miozoa
[x] Nepticuloidea superfamily: Nepticuloidea order: Lepidoptera = REMATCHED 2023-06-01 (done via rematch button); 2023-06-05 (done via rematch button)
[ ] PaleoBioDB class: Trilobita phylum: Arthropoda
[ ] PaleoBioDB order: Ammonoidea class: Cephalopoda
[ ] PaleoBioDB order: Belemnitida class: Cephalopoda
[ ] Species Fungorum Plus phylum: Bigyra kingdom: Chromista
[ ] Species Fungorum Plus phylum: Cercozoa kingdom: Chromista
[ ] Species Fungorum Plus phylum: Oomycota kingdom: Chromista
[x] Trichomycetes class: Ichthyosporea phylum: Choanozoa = REMATCHED 2023-06-01 (done via rematch button) = Shown as broken again 2023-06-02
[x] Trichomycetes genus: Amoebosporus phylum: Choanozoa = REMATCHED 2023-06-01 (done via rematch button) = Shown as broken again 2023-06-02
[x] WoRMS brachyura section: Eubrachyura infraorder: Brachyura = FIXED as a single sector by somebody 2023-06-02
[x] WoRMS brachyura section: Podotremata infraorder: Brachyura = FIXED as a single sector by somebody 2023-06-02
IRMNG = CoL uses version Mar 2018 / 2018-03-20, not version 2023-05-19 / 2023-05-19 as it is in CLB now.
PaleoBioDB = CoL uses version Feb 2018 / 2018-02-16, not version 2022-03-01 / 2022-03-01 as it is in CLB now.
Species Fungorum Plus = CoL uses version Jan 2023 / 2023-01-17, where taxa Bigyra, Cercozoa & Oomycota are not present (they were preserved from version Feb 2020 / 2020-02-14).
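The version mismatch behind the IRMNG and PaleoBioDB sectors can be checked mechanically. A small Python sketch using only the dates quoted above; the tuple layout and the function name are illustrative, and Species Fungorum Plus is left out because its problem is missing taxa rather than an outdated version:

```python
from datetime import date

# (source, version used by CoL sectors, version currently in CLB),
# dates as quoted in the note above.
SECTOR_VERSIONS = [
    ("IRMNG", date(2018, 3, 20), date(2023, 5, 19)),
    ("PaleoBioDB", date(2018, 2, 16), date(2022, 3, 1)),
]


def stale_sources(versions):
    """Return (source, days behind) for sectors built on an older version."""
    return [
        (name, (latest - used).days)
        for name, used, latest in versions
        if used < latest
    ]


for name, lag in stale_sources(SECTOR_VERSIONS):
    print(f"{name}: sectors are {lag} days behind the version in CLB")
```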
@gdower, taking into account what @mdoering says about sector management in CLB here, how do the WoRMS pipelines deal with broken sectors?
Did we have so many broken sectors in the imports of previous months? (As far as I recall, no.)
2023-06-06, there are 2 broken sectors in WoRMS today: WoRMS Tantulocarida & WoRMS Brachypoda (https://github.com/CatalogueOfLife/testing/issues/227#issuecomment-1577372794)
= FIXED 2023-06-06 (WoRMS Tantulocarida synced 2023-06-06)
A new problem (2023-06-05): incorrect behaviour of decision maker buttons in ACC-ACC species (https://github.com/CatalogueOfLife/checklistbank/issues/1243).
PREVIEW release started 2023-06-07, 5:26 pm (server time) Finished as COL23, Catalogue of Life Checklist, Annual Checklist 2023, id 9898 Deployed to the preview website 2023-06-08
CHECKS
Problem with sectors:
Sectors report:
PREVIEW release started 2023-06-08, 7:11 pm (server time) Finished as COL23, Catalogue of Life Checklist, Annual Checklist 2023, id 9899 Deployed to the preview website 2023-06-08
[x] search for possible overlaps in the classification in whole CoL:
accepted species with identical authorstrings, 0: https://www.checklistbank.org/catalogue/3/duplicates?authorshipDifferent=false&catalogueKey=3&category=binomial&limit=500&rankDifferent=false&status=accepted
accepted species with different authorstrings, 281: https://www.checklistbank.org/catalogue/3/duplicates?authorshipDifferent=true&catalogueKey=3&category=binomial&limit=500&rankDifferent=false&status=accepted.
[x] There is a set of internal duplicates in WoRMS Foraminifera = FIXED https://github.com/CatalogueOfLife/testing/issues/61#issuecomment-1584715819, WoRMS Ostracoda = FIXED https://github.com/CatalogueOfLife/testing/issues/103#issuecomment-1584851125
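The two duplicate searches linked above differ only in the authorshipDifferent parameter. A short Python sketch that rebuilds those links from their query parameters; it reproduces the URLs quoted in this thread and makes no claim about other parameters the page may accept:

```python
from urllib.parse import urlencode

BASE = "https://www.checklistbank.org/catalogue/3/duplicates"


def duplicates_url(authorship_different):
    """Build the accepted-species duplicate search link for catalogue 3."""
    params = [
        ("authorshipDifferent", "true" if authorship_different else "false"),
        ("catalogueKey", 3),
        ("category", "binomial"),
        ("limit", 500),
        ("rankDifferent", "false"),
        ("status", "accepted"),
    ]
    return f"{BASE}?{urlencode(params)}"


print(duplicates_url(False))  # identical authorstrings
print(duplicates_url(True))   # different authorstrings
```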
PREVIEW release started 2023-06-12, 8:26 pm (server time) Finished as COL23, Catalogue of Life Checklist, Annual Checklist 2023, id 9900, 9:51 pm
PREVIEW release started 2023-06-12, 10:54 pm (server time) Finished as COL23, Catalogue of Life Checklist, Annual Checklist 2023, id 9901, 2023-06-13 12:13 am Deployed to the preview website 2023-06-13
https://github.com/CatalogueOfLife/testing/issues/8#issuecomment-1587804117
PREVIEW release started 2023-06-13, 8:16 pm (server time) Finished as COL23, Catalogue of Life Checklist, Annual Checklist 2023, id 9902, 2023-06-13 9:36 pm Deployed to the preview website 2023-06-13
PREVIEW release started 2023-06-14, 7:36 pm (server time) Finished as COL23, Catalogue of Life Checklist, Annual Checklist 2023, id 9903, 2023-06-14 8:56 pm Deployed to the preview website 2023-06-14
PREVIEW release started 2023-06-15, 2:47 pm (server time) Finished as COL23, Catalogue of Life Checklist, Annual Checklist 2023, id 9904, 2023-06-15 4:07 pm Deployed to the preview website 2023-06-15
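The start and finish times logged above make it easy to see how long a preview release takes. A minimal Python sketch over a few of the logged runs; it assumes both times are on the same server clock and that a finish time earlier than the start means the run crossed midnight:

```python
from datetime import datetime, timedelta

# (release id, start, finish) in server time, copied from the log above.
RUNS = [
    (9895, "3:17 pm", "4:40 pm"),
    (9896, "5:51 pm", "7:15 pm"),
    (9900, "8:26 pm", "9:51 pm"),
    (9901, "10:54 pm", "12:13 am"),
    (9904, "2:47 pm", "4:07 pm"),
]


def minutes_between(start, finish):
    """Duration in minutes between two 12-hour clock times."""
    fmt = "%I:%M %p"
    t0 = datetime.strptime(start, fmt)
    t1 = datetime.strptime(finish, fmt)
    if t1 < t0:
        t1 += timedelta(days=1)  # the run finished after midnight
    return int((t1 - t0).total_seconds() // 60)


for release_id, start, finish in RUNS:
    print(f"id {release_id}: {minutes_between(start, finish)} min")
```

For the runs listed here, each release took roughly 80 to 85 minutes.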
PREVIEW release started 2023-06-17, 11:03 pm (server time) Finished as COL23, Catalogue of Life Checklist, Annual Checklist 2023, id 9905, 2023-06-18 Deployed to the preview website 2023-06-22
@olafbanki, do you have a final decision on replacement of Siphonaptera and continuation with all SFs checklists in AC23? Have you updated AC23 metadata as you wish?
We have two days left this week to finalize the remaining issues.
If everything is fine from your point of view, the checklist with id 9905 of June 18th at https://www.checklistbank.org/dataset/9905 is ready as the 2023 Annual Checklist.
@olafbanki, 2023-06-22:
On SF, it is clear that at least for now SF Cockroach cannot be used by COL because of a license that is not supported. It needs to be removed from the COL Checklist, but can live in ChecklistBank for now.
PREVIEW release started 2023-06-22, 5:20 pm (server time) Finished as COL23, Catalogue of Life Checklist, Annual Checklist 2023, id 9905, 2023-06-22, 6:41 pm
@olafbanki, 2023-06-22:
I am still waiting for Matt's guidance on SF Coreoidea and SF Mantodea. Phoronida, Emig, might request a license that is more restrictive than what seems to be common scientific practice. Discussion is still ongoing, but possibly it has to be removed. In the coming months we might consider a replacement from WoRMS.
It looks like we are heading for publication of AC 2023 next Tuesday or Wednesday [2023-06-27 or 2023-06-28].
PREVIEW release started 2023-06-22, 7:38 pm (server time) Finished as COL23, Catalogue of Life Checklist, Annual Checklist 2023, id 9907, 2023-06-22, 8:58 pm Deployed to the preview website 2023-06-23
PREVIEW release started 2023-06-23, 5:49 pm (server time) Finished as COL23, Catalogue of Life Checklist, Annual Checklist 2023, id 9908, 2023-06-23
PREVIEW release started 2023-06-26, 9:45 pm (server time) = 23.45 Leiden Finished as COL23, Catalogue of Life Checklist, Annual Checklist 2023, id 9909, 2023-06-26, 11:04 pm Deployed to the preview website 2023-06-27
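The "= 23.45 Leiden" note implies the server clock is two hours behind Leiden in June, which would fit a server on UTC and Leiden on CEST (UTC+2). A minimal Python sketch under that assumption; the thread does not state the server's time zone, and a fixed +2 offset is used here instead of a full time zone database, so it only holds during summer time:

```python
from datetime import datetime, timedelta, timezone

# Assumption: server time is UTC; Leiden is on CEST (UTC+2) in June.
SERVER_TZ = timezone.utc
CEST = timezone(timedelta(hours=2), "CEST")


def server_to_leiden(year, month, day, hour, minute):
    """Convert a server timestamp to Leiden summer time."""
    server_time = datetime(year, month, day, hour, minute, tzinfo=SERVER_TZ)
    return server_time.astimezone(CEST)


# 9:45 pm server time on 2023-06-26, as logged above.
print(server_to_leiden(2023, 6, 26, 21, 45).strftime("%Y-%m-%d %H.%M"))
```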
PREVIEW release started 2023-06-27, 8:38 pm (server time) = 22.38 Leiden
Licenses changed to CC-BY from the following GSDs:
Cilcat https://www.checklistbank.org/dataset/1113/about
Fada Rotifera https://www.checklistbank.org/dataset/1047/about
Globis https://www.checklistbank.org/dataset/1046/about
Mites GSD Phytoseiidae https://www.checklistbank.org/dataset/1070/about
Mites GSD Tenuipalpidae https://www.checklistbank.org/dataset/1078/about
PBI Plant Bug https://www.checklistbank.org/dataset/1171/about
Phoronida https://www.checklistbank.org/dataset/1104/about
Taxapad https://www.checklistbank.org/dataset/1068/about
Tineidae https://www.checklistbank.org/dataset/1031/about
Zoological-Botanical Database (Vespoidea) https://www.checklistbank.org/dataset/1037/about
Licenses changed in the UI of CLB, and metadata locked
CC-BY license added to the COL Checklist
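The license discussion above amounts to a gate: a GSD stays in the COL Checklist only if its license is one COL supports. A rough Python sketch of that check follows. The set of supported licenses is an assumption (the thread only confirms CC-BY), and the SF Cockroach license is not named in the thread, so it is recorded as unknown here:

```python
# Assumption: licenses COL can accept. The thread only confirms CC-BY;
# the real list may be longer.
SUPPORTED_LICENSES = {"CC-BY"}

# License per GSD as far as this thread states it; None means the
# license is not named here and is treated as unsupported.
GSD_LICENSES = {
    "Cilcat": "CC-BY",
    "Globis": "CC-BY",
    "Phoronida": "CC-BY",
    "SF Cockroach": None,
}


def split_by_license(licenses, supported):
    """Return (usable, excluded) GSD names, each sorted alphabetically."""
    usable = sorted(name for name, lic in licenses.items() if lic in supported)
    excluded = sorted(name for name, lic in licenses.items() if lic not in supported)
    return usable, excluded


usable, excluded = split_by_license(GSD_LICENSES, SUPPORTED_LICENSES)
print("usable in COL:", usable)
print("ChecklistBank only:", excluded)
```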
Metadata of creators of the COL Checklist changed to also include: Diana Hernández, Camila Plata, Thomas Jeppesen, and Ari Örn; Dave Remsen removed.
COL Checklist 2023 dedicated to the memory of David Remsen
PREVIEW release started 2023-06-27, 8:38 pm (server time) = 22.38 Leiden Finished as COL23, Catalogue of Life Checklist, Annual Checklist 2023, id 9910, 2023-06-27, 9:58 pm Deployed to the preview website 2023-06-27
@olafbanki, I have checked the results of this release at https://preview.catalogueoflife.org. The checklist content is fine and consistent with the last few drafts. Adjustments to AC23 metadata are in place. AC23 is ready to be published on the portal.
@yroskov I have made an accompanying blog post and that also looks good in preview. I will now proceed with the publication of AC 2023.
The AC2023 is published.
Delivery date: June 2023
@olafbanki, Ed, @dhobern, you are very welcome to add milestones & targets for the 2023 Annual Checklist (i.e. "must-do" GSDs (both new & old), changes in the classification, etc.)