Open yroskov opened 1 month ago
[x] DUPLICATED TAXA of higher ranks
[x] WoRMS of 2024-06-01; imported 2024-06-02; 2024-06-03 - Editors in WoRMS Nemys & MolluscaBase = OK
[ ] ITIS = not for AC24
[x] Scarabs received 2024-06-03; imported 2024-06-03; synced 2024-06-03
FROM TaxonWorks:
monthly started from January 2024:
monthly started from March 2024:
~~single update in a year (January):~~
~~SF Coleorrhyncha SF Embioptera SF Grylloblattodea SF Mantophasmatodea SF Zoraptera~~
LEPIDOPTERA:
OTHER:
[ ] UCD - ask for logo
[x] Systema Dipterorum ver. 5.2 of 2024-05-15; received 2024-05-16; imported 2024-06-01; synced 2024-06-03
[x] World Ferns ver. 19.3 of 2024-05-18; imported 2024-05-21; synced 2024-05-21
[x] World Plants ver. 19.3 of 2024-05-18 - changes in the classification = OK; imported 2024-05-22; synced 2024-05-22
[x] Eumycetozoa.com received 2024-05-22; imported 2024-06-03; synced 2024-06-04
[x] RWC: Rotifer World Catalogue ver. 1.0 / 2024-06-05 https://www.checklistbank.org/dataset/298081/about; synced 2024-06-06, re-synced 2024-06-11
[x] WCVP ver 13.0 / 2024-05-16, imported 2024-06-08; synced 2024-06-11
[ ] WCVP-Fabaceae 2024v.5 / 2024-05-16, imported 2024-06-05
=============================
SF Permopsocida is not yet in CoL; I have submitted its adoption to the CoL Taxon Group. (see email of 2023-11-10, Heidi)
ScaleNet
WCVP ver 11.0 / 2023-04-20
LPSN see https://github.com/CatalogueOfLife/data/issues/202#issuecomment-1970787286 .
@mdoering via Slack 2024-05-06: heard back from LPSN - they have good progress, but the API with classification will be public and usable by us at the end of May - so we might be able to get LPSN into the June release
YR: LPSN - good! But we have no June release. It will be AC24 in June. It's risky to publish AC24 with a new dataset, especially without the Taxonomy Group's assessment of LPSN and approval of the ITIS replacement. Let's do it in July-August.
Rotifers
FishBase - ?
Paris/DBTNT - ?
LDL Neuropterida
=============================
Filling gaps:
Suborder Symphyta (Hymenoptera) https://github.com/CatalogueOfLife/data/issues/579
Class/order Diplura https://github.com/CatalogueOfLife/data/issues/577
Order †Permopsocida (Insecta) https://github.com/CatalogueOfLife/data/issues/578
Family Promecheilidae (Tenebrionoidea, Coleoptera) https://github.com/CatalogueOfLife/data/issues/580
Duplicated taxa
TASKS, 2024-05-23:
[x] Identical family 19: 11 of 19 cases WoRMS Mollusca vs PaleoBioDB (check for genera first)
WoRMS Mollusca: superorder †Ammonoida Haeckel, 1866 with 3 spp = BLOCKED 2024-05-23, re-synced
PaleoBioDB: order †Ammonoidea with 9488 spp
3 cases remain unresolved on 2024-05-31. 1 case remains on 2024-06-04 (Systema Dipterorum re-synced 2024-06-04).
[x] Identical tribe 2 = RESOLVED Re-synced StaphBase, Collembola.org
[x] Identical species ACC-ACC species (same authors) 251
3 duplicated paleospecies across Species Files = DELETED in Grylloblattodea:
Alekhosara reticulata Aristov, 2008 in Grylloblattodea and in Orthoptera
Palaeomesorthopteron pullus Aristov, Grauvogel-Stamm & Marchal-Papier, 2011 in Grylloblattodea and in Embioptera
Thaumatophora pronotalis Riek, 1976 in Grylloblattodea and in Plecoptera
3 duplicated species in WFO Plant List 2023-12 Larix czekanowskii, Picea fennica & Pinus hakkodensis = DELETED in the project Tree (Assembly)
2024-06-05:
@aoern, we are preparing the 2024 Annual Checklist in June. Would you be able to run your checks BEFORE the final release? I am going to close the first draft by 10-11th June. If I send you a link to the interim version in COLDP, would you be so kind as to run the checks?
Yes, of course! Ari
PREVIEW release started 2024-05-31, 12:54 pm (server time) Finished as Annual Checklist 2024, id 297740, 2024-05-31, 2:16 pm Deployed to the preview website 2024-05-31
CHECKS
PREVIEW release started 2024-06-04, 4:21 pm (server time) Finished as Annual Checklist 2024, id 298077, 2024-06-04, 5:46 pm Deployed to the preview website 2024-06-04
See duplicated uninomial taxa (Systema Dipterorum & GLI need to be re-synced)
PREVIEW release started 2024-06-05, 10:11 pm (server time) Finished as Annual Checklist 2024, id 298097, 2024-06-05 Deployed to the preview website 2024-06-06
PREVIEW release started 2024-06-06, 3:44 pm (server time) Finished as Annual Checklist 2024, id 298177, 2024-06-06 Deployed to the preview website 2024-06-06
PREVIEW release started 2024-06-06, 6:45 pm (server time) Finished as Annual Checklist 2024, id 298184, 2024-06-06, 8:18 pm Deployed to the preview website 2024-06-07
with RWC
See https://github.com/CatalogueOfLife/data/issues/668 https://github.com/CatalogueOfLife/data/issues/669#event-13080073794 https://github.com/CatalogueOfLife/data/issues/667#issuecomment-2155102700
2024-06-07, require re-sync:
Systema Dipterorum = re-synced 2024-06-07
ReptileDB (re-check TASKS, sectors, classification) = re-synced 2024-06-07
WCVP = re-synced 2024-06-07. ATTENTION: merge sector detected:
CoL TASKS of 2024-06-07:
https://www.checklistbank.org/catalogue/3/tasks
@mdoering, that is very much wrong. (You'll understand the problem when you open ACC-ACC species (different authors) 36349 or ACC-ACC species (same authors) 47911)
So far, 298184 is the best candidate for AC24.
any idea where that is coming from? SD syncs?
@mdoering, it happened after I applied some "polishing" in the AC24 draft and completed re-syncs of 3 GSDs:
Systema Dipterorum = re-synced 2024-06-07
ReptileDB (re-check TASKS, sectors, classification) = re-synced 2024-06-07
WCVP = re-synced 2024-06-07
As far as I can see, the main problem is caused by WCVP. I re-synced all global sectors one by one, because I saw "merge sector" in the list of sectors. The results look awful: WCVP species duplicate World Plants species, but are placed in the wrong families. I believe it happened due to changed IDs (but I re-synced the same version of WCVP that was used before, according to the metadata, and all sectors were shown as healthy).
Species statistics comparison (Preview 2024-06-06 (id 298184) = before WCVP sync, i.e. as it should be):
Taxon | Project3 | Preview 2024-06-06 (id 298184) |
---|---|---|
Tracheophyta | 415458 | 359898 |
Liliopsida | 88508 | 81334 |
Magnoliopsida | 311532 | 263146 |
Ginkgoopsida | 333 | 333 |
Pinopsida | 833 | 833 |
Polypodiopsida | 12392 | 12392 |
I have an idea of what goes wrong with WCVP. They do not have stable family identifiers and the same identifier can point to a different family next time. If we rematch all WCVP sectors and resync afterwards it should be fine again
How to do rematch for WCVP?
https://api.checklistbank.org/dataset/3/sector/1698
"subjectDatasetKey":2232, "subject":{ "id":"xS", "name":"Rubiaceae", "rank":"family", "status":"accepted", "broken":false, "label":"Rubiaceae", "labelHtml":"Rubiaceae", }
but xS points to Campanulaceae.
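This kind of mismatch can be checked mechanically. A minimal sketch, assuming the sector record has the shape quoted above and that the caller supplies some way to look up the name an identifier currently points to in the source dataset; `find_stale_subject` and the `resolve_name` callback are hypothetical helpers, not part of the ChecklistBank API:

```python
from typing import Callable, Optional


def find_stale_subject(sector: dict, resolve_name: Callable[[str], Optional[str]]) -> Optional[str]:
    """Return a message if the sector's subject ID no longer resolves to the subject name.

    `resolve_name` is a hypothetical callback that maps an identifier in the
    source dataset to the name it currently points to (None if unknown).
    """
    subject = sector["subject"]
    current = resolve_name(subject["id"])
    if current == subject["name"]:
        return None
    return (
        f"sector subject {subject['id']!r} is stored as {subject['name']} "
        f"but now resolves to {current}"
    )


# Sector 1698 as quoted above (abridged).
sector_1698 = {
    "subjectDatasetKey": 2232,
    "subject": {"id": "xS", "name": "Rubiaceae", "rank": "family", "status": "accepted"},
}

# Stand-in for a lookup in the current WCVP import; only the xS entry comes from this thread.
current_names = {"xS": "Campanulaceae"}

print(find_stale_subject(sector_1698, current_names.get))
```

Running such a check over all sectors of a source before a re-sync would list the subjects whose identifiers have drifted.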
rematching via UI does not help, let me see how we can best address this one...
For attention of @olafbanki & @mdoering:
In case Markus runs into problems cleaning Project3 of the unwanted WCVP records that were synced on 2024-06-07, the candidate of 2024-06-06 for the 2024 Annual Checklist is ready in CLB with id 298184.
Its preview is here https://preview.catalogueoflife.org/data/metadata
(If the fixes are successful, the RWC import of 2024-06-10 should be re-synced = RWC SYNCED 2024-06-11).
@yroskov I rematched all 111 WCVP sectors and mostly they updated fine:
{"broken":13,"updated":96,"unchanged":2,"total":111}
But there are 13 broken sectors now which I will try to solve manually through the UI.
I simply matched again and resolved all sectors now:
{"broken":1,"updated":12,"unchanged":98,"total":111}
The reason not all rematched fine in the beginning was that we cannot have 2 sectors with the same subject - and because IDs were wrong some ids were already (falsely) taken and the rematch failed. Doing it a second time removed that problem.
The single sector still reported as broken is the merge sector without any subject - I will see that we change the reporting to exclude those
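Why a second rematch pass succeeds where the first one failed can be illustrated with a toy model. This is only a sketch of the explanation above, not the actual ChecklistBank logic: it assumes sectors are rematched one after another by subject name, and that a sector is reported as broken while the identifier it should receive is still held by another sector. Apart from `xS`, all identifiers are made up.

```python
def rematch_pass(sectors, name_to_id):
    """Run one sequential rematch pass over `sectors` (toy model, mutates in place).

    Each sector is a dict with 'name' (subject name) and 'subject_id'.
    A sector counts as broken when the ID it should get is still held by
    another sector, since two sectors may not share the same subject.
    """
    counts = {"broken": 0, "updated": 0, "unchanged": 0}
    for sector in sectors:
        target = name_to_id[sector["name"]]
        if sector["subject_id"] == target:
            counts["unchanged"] += 1
        elif any(other is not sector and other["subject_id"] == target for other in sectors):
            counts["broken"] += 1
        else:
            sector["subject_id"] = target
            counts["updated"] += 1
    counts["total"] = len(sectors)
    return counts


# Current identifiers in the source; "r1" and "old" are hypothetical.
name_to_id = {"Rubiaceae": "r1", "Campanulaceae": "xS"}
sectors = [
    {"name": "Campanulaceae", "subject_id": "old"},  # wants xS, still held by Rubiaceae
    {"name": "Rubiaceae", "subject_id": "xS"},       # stale ID, wants r1 (free)
]

first = rematch_pass(sectors, name_to_id)
second = rematch_pass(sectors, name_to_id)
print(first)   # {'broken': 1, 'updated': 1, 'unchanged': 0, 'total': 2}
print(second)  # {'broken': 0, 'updated': 1, 'unchanged': 1, 'total': 2}
```

In the first pass the falsely taken `xS` blocks the Campanulaceae sector; once the Rubiaceae sector has moved to its correct identifier, the second pass resolves it. Both summaries reported above are internally consistent: 13 + 96 + 2 = 111 and 1 + 12 + 98 = 111.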
@yroskov I will trigger syncs now for all of WCVP to replace the bad data
Thanks! Go ahead
WCVP syncs completed. Tasks in Project 3:
@mdoering, TASKS in Project 3 look good after the cleaning. But the species count in Tracheophyta shows 2,818 extra species compared with the expected number in Preview 2024-06-06 (id 298184):
Taxon | Preview 2024-06-06 (id 298184) | Project3 before cleaning | Project3 after cleaning | Difference |
---|---|---|---|---|
Tracheophyta | 359898 | 415458 | 362716 | +2,818 |
Liliopsida | 81334 | 88508 | 83494 | +2,160 |
Magnoliopsida | 263146 | 311532 | 263804 | +658 |
Ginkgoopsida | 333 | 333 | 333 | = |
Pinopsida | 833 | 833 | 833 | = |
Polypodiopsida | 12392 | 12392 | 12392 | = |
On my end, I cannot find and check these unexpected extra species.
well, it is a newer version of WCVP. The 298184 preview used data synced from WCVP attempt 17-18, version 10.0 / 2022-10-27
The current is version 13.0 / 2024-05-16
The entire current WCVP has 365.790 species, the older version 10 (attempt 18) had 357.450 species.
Thanks! That explains the difference.
WCVP Tracheophyta species counts are
So WCVP species in COL have increased by 2.818 just as you have observed. In the entire WCVP dataset the increase was 8.340 species, but we only use parts of it.
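The figures quoted in this thread can be cross-checked with a few lines of arithmetic. The numbers are copied from the comparison table and the WCVP totals above; the variable names are only illustrative.

```python
# Species counts quoted in this thread.
preview_298184 = {
    "Tracheophyta": 359898,
    "Liliopsida": 81334,
    "Magnoliopsida": 263146,
    "Ginkgoopsida": 333,
    "Pinopsida": 833,
    "Polypodiopsida": 12392,
}
project3_after_cleaning = {
    "Tracheophyta": 362716,
    "Liliopsida": 83494,
    "Magnoliopsida": 263804,
    "Ginkgoopsida": 333,
    "Pinopsida": 833,
    "Polypodiopsida": 12392,
}

# Difference per taxon between the cleaned project and the preview.
diff = {taxon: project3_after_cleaning[taxon] - count for taxon, count in preview_298184.items()}

# The class-level differences should add up to the Tracheophyta difference.
class_sum = sum(value for taxon, value in diff.items() if taxon != "Tracheophyta")

# Increase across the entire WCVP dataset between version 10.0 and version 13.0.
wcvp_total_increase = 365790 - 357450

print(diff["Tracheophyta"], class_sum, wcvp_total_increase)  # 2818 2818 8340
```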
btw, the WCVP Fabaceae people have also updated their annual version some weeks ago. Do you foresee a problem to sync that one also?
new 2024 version available:
2023 version we use:
WCVP-Fabaceae: ...Do you foresee a problem to sync that one also?
Unfortunately, yes. Two issues: Broken decisions: 4308 and nested WWW genera need to be re-done.
Let's see what we can do in the week of 17th June.
update decisions for ambiguous synonyms are a lot of work. Maybe we should think about flagging them automatically if another synonym with the same name exists in a dataset. Then we could remove the entire status, have just synonyms and make our work a little bit simpler.
...update decisions for ambiguous synonyms
This depends on the data "quality" in the checklist imported into CLB: some GSDs have very simple cases which can easily be resolved automatically; others need investigation of what happened (a chresonym is only one case). It would be nice if we managed to classify such cases and work out protocols.
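The automatic flagging proposed above could look roughly like this. A minimal sketch, assuming name usages are available as simple records; the keys `id`, `name` and `status` and the sample names are made up for illustration, and the harder cases mentioned in the reply would still need manual review.

```python
from collections import defaultdict


def flag_ambiguous_synonyms(usages):
    """Return the IDs of synonyms whose name is shared with another synonym.

    `usages` is an iterable of dicts with the hypothetical keys
    'id', 'name' and 'status'.
    """
    ids_by_name = defaultdict(list)
    for usage in usages:
        if usage["status"] == "synonym":
            ids_by_name[usage["name"]].append(usage["id"])
    return {uid for ids in ids_by_name.values() if len(ids) > 1 for uid in ids}


# Made-up records for illustration only.
usages = [
    {"id": "1", "name": "Aus bus L.", "status": "accepted"},
    {"id": "2", "name": "Aus cus Smith", "status": "synonym"},
    {"id": "3", "name": "Aus cus Smith", "status": "synonym"},
    {"id": "4", "name": "Aus dus Jones", "status": "synonym"},
]

print(sorted(flag_ambiguous_synonyms(usages)))  # ['2', '3']
```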
PREVIEW release started 2024-06-17, 4:43 pm (server time) (First PREVIEW after fixes in WCVP sectors & re-sync) Finished as Annual Checklist 2024, id 298597, 2024-06-17, 6:19 pm Deployed to the preview website 2024-06-17
Dear @aoern,
I believe we have now completed our first "proof" of the 2024 Annual Checklist (https://preview.catalogueoflife.org/).
Here is its CoLDP export https://download.checklistbank.org/job/f0/f0fec354-bc73-4352-8f22-eaef1d246b4f.zip [700.6 MB]
Could you please do your routine checks? Do not dig too deep. We'll have a few days to fix only a real disaster (if there is one).
I am able to download it tomorrow and start checking. Ari
= FIXED 2024-06-21 (MD: To fix the problem I have rematched the entire project and there is no missing match any longer. Doing a new release now)
New release of 2024-06-21 completed by Markus id 298708 Deployed to the preview website 2024-06-21
CHECKS of 2024-06-21:
[x] GSD stats = OK
[x] Incorrect GSD version in Source Datasets page (says "Annual Checklist 2024" for all GSDs instead of individual versions of GSDs) - for attention of @mdoering = FIXED 2024-06-24
https://preview.catalogueoflife.org/data/source-datasets
https://github.com/CatalogueOfLife/backend/issues/1333 also https://github.com/CatalogueOfLife/portal/issues/213
PREVIEW release started by Markus 2024-06-24, 1:46 pm (server time) (After https://github.com/CatalogueOfLife/backend/issues/1333#issuecomment-2186627873) Finished as Annual Checklist 2024, id 298863, 2024-06-24, 3:12 pm Deployed to the preview website 2024-06-24
CHECKS of 2024-06-24:
[x] Incorrect GSD version in Source Datasets page = FIXED
[x] GSD version is still missing in the recommended bibliographic citation = @mdoering, the GSD version is important in the bibliographic citation. An old version of a GSD might be re-published in the CoL from one month/year to the next, and we need to indicate which version is used in each edition of the CoL. Agreed solution: the version should become part of the GSD title in the citation, in brackets after the title, e.g. UCD Community. (2024). Universal Chalcidoidea Database curated in TaxonWorks (version May 2024). In O. Bánki...
[x] Taxonomic Coverage (taxon & classification string) is now missing in a set of GSD metadata (N/A value appears) = FIXED (2024-06-25)
for example:
https://preview.catalogueoflife.org/data/dataset/1061
https://preview.catalogueoflife.org/data/dataset/1169
https://preview.catalogueoflife.org/data/dataset/1089
https://preview.catalogueoflife.org/data/dataset/1065
https://preview.catalogueoflife.org/data/dataset/1204
https://preview.catalogueoflife.org/data/dataset/1101
etc.
[x] Sync of SF Plecoptera was cancelled (!) on 2024-06-05. Re-synced 2024-06-24. Requires re-release.
[x] @gdower, UCD metadata are empty in CLB, could you please fix this = FIXED 2024-06-25 https://preview.catalogueoflife.org/data/dataset/124661 https://www.checklistbank.org/dataset/124661/about
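The citation format agreed in the checks above (GSD version in brackets after the title) can be sketched as a small helper. `title_with_version` is a hypothetical name; how the CoL backend actually builds citations is not shown in this thread.

```python
from typing import Optional


def title_with_version(title: str, version: Optional[str]) -> str:
    """Append the GSD version in brackets after the title.

    The title is returned unchanged when no version is known.
    """
    if not version:
        return title
    return f"{title} (version {version})"


print(title_with_version("Universal Chalcidoidea Database curated in TaxonWorks", "May 2024"))
# Universal Chalcidoidea Database curated in TaxonWorks (version May 2024)
```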
PREVIEW release started 2024-06-25, 3:45 pm (server time) Finished as Annual Checklist 2024, id 298890, 2024-06-25, 5:11 pm Deployed to the preview website 2024-06-25
PREVIEW release started 2024-06-25, 6:30 pm (server time) (check list of GSDs) Finished as Annual Checklist 2024, id 298894, 2024-06-25, 8:09 pm Deployed to the preview website 2024-06-25
PREVIEW release started 2024-06-25, 8:41 pm (server time) Finished as Annual Checklist 2024, id 298904, 2024-06-25 Deployed to the preview website 2024-06-26
@yroskov @gdower the UCD metadata contains 2 paragraphs of lists of contributors. That should really be in the contributors section, that's exactly what it's for:
List of Active Curators: Roger Burks (site designer) Newport Beach, CA, USA; Lucian Fusu, Al. I. Cuza University, Iasi, Romania; D. Christopher Darling, Royal Ontario Museum, Toronto, ON, Canada; John Heraty, University of California, Riverside, CA, USA; Petr Janšta, Charles University, Prague, Czech Republic; Mircea-Dan Mitroiu, Al. I. Cuza University, Iasi, Romania; Pâmella Machado Saguiah, Ciências Biológicas pela Universidade Federal do Espírito Santo, Brazil; Natalie Dale-Skey, The Museum of Natural History, London, United Kingdom; James B. Woolley, Texas A&M University, College Station, TX, USA TaxonWorks Development and Outreach Team: Matt Yoder, Dmitry Dmitriev, José Luis Pereira, Hernán Lucas Pereira, Deborah Paul.
The description also contains species numbers, which quickly get out of date. I would propose removing that sentence, as we already have more than 31.000 species, more than the number mentioned:
As of today, the UCD contains 27968 valid species (including some subspecies) and 2295 valid genera (including some subgenera).
UCD metadata will stay as they are now, until UCD group change their mind and text on the website.
Dear @olafbanki, the dataset id 298904 of 2024-06-25 is finalized now as 2024 Annual Checklist.
UCD metadata will stay as they are now, until UCD group change their mind and text on the website.
But it's not good practice the way it is
I'll pass your concerns to UCD team ;) in July
The AC2024 has been published!
Delivery date: June 2024
Re-synced: Alucitoidea, Collembola.org, Global Gracillariidae, ITIS, MOWD, Pterophoroidea, ReptileDB, Species Fungorum Plus, StaphBase, Taxapad Ichneumonoidea, TITAN, UCD, WCVP (fully updated now), WTaxa, ZOBODAT Vespoidea