Open yroskov opened 3 years ago
2021-11-08: ITIS of 2021-10-28 re-imported via FTP.
TASKS on 2021-11-09 (no changes)
Resolved:
Re-synced 2021-11-09.
The ITIS load for November was completed this week (dated December 2). You should be able to get it through any of the normal ways (website or FTP). No new GSDs for COL, but one existing GSD is updated (the HUGE bee family Megachilidae, with nearly 5600 new names added for the 4200+ species, so synonymy has been greatly expanded, too).
Thanks, Dave! we'll proceed now with updates.
ITIS of 2021-12-02 imported 2021-12-10
Synced 2021-12-10
I am told that the December load has been completed (dated 20 December 2021), and in addition to updating some existing GSDs (updated rest of oribatid mites, and tweaks to bumble bees), we have the following new GSDs available to fill COL gaps:
Mite superfamily Cheyletoidea (1276 species, 1889 names)
Mite superfamily Cloacaroidea (19 species, 33 names)
These superfamilies are found under Infraorder Eleutherengona, here in ITIS: Animalia : Bilateria : Protostomia : Ecdysozoa : Arthropoda : Chelicerata : Euchelicerata : Arachnida : Acariformes : Trombidiformes : Prostigmata : Eleutherengona : Cheyletoidea & Cloacaroidea
ITIS of 2021-12-20 imported 2021-12-23
Synced 2021-12-23
The January 2022 ITIS load was completed, and I am told that the downloads page has the current data (or you can use the FTP I gave, should be identical). No new GSDs for COL, but among the updates was the first bird family (Honeyeaters / Meliphagidae) where we included all the names treated as valid/accepted by several major bird sources that are widely used (the IOC list is what we followed, the other sources' names, where they differed from IOC's, were placed in synonymy consistent with the IOC view)... Bird sources reconciled this way were IOC, H&M4, eBird/Clements, HBW/Birdlife5. This way, users of those taxonomies will find their names in ITIS (and/or COL, of course) and see what their use corresponds to in ITIS' IOC data (at least until those sources make additional updates). We expect to do this for all bird updates going forward.
An example of this is the following:
Territornis reticulata (Temminck, 1820) (valid, IOC & eBird/Clements)
vs. Meliphaga reticulata Temminck, 1820 (invalid) (used in H&M4)
vs. Microptilotis reticulatus (Temminck, 1820) (invalid) (used in HBW and BirdLife International 5)
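The reconciliation described above can be pictured as a lookup from the name each bird source uses to the name ITIS treats as valid. Below is a minimal Python sketch built only from the Territornis reticulata example; the `SYNONYMY` table and the `resolve` helper are hypothetical illustrations, not ITIS or ChecklistBank code.

```python
# Hypothetical illustration of the synonymy described in this thread.
# The three names and their sources come from the example above.
VALID = "Territornis reticulata (Temminck, 1820)"

SYNONYMY = {
    "Territornis reticulata (Temminck, 1820)": {
        "status": "valid",
        "accepted": VALID,
        "used_by": ["IOC", "eBird/Clements"],
    },
    "Meliphaga reticulata Temminck, 1820": {
        "status": "invalid",
        "accepted": VALID,
        "used_by": ["H&M4"],
    },
    "Microptilotis reticulatus (Temminck, 1820)": {
        "status": "invalid",
        "accepted": VALID,
        "used_by": ["HBW and BirdLife International 5"],
    },
}


def resolve(name):
    """Return the name ITIS treats as valid for a source's name, or None if unknown."""
    entry = SYNONYMY.get(name)
    return entry["accepted"] if entry else None
```

A user who knows the species under the H&M4 name would call `resolve("Meliphaga reticulata Temminck, 1820")` and be pointed to the IOC-based valid name.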
ITIS of 2022-01-31 imported 2022-02-15 (Thank you, Dave! Nice to hear about extra combinations in bird species. It's important for CoL users)
An attempt to re-match blank sector in Coleoptera failed and reported "broken sector". However, the sector is not flagged as "broken" in the report on ITIS sectors. https://github.com/CatalogueOfLife/backend/issues/1105
@mdoering identified the blank sector as suborder Archostemata in Coleoptera https://github.com/CatalogueOfLife/checklistbank/issues/1007#issuecomment-1041859751 (the only broken sector in CoL as of today)
Synced 2022-02-16
PLEASE NOTE: IF you use the FTP site to download, note that the version for 31 January is the one you want. There is a subsequent version that is not to be used (we are trying to diagnose a load failure for a file we wanted to include in January but removed due to problems).
Superfamily Cheyletoidea (1,276 spp.) was lost in CoL. The sector was established from ITIS of 2021-12-20; it has now disappeared for an unknown reason and was not reported as a broken sector.
2022-02-18: Superfamily Cheyletoidea re-established, synced.
Ernie Spencer, eml of Sat, 12 Feb 2022: Why does COL have "Eurasian Oystercatcher Eurasian Oystercatcher English" four times in a row?
An identical common name may appear more than once (@gdower?)
Should be fixed in the next update.
New version of ITIS (2/28/2022) is available, but the download files on the website aren't yet updated. The FTP files (link previously given) are the new version, so go ahead and use that if you're ready to get the new data.
We updated our existing GSDs for Amblypygi, Anostraca, and five bird orders (Bucerotiformes, Coliiformes, Coraciiformes, Leptosomiformes, Trogoniformes, these birds are all updated with the names used by the major bird sources, described above for Meliphagidae). Presumably those will all update automatically.
New GSDs for COL gaps are:
A few additional updates were made in groups COL gets from other sources, but won't matter for COL.
ITIS of 2022-02-28 imported 2022-03-02
(Thanks, Dave! Completed now.)
ISSUES: selected issues assessed
[ ] Multi Word Monomial, 59 - all are genera (acc & syn) with hybrid symbol
[x] Unmatched Name Brackets, 70 - unclosed brackets in author strings. @DaveNicolson, you might be interested to see this list: a minor technical issue with brackets in a few names, which you may like to fix in the huge dataset. https://www.checklistbank.org/catalogue/3/dataset/2144/workbench?facet=rank&facet=issue&facet=status&facet=nomStatus&facet=nameType&facet=field&facet=authorship&facet=authorshipYear&facet=extinct&facet=environment&facet=origin&issue=unmatched%20name%20brackets&limit=100&offset=0
TASKS
Resolved 2022-03-02
Synced 2022-03-02
The new March 2022 version of ITIS is available now, the following are new GSDs that appear to be gaps in COL:
Mite family Erythraeidae which is here: Animalia : Bilateria : Protostomia : Ecdysozoa : Arthropoda : Chelicerata : Euchelicerata : Arachnida : Acariformes : Trombidiformes : Prostigmata : Anystina : Erythraeoidea : Erythraeidae [NOTE: you could instead just replace the family Smarididae with superfamily Erythraeoidea, which is now complete, with those 2 families]
Mite family Listropsoralgidae which is here: Animalia : Bilateria : Protostomia : Ecdysozoa : Arthropoda : Chelicerata : Euchelicerata : Arachnida : Acariformes : Sarcoptiformes : Astigmata : Sarcoptoidea : Listropsoralgidae
Mite family Pachylaelapidae which is here: Animalia : Bilateria : Protostomia : Ecdysozoa : Arthropoda : Chelicerata : Euchelicerata : Arachnida : Parasitiformes : Mesostigmata : Monogynaspida : Gamasina : Eviphidoidea : Pachylaelapidae
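The placement strings above follow a simple "rank : rank : ..." pattern, so they can be split into lists to see where new sectors attach relative to each other. The sketch below is only an illustration of that idea; `parse_placement` and `common_ancestor` are hypothetical helpers, not part of ITIS or ChecklistBank tooling.

```python
def parse_placement(path):
    """Split an ITIS placement string such as 'Animalia : Bilateria : ...' into taxon names."""
    return [part.strip() for part in path.split(":") if part.strip()]


def common_ancestor(paths):
    """Return the deepest taxon shared by all placement strings, or None if there is none."""
    parsed = [parse_placement(p) for p in paths]
    shared = None
    # zip stops at the shortest path, so only levels present in every path are compared
    for level in zip(*parsed):
        if len(set(level)) != 1:
            break
        shared = level[0]
    return shared


# Placements copied from the three mite families listed above
ERYTHRAEIDAE = "Animalia : Bilateria : Protostomia : Ecdysozoa : Arthropoda : Chelicerata : Euchelicerata : Arachnida : Acariformes : Trombidiformes : Prostigmata : Anystina : Erythraeoidea : Erythraeidae"
LISTROPSORALGIDAE = "Animalia : Bilateria : Protostomia : Ecdysozoa : Arthropoda : Chelicerata : Euchelicerata : Arachnida : Acariformes : Sarcoptiformes : Astigmata : Sarcoptoidea : Listropsoralgidae"
PACHYLAELAPIDAE = "Animalia : Bilateria : Protostomia : Ecdysozoa : Arthropoda : Chelicerata : Euchelicerata : Arachnida : Parasitiformes : Mesostigmata : Monogynaspida : Gamasina : Eviphidoidea : Pachylaelapidae"
```

For instance, `common_ancestor([ERYTHRAEIDAE, LISTROPSORALGIDAE])` gives the lowest group that contains both families, which is a quick way to check where a set of new sectors sits in the existing tree.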
Among other updates we made this month is an update of the bird family Ictaluridae, which should get updated automatically in COL since it's already in an ITIS GSD. This update included the names used by the major bird sources (described above for Meliphagidae).
I believe the full downloads page has the new data, but I know that the new data are up on the FTP site I previously gave you.
Thank you, Dave!
ITIS of 2022-03-28 imported 2022-04-01
Synced 2022-04-01
The April 2022 load for ITIS was delayed in appearing on the site, but is all done now. For COL, there was one new GSD included, for "chiggers" (families Trombiculidae [2745 spp.] & Leeuwenhoekiidae [282 spp.], noting that the sometimes-recognized family Walchiidae has a nomenclatural priority issue, but in any case is treated as a subfamily of Trombiculidae called Gahrliepiinae, which has priority over Walchiinae on a technicality of the Code (ICZN Articles 40.2 & 40.2.1)). The placement of the families is here:
Animalia : Bilateria : Protostomia : Ecdysozoa : Arthropoda : Chelicerata : Euchelicerata : Arachnida : Acariformes : Trombidiformes : Prostigmata : Anystina : Trombiculoidea : Trombiculidae & Leeuwenhoekiidae
You can get it from either method, the website's monthly exports, or via FTP site previously shared.
Thank you, Dave!
ITIS of 2022-04-26 imported 2022-05-04
Synced 2022-05-05
Thanks, Yuri. Note also that you will need to remove the accepted family Walchiidae from COL (it is now a junior synonym): https://preview.catalogueoflife.org/data/taxon/HVG
Oh, thank you! I missed doing it before. Now done.
ITIS of 2022-05-26 imported 2022-06-09
TASKS
Broken decisions, 415: deleted all
Manuscript names, 4 of 17: ok
Resolved:
Synced 2022-06-10
@mdoering, what happened? After 8h of syncing, 1 of 99 sectors is in progress and 98 are in the queue.
2022-06-13:
@olafbanki ?
ITIS' June load is now available, and it includes one new GSD for a gap in COL, the mite family Blattisociidae which is placed here in ITIS: Animalia : Bilateria : Protostomia : Ecdysozoa : Arthropoda : Chelicerata : Euchelicerata : Arachnida : Parasitiformes : Mesostigmata : Monogynaspida : Gamasina : Phytoseioidea : Blattisociidae
Aside from updates to groups not currently used by COL, we did update one of the used GSDs, that of phylum Onychophora, which is here in ITIS (should update automatically, just noting it): Animalia : Bilateria : Protostomia : Ecdysozoa : Onychophora
ITIS of 2022-06-28 imported 2022-07-05
Synced 2022-07-05
Sync of all ITIS sectors was launched 2022-07-05. A day after, sync is at the same stage - ITIS Acanthocephala "is in progress" and 101 syncs in a queue. https://github.com/CatalogueOfLife/backend/issues/1156
It looks like all ITIS syncs are cancelled (2022-07-07):
Yes, see slack
Slack: [Markus Döring] [6:00 AM] for some reason the database calls during the ITIS sector syncs are so slow they never end - I might have to improve the whole syncing a lot. Not bad timing as this is the same area I work on for the extended catalogue and face performance issues there too [6:00] Yury, please do other syncs but not ITIS at this stage
@mdoering, did you fix sync for ITIS? All other GSDs of July are already completed and synced.
https://github.com/CatalogueOfLife/backend/issues/1156#issuecomment-1181777959
@mdoering launched syncs of all ITIS sectors 2022-07-12. Completed successfully.
ITIS of 2022-06-28 synced 2022-07-12
ITIS' July load is now available, and it includes one new GSD for a gap in COL, the mite family Ascidae which is placed here in ITIS: Animalia : Bilateria : Protostomia : Ecdysozoa : Arthropoda : Chelicerata : Euchelicerata : Arachnida : Parasitiformes : Mesostigmata : Monogynaspida : Gamasina : Ascoidea : Ascidae
Also added as new GSDs for ITIS (and I think for COL) are the following...
Infraorder Procarididea GSD (TSN 1186755; elevated from superfamily), found here: Animalia : Bilateria : Protostomia : Ecdysozoa : Arthropoda : Crustacea : Malacostraca : Eumalacostraca : Eucarida : Decapoda : Pleocyemata : Procarididea
Multiple "sibling" superfamilies (all new GSDs this month in ITIS, gaps in COL I think), all placed under: Animalia : Bilateria : Protostomia : Ecdysozoa : Arthropoda : Crustacea : Malacostraca : Eumalacostraca : Eucarida : Decapoda : Pleocyemata : Caridea

tsn     name
621188  Bresilioidea
621190  Campylonotoidea
206944  Crangonoidea
621189  Nematocarcinoidea
621187  Oplophoroidea
206943  Pandaloidea
206938  Pasiphaeoidea
621192  Physetocaridoidea
621191  Processoidea
206941  Psalidopodoidea
206937  Stylodactyloidea
Finally, this doesn't need separate action, as it is an update to an existing ITIS GSD, but the update to bird order Falconiformes is noteworthy since it adds more extensive synonymy, including accounting for the names used as valid/accepted in the 4 major world bird sources (described above in earlier updates)...
Also, I'm not sure what happened, maybe the folks working on the hierarchy for Crustacea can comment, but it looks to me (and I think Ed DeWalt) like COL is missing Infraorder Caridea as child of Suborder Pleocyemata. At least as used in ITIS, it contains the following superfamilies that are not currently placed in any infraorder in COL: Alpheoidea, Atyoidea, Bresilioidea, Campylonotoidea, Crangonoidea, Nematocarcinoidea, Oplophoroidea, Palaemonoidea, Pandaloidea, Pasiphaeoidea, Physetocaridoidea, Processoidea, Psalidopodoidea, Stylodactyloidea
ITIS of 2022-08-01 imported 2022-08-03
[x] Imported: 486570 spp
[x] Metadata: date & version OK. @DaveNicolson, please check and update the rest of the metadata at https://www.checklistbank.org/dataset/2144/about. It might be the last chance to change metadata before the AC22 release.
[x] New sectors established:
phylum Arthropoda - subclass Arachnida - superorder Parasitiformes - order Mesostigmata - infraorder Gamasina - superfamily Ascoidea - family Ascidae
phylum Arthropoda - class Malacostraca - order Decapoda - suborder Pleocyemata - infraorder Procarididea
Infraorder Caridea is absent in CoL. Established as a new sector: phylum Arthropoda - class Malacostraca - order Decapoda - suborder Pleocyemata - infraorder Caridea. Deleted previously established ITIS sectors and branches in the tree for superfamilies Alpheoidea & Atyoidea. These superfamilies are children of the newly established infraorder Caridea. @DaveNicolson, please check the assembly of new sectors.
[x] Sectors: OK
Synced 2022-08-08
> Also, I'm not sure what happened...
As I raised before, the classification of various taxa in Arthropoda (not only Crustacea) needs to be reviewed and fixed in the CoL. Awaiting instructions from the Taxonomy Group, incl. a clear cross-map with the present CoL classification and attached GSDs.
Where can I see the new sectors? I looked in https://preview.catalogueoflife.org/?taxonKey=7NFJ8 but not seeing it there.
Infraorder Caridea is not yet completed, but the last part of it is nearing completion, and will be added to ITIS & available to COL in the next month or two. Superfamily Palaemonoidea is the last part we still need to complete (soon)...
Thanks for the reminder on the metadata, we're looking at it.
> Where can I see the new sectors?
In the assembly: https://www.checklistbank.org/catalogue/3/assembly. A preview is not deployed yet.
@DaveNicolson, Update: now also available at PREVIEW: https://preview.catalogueoflife.org/
@yroskov, unfortunately, that assembly page is not viewable by my account:
I've asked the person who did the bulk of the work on the superfamilies to have a look at a few sample species from each superfamily, via the new Preview version, to make sure it all looks OK. As I noted above, all of Caridea in ITIS is complete EXCEPT for superfamily Palaemonoidea, which we are still finalizing. I am not sure if it is best to leave Caridea wrongly as a complete GSD before we get that last (large) superfamily loaded. Especially if you're working on the version that will become AC2022.
As for the metadata for the ITIS GSDs all together in COL, are you looking for a new YAML file from us, or guidance on what goes where? I don't currently have any editing rights for that metadata page you linked.
error 403: @gdower, it looks like Dave has no access to the project. Are you able to open access for him?
@DaveNicolson, Geoff gave dave_n reviewer access to the project. It means, all pages inside the project should be visible for you now. Please try this link again: https://www.checklistbank.org/catalogue/3/assembly
> superfamily Palaemonoidea
@DaveNicolson, if you insist, I can block superfamily Palaemonoidea in the candidate checklist for AC22. Just let me know today. (It's more manageable to keep one sector, minus one superfamily, than 16 sectors inside Caridea.)
> As for the metadata for the ITIS GSDs all together in COL, are you looking for a new YAML file from us, or guidance on what goes where? I don't currently have any editing rights for that metadata page you linked.
@gdower, could you please advise Dave on how to proceed. (I, personally, would prefer to open editorial access for David for manual adjustments in the CLB metadata form at https://www.checklistbank.org/dataset/2144/about).
@DaveNicolson, Geoff gave dave_n account editor access to dataset 2144. Please adjust ITIS metadata as you need.
@yroskov will the version currently being built end up as AC22, or will that be built next month? Depending on that, the response on the Caridea GSD question may be different. If this version being built now will become AC2022 then we need to be accurate and not suggest that superfamily Palaemonoidea is a GSD at this time. In that case, superfamily Palaemonoidea should remain in the hierarchy of COL (like any non-GSD group) but omit species. You could temporarily place it outside of Caridea if you prefer, and when we load that superfamily it can perhaps be moved into Caridea and become part of that single GSD. That's an awkward position since we're saying Caridea is complete when it excludes a significant superfamily, and I'd rather not do that in an Annual Catalogue!
Otherwise you'd have to handle it as 13 GSDs at the superfamily level and one empty superfamily:
Alpheoidea ITIS GSD
Atyoidea ITIS GSD
Bresilioidea ITIS GSD
Campylonotoidea ITIS GSD
Crangonoidea ITIS GSD
Nematocarcinoidea ITIS GSD
Oplophoroidea ITIS GSD
Pandaloidea ITIS GSD
Pasiphaeoidea ITIS GSD
Physetocaridoidea ITIS GSD
Processoidea ITIS GSD
Psalidopodoidea ITIS GSD
Stylodactyloidea ITIS GSD
Palaemonoidea [not GSD, retained only as part of hierarchy within Caridea]
Or omit Caridea for now and leave those superfamilies with no infraorder, as they have been...
If the AC2022 will be built NEXT MONTH then I guess we can leave it as a single GSD but with Palaemonoidea outside Caridea, and once we load it in ITIS you can simply move it into Caridea to become part of that GSD. That leaves an awkward month when the superfamily is misplaced and the Caridea GSD is temporarily incomplete (aspirational at that point, soon to be actual GSD), but that is not that serious an issue if it only affects a monthly version.
I hope that makes some sense. We are closing in on being ready to load that last superfamily.
@DaveNicolson, Here is a result (Palaemonoidea retained only as a part of hierarchy within Caridea): https://www.checklistbank.org/catalogue/3/assembly?assemblyTaxonKey=3220e163-aeed-4785-b872-967b9f2a8256&sourceTaxonKey=206940
My understanding is that "the version currently being built" will end up as AC22.
Thank you, @yroskov, that looks fine in assembly.
As a point of clarification, will that metadata page be for (1) ITIS' data in COL, (2) the ITIS data used by COL in ChecklistBank, (3) the full ITIS dataset in ChecklistBank, or some combination of those? I am preparing to make edits to the metadata.
@DaveNicolson, the metadata page, as it works in the present architecture: it's only a single entry for the whole of ITIS in ChecklistBank. A copy of the metadata will be synced into the CoL with the ITIS sectors in CoL. The metadata will also continue to stay in ChecklistBank with the whole of ITIS. (ChecklistBank is a GBIF tool. CoL is not the only project which may use data imported into ChecklistBank. Other projects may also take ITIS data with the same metadata.)
> It's only a single entry for whole ITIS in the ChecklistBank. A copy of the metadata will be synced into the CoL with ITIS sectors in CoL.
Yes. There is also an option to have a metadata patch for a project source that modifies the metadata that becomes part of the project and its releases. So you can have different metadata for all of ITIS in ChecklistBank and ITIS in COL.
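To picture the patch idea: the dataset keeps one metadata record, and a project can overlay a few fields on its own copy. The Python sketch below assumes a simple field-by-field override; the real ChecklistBank patch semantics may differ, and `apply_metadata_patch` and the patched value are hypothetical.

```python
def apply_metadata_patch(base, patch):
    """Return a copy of the dataset metadata with patch fields overriding it.

    Assumption for illustration: a shallow override where None means "no change".
    The base record is left untouched, so ChecklistBank and the project can differ.
    """
    merged = dict(base)
    for key, value in patch.items():
        if value is not None:
            merged[key] = value
    return merged


# Fields taken from the ITIS metadata in this thread
itis_metadata = {
    "title": "The Integrated Taxonomic Information System",
    "alias": "ITIS",
    "taxonomicScope": "Biota",
}

# Hypothetical patch value, used only to show the mechanism
col_patch = {"taxonomicScope": "Completed ITIS subsets used in COL"}

col_view = apply_metadata_patch(itis_metadata, col_patch)
```

Here `col_view` carries the patched scope while `itis_metadata` still describes the full dataset.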
2022-08-15: ITIS metadata (modified August 10th 2022 by dave_n) as they appear in 2144.yaml file:
title: The Integrated Taxonomic Information System
alias: ITIS
description: |-
  The Integrated Taxonomic Information System (ITIS, www.itis.gov) partners with specialists from around the world to assemble scientific names and their taxonomic relationships, and distributes that data openly through publicly available software. The ITIS mission is to communicate a comprehensive taxonomy of global species that enables biodiversity information to be discovered, indexed, and connected across all human endeavors. ITIS is made up of 11 active MOU partners https://www.itis.gov/mou.html committed to improving and continually updating scientific and common names of all seven Kingdoms of Life (Archaea, Bacteria, Protozoa, Chromista, Fungi, Plantae, and Animalia).

  The full ITIS content is updated regularly in ChecklistBank, and many completed taxonomic subsets are also used in the Catalogue of Life. Although ITIS cannot here summarize the sources and status of the many individual Global Species Databases ITIS contributes to the Catalogue of Life, users interested in additional detail may refer to ITIS' "What's New" page: https://www.itis.gov/whatsnew.html

  ITIS staff have made substantial contributions to the conception and design of the ITIS work product. This includes acquisition, analysis, and interpretation of data, as well as the creation and maintenance of software used to collect and distribute data. The authorship for ITIS - as distinct from stewards and specialists who contribute their taxonomic expertise to segments of ITIS data - is ordered alphabetically by last name because the data herein have been reviewed and approved as a team. As one the ITIS team have agreed to be accountable for the content of ITIS. Questions related to the accuracy or integrity of ITIS data should be directed to the team at itiswebmaster@si.edu. All questions regarding the content of ITIS will be appropriately investigated by the team and resolved and documented openly by the team.
issued: 2022-08-01
version: 2022-08-01
contact:
  city: Washington
  state: DC
  country: US
  email: itiswebmaster@itis.gov
  url: https://www.itis.gov
  address: Washington, DC, United States of America
  organisation: The Integrated Taxonomic Information System
editor:
  - given: Sara
    family: Alexander
    email: alexandersar@si.edu
  - given: Alicia
    family: Hodson
    email: hodsona@si.edu
    orcid: 0000-0002-5418-244X
  - given: David
    family: Mitchell
    email: mitchelld@si.edu
    orcid: 0000-0002-7987-0679
  - given: Dave
    family: Nicolson
    email: nicolsod@si.edu
    orcid: 0000-0003-1038-3028
  - given: Thomas
    family: Orrell
    email: orrellt@si.edu
    orcid: 0000-0003-3270-9551
  - given: Daniel
    family: Perez-Gelabert
    email: perezd@si.edu
geographicScope: Global & Regional
taxonomicScope: Biota
confidence: 5
completeness: 100
license: cc0
url: https://itis.gov
logo: https://www.itis.gov/Static/images/ITIS_wordmark.png
source: []
ITIS completed a new load (although the downloads html page doesn't have the data yet, you can get it from the FTP site I previously shared, the data are in ITIS, there is just a hitch in putting the monthly export pages on the website). For COL purposes, we have added four superfamilies of mites in suborder Astigmata, all of which are gaps in COL. They include almost 1300 species all together. Those newly-added superfamilies are placed under: Animalia : Bilateria : Protostomia : Ecdysozoa : Arthropoda : Chelicerata : Euchelicerata : Arachnida : Acariformes : Sarcoptiformes : Astigmata
The newly-added superfamilies are:
Superfamily Canestrinioidea (4 families)
Superfamily Hemisarcoptoidea (7 families)
Superfamily Histiotomatoidea (2 families)
Superfamily Schizoglyphoidea (1 family)
Other updates to existing ITIS GSDs that COL uses (so should be automatically updated) include multiple families sometimes treated as Emberizidae & Thraupidae (both sensu lato). The updates include all the names used as valid/accepted from the major world bird sources (as noted above) and otherwise extends synonymy. Just for the record, these families were completely updated: Calyptophilidae, Emberizidae, Mitrospingidae, Nesospingidae, Passerellidae, Rhodinocichlidae, Spindalidae & Thraupidae
ITIS of 2022-08-29 imported 2022-08-31
Synced 2022-09-01
ITIS' September load was completed late last week, and the data are available via the standard downloads page (the former FTP access is no longer an option), and completes the last (and largest) portion of infraorder Caridea (superfamily Palaemonoidea). This means you can wrap up all of Caridea into a single GSD for the infraorder, instead of a bunch of separate superfamily GSDs. The current placement in ITIS is: Animalia : Bilateria : Protostomia : Ecdysozoa : Arthropoda : Crustacea : Malacostraca : Eumalacostraca : Eucarida : Decapoda : Pleocyemata : Caridea
The parts of this GSD were added in the last 20 months or so, and it currently contains 3757 species across about 400 genera in the 14 superfamilies.
Additionally, multiple mammal families in Carnivora were fully-updated, which should automatically be reflected in COL once the ITIS data are digested.
@gdower, when you have an opportunity, could you please proceed "via the standard downloads page (the former FTP access is no longer an option)".
https://www.checklistbank.org/dataset/2144/
Source of global sectors:
file ITIS_GSDs+Updates_forCoL_2020-03-03.xlsx
From: Nicolson, David
Sent: Tuesday, March 3, 2020 23:01
To: Roskov, Yury
Cc: Orrell, Thomas
Subject: Initial list of ITIS GSDs for addition (or consideration) to CoL
Yuri, OK, here is my first pass trying to detect ITIS GSDs that should (or could) be added or updated in CoL. It includes GSDs we added or updated in ITIS since the last time CoL was updated for ITIS (mid-2017), as well as a few cases where ITIS loaded a GSD that was not noted to you previously. I left out groups where CoL already has a solid/active source, assuming the source seemed to actually be providing a reasonably complete GSD (vs. an "aspirational" GSD that is not very close to complete).
They are sorted according to their placement in ITIS now, via a hierarchy column. Those with yellow question marks may or may not be used in CoL; a few already have a source for CoL, but I suggest at least considering switching to ITIS due to various issues.
I have included a few things that we will shortly have loaded into ITIS, and a few that we are actively working on now (for inclusion in ITIS later this year, likely before the ITIS CoLdp export is ready).
If we realize we missed anything I will let you know.
Thanks, Dave