CatalogueOfLife / testing

Editorial tests and discussion to prepare for COL releases

ITIS (id 2144): test report #8

Open yroskov opened 3 years ago

yroskov commented 3 years ago

https://www.checklistbank.org/dataset/2144/

Source of global sectors:

file ITIS_GSDs+Updates_forCoL_2020-03-03.xlsx

From: Nicolson, David
Sent: Tuesday, March 3, 2020 23:01
To: Roskov, Yury
Cc: Orrell, Thomas
Subject: Initial list of ITIS GSDs for addition (or consideration) to CoL

Yuri, OK, here is my first pass trying to detect ITIS GSDs that should (or could) be added or updated in CoL. It includes GSDs we added or updated in ITIS since the last time CoL was updated for ITIS (mid-2017), as well as a few cases where ITIS loaded a GSD that was not noted to you previously. I left out groups where CoL already has a solid/active source, assuming the source seemed to actually be providing a reasonably complete GSD (vs. an "aspirational" GSD that is not very close to complete).

They are sorted according to their placement in ITIS now, via a hierarchy column. Those with yellow question marks may or may not be used in CoL; a few already have a source for CoL, but I suggest at least considering switching to ITIS due to various issues.

I have included a few things that we will shortly have loaded into ITIS, and a few that we are actively working on now (for inclusion in ITIS later this year, likely before the ITIS CoLdp export is ready).

If we realize we missed anything I will let you know.

Thanks, Dave

DaveNicolson commented 1 year ago

NB: https://www.itis.gov/downloads/index.html

yroskov commented 1 year ago

ITIS of 2022-09-28 imported 2022-08-31

Synced 2022-10-10

DaveNicolson commented 1 year ago

A new version of ITIS is available, dated October 28, 2022, from https://www.itis.gov/downloads/index.html

A new GSD available for CoL is the mite family Melicharidae (250 species, 439 scientific names), which is found here in ITIS: Animalia : Bilateria : Protostomia : Ecdysozoa : Arthropoda : Chelicerata : Euchelicerata : Arachnida : Parasitiformes : Mesostigmata : Monogynaspida : Gamasina : Ascoidea : Melicharidae

Other updates to existing ITIS GSDs that CoL uses (so should be automatically updated) include the bird orders Anseriformes (Ducks and Geese) and Charadriiformes (Shorebirds, Gulls and Related Birds). The updates include all the names used as valid/accepted from the major world bird sources (as noted above) and otherwise extend synonymy for the 570 accepted species (2284 scientific names in total) and common names.

yroskov commented 1 year ago

ITIS of 2022-10-28 imported 2022-11-03

Synced 2022-11-09; re-synced 2022-11-11; Code assigned with all sectors - re-synced 2022-11-17

DaveNicolson commented 1 year ago

I am told the November 2022 ITIS load is complete, and the full database can be downloaded via this page: https://www.itis.gov/downloads/index.html

There are no new GSDs offered this month, but substantial updates were made for (1) the bird order Galliformes (ground fowl, etc.), where we include all the names used as valid/accepted from the major world bird sources (as noted above) and otherwise extend synonymy for the 302 accepted species (1946 scientific names in total) and common names. Also, (2) we updated a lot of the oribatid mites again, in collaboration with several cooperators, covering all the infraorders except Brachypylina. Other than that, we tweaked a few things in various groups.

gdower commented 1 year ago

@DaveNicolson, thanks for letting us know. The itisMySQLBulk.zip download still has the October 28 edition but I'll check again later.

DaveNicolson commented 1 year ago

Sorry for this, I'm asking the relevant folks to ensure the downloads are updated. I will personally double-check before notifying in the future.

DaveNicolson commented 1 year ago

I should have told you that I would notify you when the issues blocking the deployment of the full download files were resolved. I was planning on doing so. No point in checking the downloads until they have been deployed.

DaveNicolson commented 1 year ago

Finally, the download files were successfully synced to the cloud, so you can download the ITIS data now, when you're ready. Sorry for the miscommunication.

yroskov commented 1 year ago

ITIS of 2022-11-28 imported 2022-12-05

Synced 2022-12-06

DaveNicolson commented 1 year ago

Although the ITIS website doesn't yet fully reflect it, the December load was completed AND the download file for the MySQLBulk currently contains the new data (itisMySQL122122), so you can go ahead and ingest the ITIS contributions from it. This month there was a full update of the GSD for Phylum Tardigrada, and we are in communication with the authors of the checklist regarding some remaining nomenclatural problems that we will be looking at in the New Year, so more updates will be coming to address those as they are considered (there are some names that are not available but are still being used as valid/accepted).
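Since stale downloads have come up more than once in this thread, one way to confirm that a downloaded bulk file is the announced edition is to read the date out of its folder name. This is a minimal sketch, assuming the name follows the pattern "itisMySQL" plus MMDDYY (inferred only from itisMySQL122122 matching the 21 December 2022 load); the function names are hypothetical and not part of any ITIS or ChecklistBank tooling.

```python
import re
from datetime import date

# Assumed pattern: "itisMySQL" followed by MMDDYY, as in itisMySQL122122.
_BULK_NAME = re.compile(r"itisMySQL(\d{2})(\d{2})(\d{2})$")


def bulk_release_date(folder_name: str) -> date:
    """Return the release date encoded in a bulk-download folder name."""
    match = _BULK_NAME.search(folder_name)
    if match is None:
        raise ValueError(f"unrecognised bulk folder name: {folder_name!r}")
    month, day, year = (int(part) for part in match.groups())
    # Two-digit years are assumed to belong to the 2000s.
    return date(2000 + year, month, day)


def is_expected_release(folder_name: str, announced: date) -> bool:
    """True when the downloaded folder matches the announced ITIS release."""
    return bulk_release_date(folder_name) == announced


if __name__ == "__main__":
    # Compare the unpacked folder name against the announced load date.
    print(is_expected_release("itisMySQL122122", date(2022, 12, 21)))
```

A check like this could run before conversion, so that an outdated archive is caught before it is imported.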

yroskov commented 1 year ago

ITIS of 2022-12-21 imported 2023-01-03

DaveNicolson commented 1 year ago

The January ITIS load is wrapping up, and I just confirmed that the ITIS MySQL Bulk download is now the new version (itisMySQL013023), so you can import the new data whenever you're ready for it.

New GSDs this month are:

1) The heteropteran infraorder Enicocephalomorpha (321 species, 437 names), found in ITIS at Animalia : Bilateria : Protostomia : Ecdysozoa : Arthropoda : Hexapoda : Insecta : Pterygota : Neoptera : Paraneoptera : Hemiptera : Heteroptera : Enicocephalomorpha [NOTE: this group was already marked as an ITIS GSD, I guess we must have noted that we were working to complete it... so this should update automatically]

2) The decapod suborder Dendrobranchiata (524 species, 1254 names), found in ITIS at Animalia : Bilateria : Protostomia : Ecdysozoa : Arthropoda : Crustacea : Malacostraca : Eumalacostraca : Eucarida : Decapoda : Dendrobranchiata

3) The mite family Podocinidae (38 species, 52 names), found in ITIS at Animalia : Bilateria : Protostomia : Ecdysozoa : Arthropoda : Chelicerata : Euchelicerata : Arachnida : Parasitiformes : Mesostigmata : Monogynaspida : Gamasina : Phytoseioidea : Podocinidae

As a separate matter (I will notify Donald for his consideration), an existing GSD in COL could potentially be replaced with the following new-to-ITIS GSD (COL lists the 2009 GSD "Mites GSD Ologamasidae" with 446 species and 509 names)...: The mite family Ologamasidae (469 species, 786 names), found in ITIS at Animalia : Bilateria : Protostomia : Ecdysozoa : Arthropoda : Chelicerata : Euchelicerata : Arachnida : Parasitiformes : Mesostigmata : Monogynaspida : Gamasina : Rhodacaroidea : Ologamasidae

yroskov commented 1 year ago

ITIS of 2023-01-30 imported 2023-02-01

Synced 2023-02-02

DaveNicolson commented 1 year ago

A new version of ITIS is now available for use, and it includes GSDs for the following 4 superfamilies that are gaps in COL: Paguroidea, Lithodoidea, Lomisoidea and Hippoidea

They are found in ITIS under infraorder Anomura, which is here: Animalia : Bilateria : Protostomia : Ecdysozoa : Arthropoda : Crustacea : Malacostraca : Eumalacostraca : Eucarida : Decapoda : Pleocyemata : Anomura

Together, they cover 1447 valid/accepted species, and 2393 scientific names including synonymy.

Additionally, there is an update for the existing GSD, the mite suborder Endeostigmata, which is already coming from ITIS, so it should update automatically after the ITIS data are processed for COL.

yroskov commented 1 year ago

ITIS of 2023-03-01 imported 2023-03-07

yroskov commented 1 year ago

"superfamilies" of course. Thanks!

I am stuck establishing new sectors for ITIS: the sync of Species Fungorum has taken more than 5 h today.

@DaveNicolson, may I say "incl. new global checklists for crab superfamilies Hippoidea, Lithodoidea, Lomisoidea and Paguroidea" in What's New? "Crab superfamilies" - is it correct?

yroskov commented 1 year ago

Synced 2023-03-07

DaveNicolson commented 1 year ago

I think you can say that... Hippoidea are sand crabs & mole crabs... Paguroidea includes hermit crabs... Lithodoidea includes king crabs... Lomisoidea is for the hairy stone crab... so yeah, crab superfamilies works, if you want, or anomuran crab superfamilies, more specifically.

DaveNicolson commented 1 year ago

A new version of ITIS, dated 30 March, 2023, is available for use. There were delays due to scheduled maintenance and whatnot, so it goes. The updates to ITIS this month are all outside of GSDs used for COL, so there is nothing new for COL, but it is there for processing through Checklist Bank when you want it.

yroskov commented 1 year ago

ITIS of 2023-03-30 imported 2023-04-05

Synced 2023-04-12

DaveNicolson commented 1 year ago

A new version of ITIS is available for download & use, dated 26 Apr 2023. It contains some new GSDs for gaps in COL:

Decapod Infraorder Anomura is now a GSD in ITIS, as we completed the last remaining gaps this month (Galatheoidea, Chirostyloidea & Aegloidea). If feasible, please consolidate the separate GSD parts (previously completed parts are superfamilies Hippoidea, Lithodoidea, Lomisoidea & Paguroidea) into a single GSD for Infraorder Anomura, with 3286 species and over 5000 scientific names. It is here in ITIS: Animalia : Bilateria : Protostomia : Ecdysozoa : Arthropoda : Crustacea : Malacostraca : Eumalacostraca : Eucarida : Decapoda : Pleocyemata : Anomura

Mite superfamily Ascoidea is now also a GSD, as we completed the last remaining gaps this month (Ameroseiidae & Antennochelidae). Similarly, consider consolidating the prior GSD families (Ascidae & Melicharidae) into a single GSD for Ascoidea, with 784 species and over 1300 scientific names. It is here in ITIS: Animalia : Bilateria : Protostomia : Ecdysozoa : Arthropoda : Chelicerata : Euchelicerata : Arachnida : Parasitiformes : Mesostigmata : Monogynaspida : Gamasina : Ascoidea

Mite family Parholaspididae is a new GSD, with 164 species and almost 250 scientific names. It is here in ITIS: Animalia : Bilateria : Protostomia : Ecdysozoa : Arthropoda : Chelicerata : Euchelicerata : Arachnida : Parasitiformes : Mesostigmata : Monogynaspida : Gamasina : Eviphidoidea : Parholaspididae

Other updates this month are either in groups COL gets elsewhere, or in existing ITIS GSDs.

yroskov commented 1 year ago

ITIS of 2023-04-26 imported 2023-04-28

DaveNicolson commented 1 year ago

Should be 493,239 species, it looks like the 3/30 data... Not sure what happened, I checked by downloading myself before posting the update above...

yroskov commented 1 year ago

> Should be 493,239 species, it looks like the 3/30 data... Not sure what happened, I checked by downloading myself before posting the update above...

A new conversion and import are in progress right now. In the end, all should be OK.

yroskov commented 1 year ago

ITIS of 2023-04-26 re-imported 2023-05-01

Synced 2023-05-01

yroskov commented 1 year ago

Dear @DaveNicolson, we are a few weeks away from completing the 2023 Annual Checklist. Could you please have a look through the ITIS data in May's edition? (It is available at the PREVIEW website https://preview.catalogueoflife.org/, and will soon be deployed to the main portal.) If you spot any problems, we still have time to fix them before AC23. Thanks!

DaveNicolson commented 1 year ago

Thanks, Yuri! I added 2 missing ORCIDs in the ChecklistBank metadata here: https://www.checklistbank.org/dataset/2144/about

Can you please make sure those edits are included? I don't see any other issues, but we'll let you know if anything comes up. Thanks! -Dave

yroskov commented 1 year ago

Thanks! These corrections will be synced in AC23 together with the ITIS update for June for sure. (It would be nice to get access to June's update between June 1st and 13th.)

DaveNicolson commented 1 year ago

A new version of ITIS is up, dated 25 May 2023 (I checked and the downloadable MySQLBulk file is current). Some relevant highlights for COL:

We finally completed the heteropteran infraorder Dipsocoromorpha (minute litter bugs, etc.), with 538 species and nearly 700 scientific names. I believe this was already identified as an ITIS GSD, as we were working toward completing it (delayed some by mite and other priority work, unfortunately). It is placed in ITIS here: Animalia : Bilateria : Protostomia : Ecdysozoa : Arthropoda : Hexapoda : Insecta : Pterygota : Neoptera : Paraneoptera : Hemiptera : Heteroptera : Dipsocoromorpha

We completed the parasitic tapeworm family Taeniidae, which is placed within what has been treated as a GSD for Cestoda (but the subgroup Cyclophyllidea was NOT completed in WoRMS or ITIS). I honestly don't know what is best for COL, but the group is important, if small (60 species, about 130 scientific names). It is available if COL wants it, and is found in ITIS at: Animalia : Bilateria : Protostomia : Platyzoa : Platyhelminthes : Neodermata : Cestoda : Eucestoda : Cyclophyllidea : Taeniidae

We completed two more small superfamilies in the mite infraorder Anystina (Adamystoidea with 28 spp. & Pomerantzioidea with 6 spp.). They are found in ITIS here: Animalia : Bilateria : Protostomia : Ecdysozoa : Arthropoda : Chelicerata : Euchelicerata : Arachnida : Acariformes : Trombidiformes : Prostigmata : Anystina : Adamystoidea & Pomerantzioidea

We also corrected a bat vernacular issue noted recently by a user (sent directly to ITIS and sent via GBIF), and updated some groups that COL does not get from ITIS.

yroskov commented 1 year ago

ITIS of 2023-05-25 imported 2023-06-01

[image]

ISSUES assessed 2023-06-05

[image]

TASKS

[image]

2023-06-05:

[image]

Synced 2023-06-05

yroskov commented 1 year ago

Dear @dhobern, we need a Taxonomy Group decision on a new offer from ITIS: the family Taeniidae.

> We completed the parasitic tapeworm family Taeniidae, which is placed within what has been treated as a GSD for Cestoda (but the subgroup Cyclophyllidea was NOT completed in WoRMS or ITIS). I honestly don't know what is best for COL, but the group is important, if small (60 species, about 130 scientific names). It is available if COL wants it, and is found in ITIS at: Animalia : Bilateria : Protostomia : Platyzoa : Platyhelminthes : Neodermata : Cestoda : Eucestoda : Cyclophyllidea : Taeniidae

Indeed, WoRMS Cestoda has only 2 spp. in one genus, Taenia, whereas ITIS has 60 spp. in 4 genera. (As far as I can see, there are no clashes between the additional 58 ITIS species in Taeniidae and members of other families in WoRMS Cestoda. However, the absence of conflicts in Taeniidae taxonomy between ITIS and WoRMS Cestoda needs to be confirmed.)
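A first-pass clash check of this kind can be sketched as a name comparison between the two lists. This is only an illustration under stated assumptions: the function name is hypothetical, the binomials are invented placeholders rather than real ITIS or WoRMS records, and matching on normalised name strings says nothing about synonymy or authorship, so a taxonomic review would still be needed.

```python
def find_clashes(new_names, existing_names):
    """Return names from new_names that also occur in existing_names.

    Names are compared case-insensitively with whitespace collapsed.
    """
    def normalise(name):
        return " ".join(name.split()).lower()

    existing = {normalise(name) for name in existing_names}
    return sorted(name for name in new_names if normalise(name) in existing)


if __name__ == "__main__":
    # Placeholder binomials, invented for illustration only.
    candidate_family = ["Exemplus alpha", "Exemplus beta", "Fictus gamma"]
    other_families = ["Fictus  Gamma", "Aliud delta"]
    print(find_clashes(candidate_family, other_families))
```

An empty result would support, but not prove, the absence of conflicts between the two sources.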

DaveNicolson commented 1 year ago

The Cestoda list that WoRMS and ITIS both adopted some years back explicitly omitted Cyclophyllidea (they did provide a classification for it to genus, but stated that that subset was beyond their scope at that stage). So it is a global gap within the Cestoda "GSD", and there should be no conflict with that prior list as it wasn't covered in that work.

I'm not clear on how COL would handle a GSD within a different GSD, but that's basically what we're talking about.

yroskov commented 1 year ago

> I'm not clear on how COL would handle a GSD within a different GSD, but that's basically what we're talking about.

It's possible as a "nested sector" (Lepidoptera, for example, has a few nested sectors). It may work well if IDs are stable and sector management and decisions in CLB behave as expected (otherwise, Taeniidae may jump into the wrong parent taxon or be duplicated). However, the reality is that the fewer nested (and normal) sectors CoL has, the more stable the Catalogue is.

yroskov commented 1 year ago

@DaveNicolson, I have a draft of AC23 ready for your checks of ITIS data and credits at PREVIEW website: https://preview.catalogueoflife.org

Could you please also check my summary of new ITIS global checklists added in the 10-month cycle since AC22: https://preview.catalogueoflife.org/data/metadata

ITIS: new global checklists for decapod suborder Dendrobranchiata and infraorders Anomura & Caridea; mite superfamilies Canestrinioidea, Hemisarcoptoidea, Histiostomatoidea, Schizoglyphoidea (order Sarcoptiformes), Adamystoidea & Pomerantzioidea (order Trombidiformes), Ascoidea and families Parholaspididae, Podocinidae (order Mesostigmata)

Please let me know if facts or phrases need to be changed. (ITIS Taeniidae is not yet in CoL.)

DaveNicolson commented 1 year ago

That looks to be correct, assuming the change-over of GSDs authorized in the May 2023 TG meeting has not happened yet (?): "DECISION: The ITIS versions of Ologamasidae and Rhodacaridae will replace the existing GSDs for these families."

yroskov commented 1 year ago

> "DECISION: The ITIS versions of Ologamasidae and Rhodacaridae will replace the existing GSDs for these families."

I was not informed about that decision. @dhobern ? @olafbanki ?

DaveNicolson commented 1 year ago

It was noted in "2023-05_COL-TG_meeting" notes...

dhobern commented 1 year ago

I sent an email on 4 May to inform you, but I can see that my contact list selected yuri.roskov@sp2000.org rather than yroskov@illinois.edu so I assume it went to an old address.

yroskov commented 1 year ago

I never used yuri.roskov@sp2000.org. In the past, all messages from that account were forwarded to my main account. It seems that setting broke when I was cut off from all mailing lists and meeting notes.

yroskov commented 1 year ago

DaveNicolson commented 1 year ago

The new version of ITIS, dated June 28, 2023, is available for download. I verified that the MS SQL version shows that date, so you should be able to get the new version from https://www.itis.gov/downloads/index.html

I postponed this notice until some ITIS website issues were resolved (fingers crossed).

Two of the smaller phyla were updated (Phoronida and Placozoa), but COL doesn't currently use ITIS for those, or for the other ITIS updates this month.

However, a bunch of work was done on the animal hierarchy, and more is underway that we hope to have in ITIS in the next (late July) ITIS load. This month most animal groups now reflect (1) the published version to order of Ruggiero et al. from 2015 (the corrigenda version... we also removed many of the citations of the 2013 draft hierarchy in Animalia), (2) the classification(s) presented in Brusca et al. (2016, "Invertebrates", 3rd edition), and/or (3) several other works for specific taxonomic subgroups. More work remains to be done in e.g. Gastropoda that I'm still working on, and a few other groups (e.g., the neighborhood of Insecta, Hexapoda, the Vertebrata in large part, etc.). But it is a notable step forward...

Updates: The ITIS hierarchy can be seen in an overview by selecting the bottom rank of interest (to order, for example) from this page: https://www.itis.gov/hierarchy.html

gdower commented 1 year ago

Thanks, @DaveNicolson. The 2023-06-28 release is importing.

DaveNicolson commented 1 year ago

@gdower , we have just identified an issue with 3 TSNs in the hierarchy table (their placement was corrected after the generation of that table). Do you use that table in processing ITIS' data?

gdower commented 1 year ago

@DaveNicolson, yes, the converter uses the hierarchy table. Is that related to CatalogueOfLife/data#546 or a separate issue?

DaveNicolson commented 1 year ago

It is unrelated. There are 3 TSNs for which the hierarchy table entries are either wrong or missing, due to corrections made just after the hierarchy table was created. The affected TSNs are for Bradynectes (1037481), Myozona (1037524), and Microgloma (205524).

gdower commented 1 year ago

I can re-convert if you generate a new export.

DaveNicolson commented 1 year ago

I think they are doing that, but David Mitchell is discussing options with the IT team... Each of those TSNs was inadvertently left with a broken parent link, so we fixed the parent link, but the hierarchy table doesn't show the fix. It led one ITIS user to report those 3 TSNs as having bad hierarchy links, but the actual taxonomic_units.parent_tsn links are there and working fine, just not if you look only at the hierarchy table. I'll let you know when we have a fix... The hierarchy table is an "extra" or a "shortcut" that is not actually part of ITIS formally; it's just there to help with processes that can't quickly walk the links record-by-record (which is the actual ITIS data). It's not in the Informix version of the export, which is the only one I use (I regenerate my own version of the hierarchy table), so I never noticed this potential issue with any last-minute fixes needed as we prepare the new monthly ITIS versions.
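The difference between walking the parent links and trusting a precomputed hierarchy table can be illustrated with a small check that rebuilds each lineage from the parent links and flags stored rows that disagree. This is a sketch under assumptions, not the converter's or ITIS's actual code: the TSNs are toy values, a parent of 0 is assumed to mark the root, and the stored hierarchy is modelled as a plain mapping from TSN to a root-first list of TSNs.

```python
def lineage(tsn, parent_of):
    """Walk parent links from tsn up to the root; return a root-first list.

    A parent of 0 (assumed root marker) or a missing entry ends the walk.
    """
    path = []
    seen = set()
    current = tsn
    while current is not None and current != 0:
        if current in seen:
            raise ValueError(f"cycle in parent links at TSN {current}")
        seen.add(current)
        path.append(current)
        current = parent_of.get(current)
    path.reverse()
    return path


def stale_hierarchy_rows(parent_of, hierarchy_rows):
    """Return TSNs whose stored lineage is missing or differs from the walk."""
    return [
        tsn
        for tsn in sorted(parent_of)
        if hierarchy_rows.get(tsn) != lineage(tsn, parent_of)
    ]


if __name__ == "__main__":
    # Toy data: TSN 3 was re-parented after the table was built, TSN 4 is absent.
    parent_of = {1: 0, 2: 1, 3: 2, 4: 2}
    hierarchy_rows = {1: [1], 2: [1, 2], 3: [1, 3]}
    print(stale_hierarchy_rows(parent_of, hierarchy_rows))
```

Treating the parent links as the source of truth keeps last-minute corrections visible even when the shortcut table was generated earlier.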

yroskov commented 1 year ago

ITIS of 2023-06-28 imported 2023-07-05 (10:19 PM). Imported version with a first iteration of fixes in the conversion script for split genera due to incertae sedis species (https://github.com/CatalogueOfLife/data/issues/546).

phylum Acanthocephala = Subject re-established.

phylum Kamptozoa = now a synonym of two different phyla, Cycliophora and Entoprocta (both former ITIS classes in Kamptozoa) [image] = In Assembly: sector "phylum Kamptozoa" deleted; "phylum Kamptozoa" subtree deleted; phyla Cycliophora & Entoprocta established as new sectors.

phylum Micrognathozoa = Subject re-established.

phylum Sipuncula = Subject re-established.

[image]

Synced 2023-07-06

DaveNicolson commented 1 year ago

We expect the new exports of the ITIS data set to be available this evening, but time will tell... It is dated 5 July 2023. I will endeavor to let you know.

There were a lot of hierarchy changes under Animalia in ITIS in June, which is why the Kamptozoa (etc.) issues cropped up, I suppose. Thanks for catching those.