SpeciesFileGroup / taxonworks

Workbench for biodiversity informatics.
http://taxonworks.org

Import nomenclature from ITIS using modified Castor import #1949

Open teleaslamellatus opened 3 years ago

teleaslamellatus commented 3 years ago

MODIFIED CASTOR IMPORT FOR NOMENCLATURES WITH SYNONYMY

This is not a bug but rather a development question about importing nomenclature from ITIS. I would also really like to discuss this in person with someone.

TaxonWorks has an excellent (I would say unparalleled) system for taxonomic nomenclature. Its import capabilities, on the other hand, are extremely limited, as there is no support for importing from existing systems such as ITIS (which is actually getting better and better thanks to collaborations with other data aggregators).

A system for exporting/importing nomenclature between different projects is also missing (why shouldn't we all use the awesome nomenclature from Texas A&M or Urbana?).

I think there is a cardinal difference between the way DwC, iDigBio and GBIF (all ITIS-based, I believe) and TaxonWorks (and systematists in general) handle nomenclature. First, they don't use OTUs, and they also treat new combinations as equivalent to synonyms (for instance, they don't have an "original_parent_taxon" column). ITIS seems to handle new combinations by adding parentheses around author names, which I think is a really poor way of managing them.
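For context, under the zoological convention a parenthesized authorship such as "(Jones, 1850)" only signals that the species now sits in a genus other than the one it was described in. The minimal sketch below (the helper `split_authorship` is hypothetical, and it assumes zoological-style "Author, Year" strings rather than botanical forms like "(L.) Mill.") shows that this yes/no flag is all that can be recovered from the string; the original genus itself is not in it.

```python
import re

# One pair of parentheses wrapping the whole authorship, e.g. "(Jones, 1850)".
_PARENTHESIZED = re.compile(r"^\((?P<inner>.*)\)$")


def split_authorship(author_year: str) -> tuple[str, bool]:
    """Return (authorship without parentheses, whether it was parenthesized).

    A True flag says only that the name is no longer in its original genus;
    it does not say which genus that was.
    """
    text = author_year.strip()
    match = _PARENTHESIZED.match(text)
    if match:
        return match.group("inner").strip(), True
    return text, False


print(split_authorship("(Jones, 1850)"))
print(split_authorship("Jones, 1850"))
```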

Nevertheless, for the most part it is possible to convert nomenclature from ITIS into CASTOR format (which currently also uses parentheses rather than a separate column to mark new combinations), at least at the family or superfamily level. (I would perhaps not import the higher-level hierarchy, as it seems to be a bit funky in ITIS.) Examples can be found in this Google Sheet: the first tab is the CASTOR template I received from Matt, the second is an original ITIS table, and the third is my attempt at converting ITIS to CASTOR.

The steps for converting ITIS to modified CASTOR (CASTOR + original_taxon_name) are the following:

A. Rename columns (taxonID to id; acceptedNameUsageID to related_name_id; parentNameUsageID to parent_id; scientificName to taxon_name; scientificNameAuthorship to author_year; taxonRank to rank).

B. Create the new column 'original_parent_taxon'.

C. Move items from the column 'specificEpithet' to 'taxon_name'.

D. If 'specificEpithet' is equal between two rows (A and B), and one row (B) has the ID of the other (A) as its related_name_id, add the value of 'genus' in B to the 'original_parent_taxon' of A and delete B. (This removes new combinations as separate rows.)

E. If 'parent_id' is empty and 'rank' is 'species', find the 'id' of the row that matches the value in column 'genus' and add it to 'parent_id'. (Note that ITIS does not give an ITIS ID to invalid genera, so in our example neither Tabanus nor Ricardoa has an ITIS ID; I assigned them 1 and 2, respectively.)

F. Erase all unused columns, i.e. keep only those named in step A plus original_parent_taxon.
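The conversion steps above can be sketched with Python's standard library. This is a minimal, hypothetical sketch (`itis_to_castor` and `convert_tsv` are illustrative names, not TaxonWorks code). It assumes a tab-separated ITIS/Darwin Core export whose header includes the renamed columns plus 'genus' and 'specificEpithet', and it interprets "move specificEpithet to taxon_name" as replacing the full binomial with the epithet. Genera without an ITIS ID are not given placeholder IDs; their species simply keep an empty parent_id.

```python
import csv
import io

# Renaming step: ITIS / Darwin Core column -> CASTOR column.
RENAME = {
    "taxonID": "id",
    "acceptedNameUsageID": "related_name_id",
    "parentNameUsageID": "parent_id",
    "scientificName": "taxon_name",
    "scientificNameAuthorship": "author_year",
    "taxonRank": "rank",
}
# Columns kept at the end: the renamed ones plus the new original_parent_taxon.
KEEP = list(RENAME.values()) + ["original_parent_taxon"]


def itis_to_castor(rows):
    """Convert ITIS rows (dicts keyed by ITIS column names) to modified-CASTOR rows."""
    # Rename mapped columns; others (genus, specificEpithet, ...) pass through for now.
    # Extra fields that csv.DictReader files under the key None are skipped.
    out = [
        {RENAME.get(k, k): (v or "").strip() for k, v in row.items() if k is not None}
        for row in rows
    ]

    for row in out:
        row["original_parent_taxon"] = ""  # the new column
        # Assumption: the epithet replaces the full binomial in taxon_name.
        if row.get("specificEpithet"):
            row["taxon_name"] = row["specificEpithet"]

    # Collapse combinations: row B whose related_name_id is row A's id and whose
    # specificEpithet equals A's is treated as an earlier combination of A.
    by_id = {r.get("id", ""): r for r in out}
    dropped = set()  # Python object ids of rows to delete
    for b in out:
        a = by_id.get(b.get("related_name_id", ""))
        if (a is not None and a is not b and b.get("specificEpithet")
                and b["specificEpithet"] == a.get("specificEpithet")):
            a["original_parent_taxon"] = b.get("genus", "")
            dropped.add(id(b))
    out = [r for r in out if id(r) not in dropped]

    # Fill an empty species parent_id with the id of the genus row named in 'genus'.
    # Genera without an ITIS ID have no row, so parent_id stays empty for them.
    genus_ids = {
        r.get("taxon_name", ""): r.get("id", "")
        for r in out
        if r.get("rank", "").lower() == "genus"
    }
    for r in out:
        if not r.get("parent_id") and r.get("rank", "").lower() == "species":
            r["parent_id"] = genus_ids.get(r.get("genus", ""), "")

    # Drop every column not in KEEP.
    return [{k: r.get(k, "") for k in KEEP} for r in out]


def convert_tsv(text):
    """Read a tab-separated ITIS export and return tab-separated CASTOR text."""
    rows = list(csv.DictReader(io.StringIO(text), delimiter="\t"))
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=KEEP, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(itis_to_castor(rows))
    return buf.getvalue()
```

Combinations are collapsed only when a row both points at another row through related_name_id and shares its epithet, so identical epithets in unrelated genera are never merged by name alone.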

Comments

  1. ITIS also has an "intraspecific epithet" column. In our example this column had no values, but I think that in the TaxonWorks universe we don't need it, as long as 'rank' contains "subspecies", "varietas", etc.

  2. GUID appears as one of the columns in the CASTOR template. Which of the following identifiers would TaxonWorks use as the GUID?

Wikidata: Q729, Wikispecies: Animalia, ADW: Animalia, EoL: 1, EPPO: 1ANIMK, Fauna Europaea: 1, Fauna Europaea (new): dada6f44-b7b5-4c0a-9f32-980f54b02c36, Fossilworks: 325038, GBIF: 1, iNaturalist: 1, ITIS: 202423, NZOR: f38e12bf-0be7-4f13-b739-e2bc1b763ae, uBio: 230572, WoRMS: 2, ZooBank: 0EA9A33B-6B31-4551-B4E2-A772AAF96231

teleaslamellatus commented 3 years ago

I tried to upload tab-separated versions of the tables in this document (the last one is without original_taxon_name), and I think I am doing something wrong, as TaxonWorks returns the following error for most rows: "Parent is not selected; Parent The parent should not be empty (only one root is allowed per project)."

[Screenshot, 2020-12-15 4:26:03 PM]
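A quick pre-upload check can list the rows behind that error message. This Python sketch is hypothetical (`rows_without_parent`, `dangling_parents` and `check_single_root` are not TaxonWorks functions) and assumes rows carrying the CASTOR columns 'id' and 'parent_id'. It mirrors only the "one root per project" rule quoted in the error, not TaxonWorks's actual validation. It also flags parent_id values that point to no row in the file, which can happen for genera that have no ITIS ID.

```python
def rows_without_parent(rows):
    """Return the ids of rows whose parent_id is empty (each would be a root)."""
    return [r.get("id", "") for r in rows if not (r.get("parent_id") or "").strip()]


def dangling_parents(rows):
    """Return (id, parent_id) pairs whose parent_id matches no id in the file."""
    ids = {r.get("id", "") for r in rows}
    pairs = []
    for r in rows:
        parent = (r.get("parent_id") or "").strip()
        if parent and parent not in ids:
            pairs.append((r.get("id", ""), parent))
    return pairs


def check_single_root(rows):
    """Raise ValueError if more than one row lacks a parent; else return the root ids."""
    roots = rows_without_parent(rows)
    if len(roots) > 1:
        raise ValueError(f"{len(roots)} rows have an empty parent_id: {', '.join(roots)}")
    return roots
```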
mjy commented 3 years ago

The _nomen related fields need NOMEN classes in them, not the values you had.

Regardless, the preview needs work; you can ignore the parent warning and try importing anyway (in a sandbox).

mjy commented 3 years ago

This issue now contains documentation rather than specific requests for code changes. The content should migrate to taxonworks_doc, and then the issue can be closed.

debpaul commented 1 year ago

@mjy @teleaslamellatus Is the above documentation still accurate? I would like to know before adding it to docs.taxonworks.org or to the embedded Castor help inside TW.

mjy commented 1 year ago

Don't import it at the moment; it needs cleanup, and work on it is ongoing by @teleaslamellatus.