EDIorg / EMLassemblyline

R package for creating EML metadata
https://ediorg.github.io/EMLassemblyline/
MIT License

taxonomy no longer supports sub-specific designations in scientific names #137

Open RobLBaker opened 1 year ago

RobLBaker commented 1 year ago

It looks like when EMLassemblyline (EAL) attempts to generate taxonomic coverage, it fails in some cases if the scientific name is specified beyond the species level (varieties, subspecies, etc.). Individual searches for the taxa at itis.gov indicate that they are valid taxa. Previously this seemed to work fine, but recently it has been producing a few different outputs. For example,

Names containing subsp., ssp., or var. produce:

<taxonomicClassification>
     <taxonRankValue>Python bivittatus ssp. bivittatus</taxonRankValue>
</taxonomicClassification>

This is returned rather than the full taxonomic hierarchy. Although previously unnecessary, removing the sub-specific rank designation from the scientific name (i.e. ssp, subsp, or var) and supplying just 3 words seems to resolve the problem in most cases. However, some valid scientific names still result in abbreviated taxonomic coverage that lacks the full taxonomic hierarchy (could this last case be ITIS-related, perhaps a connection timeout?). For example,

<taxonomicClassification>
     <taxonRankValue>Cercocarpus ledifolius intricatus</taxonRankValue>
     <taxonId provider="itis"/>
</taxonomicClassification>

For scientific names that do contain subsp, ssp, or var, it looks like the template_taxonomic_coverage() function, when called, fails to produce a taxonomic authority ID for taxa specified to this level (instead, taxonomic_coverage.txt has an 'NA'). My best guess is that this traces back to the taxonomyCleanr R package, or perhaps it also involves ITIS, which does not seem to list subsp or ssp as part of a scientific name when a subspecies designation is present, but does include "var" in a scientific name when a variety designation is present.
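
For anyone who wants to pre-process names before building taxonomic coverage, here is a minimal sketch of the convention described above: drop the subspecies marker (subsp/ssp) and keep the variety marker as "var.". It is written in Python purely for illustration. The function name `normalize_for_itis` and the regular expressions are hypothetical and are not part of EMLassemblyline or taxonomyCleanr. It also assumes the ITIS naming pattern observed in this report holds generally, which has not been verified.

```python
import re

# Hypothetical helper, not part of EMLassemblyline or taxonomyCleanr.
# Assumption (from the report above): ITIS omits "subsp"/"ssp" in
# subspecies names but keeps "var." in variety names.
_SUBSPECIES_MARKER = re.compile(r"^(subsp|ssp)\.?$", re.IGNORECASE)
_VARIETY_MARKER = re.compile(r"^var\.?$", re.IGNORECASE)


def normalize_for_itis(name: str) -> str:
    """Return the name with subspecies markers dropped and 'var' written as 'var.'."""
    kept = []
    for token in name.split():
        if _SUBSPECIES_MARKER.match(token):
            # Drop "ssp", "ssp.", "subsp", "subsp."
            continue
        if _VARIETY_MARKER.match(token):
            # Standardize "var" / "var." to "var."
            kept.append("var.")
            continue
        kept.append(token)
    return " ".join(kept)


print(normalize_for_itis("Python bivittatus ssp. bivittatus"))
print(normalize_for_itis("Cercocarpus ledifolius var intricatus"))
```

A name that has already had its marker removed, such as "Cercocarpus ledifolius intricatus", passes through unchanged, because the sketch cannot tell whether the third word is a subspecies or a variety. Whether normalized names then resolve to an authority ID still depends on the lookup itself.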

clnsmth commented 1 year ago

Thanks for this detailed report @RobLBaker.

I can look into this further, but will need the ITIS identifiers for the two examples listed above. Would you mind posting those here?

RobLBaker commented 1 year ago

Thanks, and definitely!

Python bivittatus bivittatus: TSN 1094085
Cercocarpus ledifolius var. intricatus: TSN 195850

I think those are two good examples of subspecies and varieties, but there are more if you'd like.