IMCR-Hackathon / Hackathon-Central-2018

Command center for IMCR Hackathon participants to share ideas, coordinate teams, develop projects and access all logistics information

prepare taxonomicCoverage EML element from taxonomic name list #2

Open twhiteaker opened 6 years ago

twhiteaker commented 6 years ago

Given this list (without the parentheses part):

  - Asteroidea (class)
  - Astarte montagui (species)
  - Lasaeidae (family)
  - Nuculana sp. (genus)
  - Nuculona (misspelled)
  - Costelloleda (synonym for Nuculana)

I'd like a couple of tools:

  1. Query ITIS (or my database of choice) for valid names. The result is a table with the input name in one column and the valid name in the next column. Presumably I would take this back to the PI for their blessing to use the valid names.
  2. Given valid names, query ITIS for the hierarchy, and build the taxonomicCoverage EML element as described in the best practices document. We're probably in R at this point, using the EML assembly line. Maybe taxize can already do this. I don't know. I'm not an R guy. ....Tim looks at taxize...., ok yes I think taxize can help. Or maybe the assembly line already does this. Ramble ramble.
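
For the second tool, here is a minimal sketch of the nesting step only, written in Python for illustration rather than in R with taxize or the EML assembly line. It assumes the hierarchy for each valid name has already been retrieved from ITIS (or another authority) as an ordered list of (rank, name) pairs, highest rank first. The function names `add_classification` and `build_taxonomic_coverage` are hypothetical, and the example hierarchies are typed by hand with a few ranks omitted; they are not the output of an ITIS query.

```python
import xml.etree.ElementTree as ET


def add_classification(parent, hierarchy):
    """Append one chain of nested taxonomicClassification elements to parent.

    hierarchy is a list of (rank, name) pairs ordered from highest rank to
    lowest. Each level is nested inside the previous one.
    """
    node = parent
    for rank, name in hierarchy:
        node = ET.SubElement(node, "taxonomicClassification")
        ET.SubElement(node, "taxonRankName").text = rank
        ET.SubElement(node, "taxonRankValue").text = name
    return parent


def build_taxonomic_coverage(hierarchies):
    """Build a taxonomicCoverage element with one nested chain per taxon.

    Shared ancestors (e.g. the same kingdom) are not merged in this sketch;
    each taxon gets its own top-level taxonomicClassification.
    """
    coverage = ET.Element("taxonomicCoverage")
    for hierarchy in hierarchies:
        add_classification(coverage, hierarchy)
    return coverage


# Hand-typed example hierarchies (assumption: stand-ins for real lookup
# results, with intermediate ranks left out for brevity).
EXAMPLE_HIERARCHIES = [
    [("Kingdom", "Animalia"), ("Phylum", "Echinodermata"),
     ("Class", "Asteroidea")],
    [("Kingdom", "Animalia"), ("Phylum", "Mollusca"), ("Class", "Bivalvia"),
     ("Family", "Astartidae"), ("Genus", "Astarte"),
     ("Species", "Astarte montagui")],
]

if __name__ == "__main__":
    coverage = build_taxonomic_coverage(EXAMPLE_HIERARCHIES)
    print(ET.tostring(coverage, encoding="unicode"))
```

Keeping one chain per taxon is the simplest layout to generate; merging shared ancestors into a single tree would produce a more compact element but needs extra bookkeeping that is left out here.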

srearl commented 6 years ago

Hi @twhiteaker - this one may already have a solution. Have a look here and, particularly, here. Do either of those tools/approaches address what you need?

twhiteaker commented 6 years ago

It looks like they do!

adroghini commented 6 years ago

@twhiteaker We've been struggling with a lot of this at our center, too. I have used the taxize package in the past, but it can be very time-consuming when taxonomic serial numbers are missing (usually the case), and it can be sluggish depending on your Internet connection. I know taxize can be used to query a local SQL database, but I haven't tried that option yet. I'd be interested in exploring the solutions @srearl proposed to see if they address all of our challenges.