fdschneider closed this issue 7 years ago
As written in another comment in #3, I would suggest having a look at the GBIF API. For me/BExIS it would be fine (and in fact preferable) to use any other meaningful source rather than a species dataset from BExIS.
I wrote a function get_gbif_taxonomy(), which takes a vector of species names and returns a table of spell-checked and accepted species names (if a synonym was provided) according to the GBIF backbone taxonomy.
The R script uses this function to fix taxon names and extract a taxonID.
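To illustrate the kind of lookup described here, below is a minimal sketch that queries the public GBIF species match endpoint directly. It is not the actual get_gbif_taxonomy() implementation: the helper names (or_na, parse_gbif_match, gbif_match, lookup_gbif) and the output column names are assumptions made up for this example, and it assumes the jsonlite package is installed.

```r
# Hypothetical helpers for illustration; not the code of get_gbif_taxonomy().

# Return `na` when a field is missing from the response.
or_na <- function(x, na) if (is.null(x)) na else x

# Turn one parsed /species/match response (a list) into a one-row data frame.
parse_gbif_match <- function(name, res) {
  matched    <- !is.null(res$usageKey)
  is_synonym <- isTRUE(res$synonym)
  # For a synonym, GBIF reports the key of the accepted taxon separately.
  key <- if (is_synonym && !is.null(res$acceptedUsageKey)) {
    res$acceptedUsageKey
  } else {
    or_na(res$usageKey, NA_integer_)
  }
  data.frame(
    verbatimName = name,
    matchedName  = if (matched) or_na(res$canonicalName, NA_character_) else NA_character_,
    matchType    = or_na(res$matchType, NA_character_),  # e.g. EXACT, FUZZY, NONE
    synonym      = is_synonym,
    taxonID      = key,
    stringsAsFactors = FALSE
  )
}

# Query the GBIF backbone for a single name (needs network access).
gbif_match <- function(name) {
  url <- paste0("https://api.gbif.org/v1/species/match?name=",
                utils::URLencode(name, reserved = TRUE))
  parse_gbif_match(name, jsonlite::fromJSON(url))
}

# Look up a whole vector of names and stack the rows into one table.
lookup_gbif <- function(species) {
  do.call(rbind, lapply(species, gbif_match))
}
```

Calling lookup_gbif() on a character vector of species names would return one row per name. A fuzzy match type indicates that GBIF corrected the spelling. For synonyms only the key of the accepted taxon is returned here; getting the accepted name itself would need a second request.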
I found the taxize R package, which does fuzzy matching and checks for synonyms of the provided taxon names against a given taxonomic ontology.
The user can choose from a range of reference species lists (via their respective online APIs), e.g. Encyclopedia of Life or GBIF. Fauna Europaea might be included in the package soon. It might be easier and more sustainable to apply this in the R package and teach BExIS how to map the result to its internal IDs. In that case, I could make the script write the taxonID as a complete string (just like in the species list that Andreas has sent around, e.g. urn:lsid:catalogueoflife.org:taxon:41a708d1-52c2-102c-b3cd-957176fb88b9:col20120124). Using taxize would be much easier than writing our own fuzzy-matching logic, managing remote access to a BExIS species list, and keeping this list up to date.
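As a rough sketch of what writing and mapping such a complete taxonID string could look like, the helpers below build and split an identifier of the form urn:lsid:authority:namespace:object[:revision]. The function names build_lsid and parse_lsid are hypothetical and exist neither in taxize nor in BExIS. The idea is only that the receiving side could pull out the object part and look it up in its own ID table.

```r
# Hypothetical helpers for illustration only.

# Assemble a single LSID-style taxonID; the revision part is optional.
build_lsid <- function(authority, namespace, object, revision = NULL) {
  paste(c("urn", "lsid", authority, namespace, object, revision), collapse = ":")
}

# Split a single LSID-style taxonID into its named components.
parse_lsid <- function(lsid) {
  parts <- strsplit(lsid, ":", fixed = TRUE)[[1]]
  is_lsid <- length(parts) %in% c(5, 6) &&
    identical(tolower(parts[1:2]), c("urn", "lsid"))
  if (!is_lsid) stop("not an LSID: ", lsid)
  list(
    authority = parts[3],
    namespace = parts[4],
    object    = parts[5],
    revision  = if (length(parts) == 6) parts[6] else NA_character_
  )
}
```

For the example identifier above, parse_lsid() gives the authority catalogueoflife.org, the namespace taxon, the object 41a708d1-52c2-102c-b3cd-957176fb88b9 and the revision col20120124.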
Does that make sense?