OpenTreeOfLife / hackathon

A repo for the 2014 OpenTree / Arbor / HIP hackathon

From a list of species, return phylogeny (from OpenTree) and geographical records (from GBIF) #16

Open fmichonneau opened 10 years ago

fmichonneau commented 10 years ago

I am interested in developing an R package that interfaces with the open tree API (already mentioned in #14 by @dwinter, and in #3 by @arlin), so that the tree(s) and metadata can be manipulated, plotted, etc. by the phylogenetic packages available in R.

My understanding is that the open tree API returns a NexSON file.

peyotl provides functions to convert between the NexSON and NeXML formats.

With the imminent release of RNeXML on CRAN (https://github.com/ropensci/RNeXML/, developed by @cboettig), a package that parses and imports NeXML files into R, it seems that the workflow should be:

R --> query open tree API --> API returns NexSON file --> use peyotl to convert to NeXML --> import NeXML in R

Additionally, once the interface with the API is in place, we could use the open tree taxonomy resolver to obtain synonyms for species names and query a range of other web services (e.g., GBIF, GenBank, iNaturalist, EOL, ...) to retrieve data associated with the species included in the phylogeny. For my research, I am most interested in focusing on GBIF first, but would be open to working on any other service.
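As a rough illustration of the GBIF step, here is a minimal Python sketch that builds one occurrence-search request per name and pulls coordinates out of a parsed response. It assumes the public GBIF v1 occurrence search endpoint and its `scientificName`, `hasCoordinate` and `limit` parameters; the helper names `occurrence_urls` and `coordinates` are hypothetical, and no network call is made here.

```python
from urllib.parse import urlencode

# Assumption: the public GBIF v1 occurrence search endpoint.
GBIF_OCCURRENCE_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def occurrence_urls(names, limit=20):
    """Build one occurrence-search URL per name (accepted name or synonym)."""
    urls = {}
    for name in names:
        query = urlencode(
            {"scientificName": name, "hasCoordinate": "true", "limit": limit}
        )
        urls[name] = GBIF_OCCURRENCE_SEARCH + "?" + query
    return urls


def coordinates(response):
    """Extract (latitude, longitude) pairs from a parsed search response.

    Records without both coordinate fields are skipped.
    """
    return [
        (rec["decimalLatitude"], rec["decimalLongitude"])
        for rec in response.get("results", [])
        if "decimalLatitude" in rec and "decimalLongitude" in rec
    ]
```

Here `names` would hold the resolved name plus any synonyms returned by the taxonomy resolver, and `response` is the JSON body of a search request after parsing. Fetching the URLs (and paging through large result sets) is left out of the sketch.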

chinchliff commented 10 years ago

So, the format of the API results depends on what kind of trees you are querying for. There are two sources for trees: the treestore, which contains curated, annotated trees from the primary literature, and the graph of life, which contains synthetic tree structures constructed from trees in the treestore. In the future, the APIs for both tree sources should return at least NexSON and/or newick, but currently the graph of life only returns newick (wrapped in JSON). I think you can still use peyotl tools to convert newick -> NeXML if necessary, but @mtholder would definitely know the answer to this.
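To make the "newick wrapped in JSON" case concrete, here is a small Python sketch that unwraps the tree string and lists its tip labels. The JSON key `newick` is an assumption about the response shape, the function names are hypothetical, and the label extraction only handles simple unquoted labels; a real pipeline would hand the string to a proper newick parser.

```python
import json
import re
from typing import List


def extract_newick(payload: str, key: str = "newick") -> str:
    """Pull the newick string out of a JSON response body.

    Assumption: the tree is stored under a top-level key named `key`.
    """
    return json.loads(payload)[key]


def tip_labels(newick: str) -> List[str]:
    """Return the tip labels of a newick string with unquoted labels.

    A tip label directly follows "(" or ","; internal node labels
    follow ")" and are therefore not matched. Branch lengths (":0.1")
    are excluded because the label stops at ":".
    """
    return re.findall(r"[(,]\s*([^(),:;\s]+)", newick)
```

For example, `tip_labels(extract_newick(body))` on a response body gives the list of tip names that could then be passed on to other services.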

curtislisle commented 10 years ago

I like this thinking about how to construct a pipeline from a species list through to R-based analysis. If it is possible, please talk a bit with Luke Harmon early in the hackathon. You might find value in writing some of these R routines inside Arbor, which already implements methods to access the OpenTree API and other data sources. We are currently working on Lifemapper, which offers curated GBIF species occurrences. Pardon me if I have misinterpreted. I am a Comp. Sci. guy trying to understand phylogenetic biology. Occasionally I will make laughable mistakes.

curtislisle commented 10 years ago

Please check out the Arbor demonstration videos listed at the URL below. Some of the machinery to do OpenTree extraction and matching with a trait matrix is already working inside of Arbor. Please try it out. There should be a public instance of Arbor running at https://arbor.kitware.com during the hackathon week for testing and experimentation.

https://github.com/arborworkflows/arborworkflows.github.com/wiki/Arbor-Demonstration-Videos

jcavner commented 10 years ago

@fmichonneau

Using rgbif would keep you within the R world, per your overall framework. In other words, once you have the taxon resolved, get the GBIF data that way. Lifemapper, on the other hand, would give you a way to use Python to get GBIF occurrence sets. http://lifemapper.org/?page_id=338

Francois, I also understand from your email introduction that you are interested in biogeographic types of problems with phylogenies?

Jeff Cavner Lifemapper