OpenTreeOfLife / hackathon

A repo for the 2014 OpenTree / Arbor / HIP hackathon

implement a service to sample from a named taxon #15

Open arlin opened 10 years ago

arlin commented 10 years ago

Here is an accessory service that I think is needed in order to make widespread ToL re-use feasible: for a taxon T of interest to me, and a number N that is a manageable number for me (the end user), give me a list of up to N species from T. How many times do you want a tree just for a few dozen representative eukaryotes, e.g., model organisms or familiar organisms with names that folks will recognize? What if you want a tree for representatives of mammalian orders (there are ~26 orders and ~150 families) instead of all 4K mammal species? How many times do you just want a tree of things that you can find pics for on wikipedia? For many uses of phylogenies, including casual scientific uses, and uses for education or demonstration, getting a tree for ALL the species in a named taxon is NOT what you really want, and getting what you want requires a LOT of work. Here are some possible services to address this problem:

There may be lots of other ways to define the most important, interesting, popular, or well-studied species. Most of these ideas involve interacting with some kind of taxonomy service, i.e., first you have to get the list of all species in taxon T. Obviously it's convenient to get the list from OpenTree's taxonomy, because this has the species actually included in the tree. Then you need to cross-reference that list with other resources, such as the IUCN Red List, EoL, NCBI genomes, etc.
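The sampling step itself can be sketched once the species list for T and some cross-referenced set are in hand. This is a minimal sketch, not an existing OpenTree service. It assumes a hypothetical input of `(species, group)` pairs (group being, e.g., the order), a hypothetical `preferred` set standing in for a cross-referenced resource, and a round-robin over groups so that each order is represented before any order gets a second species.

```python
from collections import defaultdict
from itertools import zip_longest


def sample_species(species, n, preferred=frozenset()):
    """Return up to n species names from `species`, spread across groups.

    species:   list of (species_name, group) pairs, where group is e.g. the
               order or family of each species (hypothetical input, as it
               might come from a taxonomy query for taxon T).
    n:         maximum number of species to return.
    preferred: names found in some external resource (hypothetical, e.g.
               species with a Wikipedia image); picked first within a group.
    """
    if n <= 0:
        return []
    by_group = defaultdict(list)
    for name, group in species:
        by_group[group].append(name)
    # Within a group: preferred names first, then alphabetical for stability.
    for names in by_group.values():
        names.sort(key=lambda s: (s not in preferred, s))
    # Round-robin over groups (in sorted order): every group contributes one
    # species before any group contributes a second.
    result = []
    for row in zip_longest(*(by_group[g] for g in sorted(by_group))):
        for name in row:
            if name is not None:
                result.append(name)
                if len(result) == n:
                    return result
    return result


# Usage with a tiny made-up list:
mammals = [
    ("Homo sapiens", "Primates"),
    ("Pan troglodytes", "Primates"),
    ("Mus musculus", "Rodentia"),
    ("Rattus norvegicus", "Rodentia"),
    ("Felis catus", "Carnivora"),
]
print(sample_species(mammals, 3, preferred={"Rattus norvegicus"}))
# → ['Felis catus', 'Homo sapiens', 'Rattus norvegicus']
```

Other notions of "important" species (popularity, genome availability, conservation status) would only change how `preferred` is built or how names are sorted within a group.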

Niteloser commented 10 years ago

As far as I can tell, the first thing that needs doing with regard to this idea is implementing a service to retrieve all species in a given taxon. As Arlin rightly pointed out, it would make sense to use OpenTree for this, since then we can be sure the species returned are indeed present in OT.

Now, as far as I can tell (and please correct me if I am wrong), OT has no service for returning all species in a taxon (e.g. mammals). So a starting point might be to extend OT with that functionality. Then, starting from this list, we can work on the sampling idea.

Could anyone comment on how best to do this? I was thinking of adding a new method to the OT API that queries the neo4j database directly, e.g. for an IS_LEAF property. Later this method could be overridden or extended to do increasingly interesting things with sampled taxa.
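The retrieval step can be sketched without neo4j, on a hypothetical in-memory copy of the taxonomy (a `children` map and a `rank` map, neither of which is OpenTree's actual schema). This sketch filters on rank rather than on a leaf flag like IS_LEAF. The assumption is that a tip is not always a species (it may be a subspecies) and a species is not always a tip.

```python
def species_in_taxon(root, children, rank):
    """Collect every species-rank descendant of `root`, sorted by name.

    children: dict mapping a taxon name to a list of child taxon names.
    rank:     dict mapping a taxon name to its rank string.
    Both are hypothetical stand-ins for the OpenTree taxonomy graph.
    """
    found = []
    stack = [root]  # iterative depth-first walk, so deep taxonomies are fine
    while stack:
        taxon = stack.pop()
        if rank.get(taxon) == "species":
            found.append(taxon)
            continue  # don't descend into subspecies, varieties, etc.
        stack.extend(children.get(taxon, []))
    return sorted(found)


# Tiny made-up fragment of a taxonomy:
children = {
    "Felidae": ["Felis", "Panthera"],
    "Felis": ["Felis catus", "Felis silvestris"],
    "Felis silvestris": ["Felis silvestris lybica"],
    "Panthera": ["Panthera leo"],
}
rank = {
    "Felidae": "family",
    "Felis": "genus",
    "Panthera": "genus",
    "Felis catus": "species",
    "Felis silvestris": "species",
    "Felis silvestris lybica": "subspecies",
    "Panthera leo": "species",
}
print(species_in_taxon("Felidae", children, rank))
# → ['Felis catus', 'Felis silvestris', 'Panthera leo']
```

A leaf-based query on this fragment would have returned the subspecies *Felis silvestris lybica* instead of the species *Felis silvestris*.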

Looking forward to more input on this.

josephwb commented 10 years ago

@Niteloser The tip descendant information is a property of the node in the graph, so this is easy to get. We will be refactoring the services shortly (tomorrow?), so I can give you more details then.

Niteloser commented 10 years ago

@josephwb Thanks for the info. Details would be most helpful.

gaurav commented 10 years ago

This would be particularly useful for Wikipedia: it'd be great to be able to say something like "I need an image to illustrate genus Felis at https://en.wikipedia.org/wiki/Felis", which would produce a tree showing the species in Felis within the context of other groups within Felidae. However, such a tree couldn't come from the OpenTree synthesis tree, since it's not yet established as an authoritative source: instead, it would have to be pruned from an individual study tree so that the study could be cited directly.
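Pruning a study tree down to the taxa of interest amounts to dropping every unwanted tip, removing subtrees left with no tips, and collapsing nodes left with a single child. This is a minimal sketch on a tree stored as nested Python tuples. That representation is an assumption for illustration; a real study tree would first be parsed from a format such as Newick. The sketch ignores branch lengths, and the example topology is a toy, not taken from any study.

```python
def prune(tree, keep):
    """Return `tree` restricted to tips in `keep`, or None if none remain.

    A tree is either a tip name (str) or a tuple of subtrees. Internal
    nodes left with a single child are collapsed, so no unary nodes remain.
    """
    if isinstance(tree, str):
        return tree if tree in keep else None
    kids = [p for p in (prune(child, keep) for child in tree) if p is not None]
    if not kids:
        return None
    if len(kids) == 1:
        return kids[0]
    return tuple(kids)


def to_newick(tree):
    """Serialize a nested-tuple tree as Newick (topology only)."""
    def rec(t):
        if isinstance(t, str):
            return t.replace(" ", "_")  # unquoted Newick labels use _ for blanks
        return "(" + ",".join(rec(c) for c in t) + ")"
    return rec(tree) + ";"


# Toy topology, illustrative only:
study = (
    (("Felis catus", "Felis silvestris"), "Felis chaus"),
    ("Panthera leo", "Panthera tigris"),
    "Lynx lynx",
)
keep = {"Felis catus", "Felis chaus", "Panthera leo", "Lynx lynx"}
print(to_newick(prune(study, keep)))
# → ((Felis_catus,Felis_chaus),Panthera_leo,Lynx_lynx);
```

Here the Felis species are kept, and one tip stands in for each other genus as context. A version that kept branch lengths would need to add the lengths of any collapsed edges.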