OpenTreeOfLife / germinator

miscellaneous scripts and data for concerns that span more than one of the Open Tree code repositories: integration tests, system statistics, etc.
BSD 2-Clause "Simplified" License

link supporting_studies ids to supported nodes in induced_subtree #125

Open arlin opened 7 years ago

arlin commented 7 years ago

Currently, induced_subtree will provide ids for studies supporting the topology of the returned tree. The implementation of this in the phylotastic portal last week elicited immediate enthusiasm. This could be an incredibly valuable entrée into the literature for researchers interested in specific nodes (or teachers demonstrating to students how the Tree of Life rests on supporting work), but it may be frustrating if there are multiple studies to chase down. For instance, the user might receive a tree with 25 internal nodes that lists 8 supporting studies. A scientific user who is actually interested in tracking down supporting studies will want to know which studies support which nodes.

The expert user who knows what she is doing can cut her query down to just 3 OTUs and this will make it easier to target a single node in order to find supporting studies.

However, it may be more convenient to modify induced_subtree to include informational links from study ids to the supported nodes or edges in the tree.

Currently the tree itself cannot be loaded with this information because induced_subtree returns Newick. If the tree is returned in NexSON or Arguson, which may be needed for other requested features (see #124), then the links can be included in the tree.

However, even without a new format, the links can be established by phyloreferences: for example, ancestor_node(A, B) or ancestor_edge(A, B) always implicates a specific node or edge in a rooted tree containing A and B. So the linkages could be represented as a list of triples of the form <study_id> supports <phyloreference>. The JSON returned by induced_subtree would then include a "newick" element with the tree string, a "supporting_studies" list, and a "support_links" list with the triples that link studies to the tree via phyloreferences.
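
To make the proposal concrete, here is a rough sketch of what such a response might look like and how a client could resolve each phyloreference to a node. Everything in it is an assumption for illustration: the key names inside "support_links", the study ids, the node labels, and the helper functions are hypothetical, and the child-to-parent map is written out by hand rather than parsed from the Newick string.

```python
import json

# Hypothetical response shape. "support_links" and the keys inside it are
# assumptions based on the proposal above, not an existing induced_subtree field.
response = {
    "newick": "((A,B)n1,(C,D)n2)root;",
    "supporting_studies": ["study_1", "study_2"],
    "support_links": [
        {"study": "study_1", "supports": {"ancestor_node": ["A", "B"]}},
        {"study": "study_2", "supports": {"ancestor_node": ["C", "D"]}},
        {"study": "study_2", "supports": {"ancestor_node": ["A", "C"]}},
    ],
}

# Child -> parent map for the tree above. A real client would build this
# by parsing the Newick string; it is hard-coded here to keep the sketch short.
parent = {"A": "n1", "B": "n1", "C": "n2", "D": "n2", "n1": "root", "n2": "root"}


def path_to_root(node, parent):
    """Return the list of nodes from `node` up to and including the root."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def ancestor_node(a, b, parent):
    """Resolve ancestor_node(a, b): the most recent common ancestor of a and b."""
    ancestors_of_a = set(path_to_root(a, parent))
    for node in path_to_root(b, parent):
        if node in ancestors_of_a:
            return node
    raise ValueError(f"{a!r} and {b!r} are not in the same rooted tree")


def studies_by_node(response, parent):
    """Group study ids by the node that each phyloreference resolves to."""
    result = {}
    for link in response["support_links"]:
        a, b = link["supports"]["ancestor_node"]
        node = ancestor_node(a, b, parent)
        result.setdefault(node, []).append(link["study"])
    return result


if __name__ == "__main__":
    print(json.dumps(studies_by_node(response, parent), indent=2))
```

Because each phyloreference names only two tips, the links stay valid regardless of how internal nodes are labeled in the Newick string; an ancestor_edge(A, B) reference could be resolved the same way, taking the edge that leads into the resolved node.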