OpenTreeOfLife / hackathon

A repo for the 2014 OpenTree / Arbor / HIP hackathon

getSyntheticTree params for whole tree? #26

Open mjy opened 10 years ago

mjy commented 10 years ago

If I want the whole topology, every node and edge (and no other metadata), is there a single call that returns it? I see /treemachine/v1/getSyntheticTree is the way to do this, but it looks like that will be hamstrung in the next version, and I don't see (perhaps for obvious reasons) a parameter set that gets me the whole thing.

Could this perhaps be pre-computed on every build?

Use case: visualization. I'd like to play with software like Cytoscape and others.

chinchliff commented 10 years ago

It should already be hamstrung (by a limit on the number of nodes) to prevent people from bringing down the server by requesting multi-million-tip traversals. I think we are saving a Newick version of the tree, though, which is available for direct download. @josephwb and/or @blackrim should know more about that.

josephwb commented 10 years ago

Hmm, I don't know what we have planned for this. @kcranston? @jar398? @blackrim?

jar398 commented 10 years ago

If you go to the developer resources page (http://devtree.opentreeoflife.org/about/developer-resources), you'll see a link, 'newicks of the synthetic tree' (http://files.opentreeoflife.org/trees). The plan is to keep dumps of the tree in that location. There is one tree there now, the one from April. I think that's the same as what's currently on the production site, although since we don't have any versioning infrastructure for the tree it's hard to know (I just filed https://github.com/OpenTreeOfLife/treemachine/issues/113 about that).

Jonathan

gaurav commented 10 years ago

In theory, I guess you could use /v2/tree_of_life/about to get the root_node_id, and then feed that into /v2/tree_of_life/subtree (according to https://github.com/OpenTreeOfLife/hackathon/issues/26). In practice, it sounds like you'll never be able to download that large a tree from the database.
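
A rough sketch of that two-step lookup, for illustration only. The endpoint paths and the `root_node_id` field come from the comment above; the base URL, the use of POST with a JSON body, and the `node_id` request parameter are assumptions and should be checked against the API documentation. The `post` argument is a hypothetical hook so the flow can be exercised without touching the server.

```python
import json
from urllib import request

# Assumed base URL; it is not stated anywhere in this thread.
BASE_URL = "https://api.opentreeoflife.org/v2/tree_of_life"


def post_json(url, payload):
    """POST a JSON payload and decode the JSON response."""
    req = request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def fetch_whole_tree(post=post_json, base_url=BASE_URL):
    """Look up the root node, then request the subtree rooted there."""
    about = post(base_url + "/about", {})
    root_id = about["root_node_id"]  # field name as given in the comment above
    # "node_id" as the request parameter name is an assumption.
    return post(base_url + "/subtree", {"node_id": root_id})
```

Against the real service this request would be expected to fail for the full tree because of the size limit, so a caller would want to catch `urllib.error.HTTPError` and fall back to the Newick dump.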

josephwb commented 10 years ago

@gaurav right: we currently limit this to trees <= 25,000 tips. Happy to entertain alternatives.

bomeara commented 10 years ago

You could do caching: if you want a tree of available angiosperms, the API returns info about where it will be. As resources permit, create that subtree and store it. The next time someone else requests that tree, the API looks in the cache and if it has that subtree already for that overall OToL snapshot, just returns it, and otherwise queues up the subtree extraction as before. Seems clunky, but you will already have different API behavior for trees of 24,999 vs 25,001 tips, and at least this clunkiness eventually serves all users. I expect there will be a few nodes that will keep getting requested (think of the uses of Phylomatic for plants) and this will let you serve those folks well.
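
The caching idea above could look roughly like the following. This is a minimal in-memory sketch, not anything that exists in treemachine: `SubtreeCache`, its method names, the status strings, and the `extract` callback are all hypothetical, and a real service would need persistent storage and a background worker.

```python
from collections import deque


class SubtreeCache:
    """Cache extracted subtrees per (snapshot, node); queue misses for later."""

    def __init__(self, extract):
        # extract is a hypothetical callback: (snapshot, node_id) -> Newick string
        self._extract = extract
        self._store = {}
        self._pending = deque()

    def request(self, snapshot, node_id):
        """Return the cached subtree, or queue its extraction on a miss."""
        key = (snapshot, node_id)
        if key in self._store:
            return {"status": "ready", "newick": self._store[key]}
        if key not in self._pending:
            self._pending.append(key)
        return {"status": "queued"}

    def work(self):
        """Run one queued extraction; return False when nothing is queued."""
        if not self._pending:
            return False
        key = self._pending.popleft()
        self._store[key] = self._extract(*key)
        return True
```

Keying on the snapshot as well as the node means a new build of the synthetic tree never serves a stale subtree; the old entries simply stop being requested.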

chinchliff commented 10 years ago

That's a good idea, but the implementation isn't immediately clear and serving large trees through the API isn't a priority right now. I think @jar398 has thought about this problem though and may be cooking a solution in his head. In any event, large subtrees are easily possible, just not through the API. We encourage users to download the entire tree in newick form if they need something of that size--it is available from files.opentreeoflife.org.
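
For the original use case (every node and edge, no other metadata), the downloaded Newick file can be turned into an edge list locally. The sketch below is one way to do that, assuming the file contains no quoted labels with embedded parentheses, commas, or semicolons; `newick_edges` is a hypothetical helper, not part of any Open Tree tool. It avoids recursion so that tree depth is not limited by the interpreter's recursion limit, and it drops branch lengths.

```python
import re


def newick_edges(newick):
    """Return (labels, edges) for a Newick string.

    labels maps an integer node id to its label ('' if unnamed);
    edges is a list of (parent_id, child_id) pairs.
    """
    tokens = re.findall(r"[(),;]|[^(),;]+", newick)
    labels, edges, stack = {}, [], []

    def new_node(label):
        node = len(labels)
        labels[node] = label
        if stack:
            edges.append((stack[-1], node))
        return node

    closed = None   # internal node just closed, still waiting for its label
    pending = True  # a child is expected but none has been seen yet
    for tok in tokens:
        if not tok.strip():
            continue  # whitespace between delimiters
        if tok == "(":
            stack.append(new_node(""))
            closed, pending = None, True
        elif tok in (",", ")"):
            if pending:
                new_node("")  # unnamed leaf
            if tok == ",":
                closed, pending = None, True
            else:
                closed, pending = stack.pop(), False
        elif tok == ";":
            break
        else:
            name = tok.split(":", 1)[0].strip()  # drop any branch length
            if closed is not None:
                labels[closed] = name
                closed = None
            else:
                new_node(name)
                pending = False
    return labels, edges
```

Writing each pair in `edges` as a tab-separated source/target row should give a table that network tools such as Cytoscape can import, although a tree with millions of tips may be too large for them to lay out.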
