OpenTreeOfLife / opentree

Opentree browsing and curation web site. For overarching or cross-repo concerns, please see the 'germinator' repo.
http://tree.opentreeoflife.org/
BSD 2-Clause "Simplified" License

Do we need to separate API server and the API for the tree.opentreeoflife.org use? #558

Open mtholder opened 9 years ago

mtholder commented 9 years ago

OK, so an issue here is not the best way to bring this up (I know we've discussed this elsewhere, but I couldn't find it).

As seen in my earlier email [1] (followed by a quick retraction), it is not too hard to stress api.opentreeoflife.org to the point where it becomes unresponsive. Obviously, we can't fix all performance bugs in the short term. However, my stressing the API seemed to coincide with the tree refusing to load on tree.opentreeoflife.org.

I'm a bit worried about whether we'll stay up if we get many users...

[1] https://groups.google.com/forum/?fromgroups&hl=en#!topic/opentreeoflife-software/0ryUpmFg4fc

jar398 commented 9 years ago

Yep.

For your purposes we could bring devapi up to date and you could work there. Or we could make you a fresh server (I have two spares here at MIT).

We can replicate, but I don't know how much that will help; it only buys us small constant factors.

We had talked about requiring API keys. Maybe now is the time.

tree.opentreeoflife.org is getting hit frequently by a botnet (at least I think so; I need to dig in to verify this and haven't gotten around to it), and consequently so is api.opentreeoflife.org. API keys won't fix that problem, since we can't use them to throttle access to tree.opentreeoflife.org.
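For the direct API traffic, the API keys discussed above would make per-client throttling possible. The following is a minimal sketch of one common technique, a token bucket per key. The key values, rates, and the `KeyedThrottle` class are hypothetical and not part of the Open Tree codebase. As noted above, this does nothing for anonymous browser traffic to tree.opentreeoflife.org.

```python
import time
from typing import Dict, Iterable, Optional


class TokenBucket:
    """Permits on average `rate` requests per second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float, now: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so a new client can make a short burst
        self.updated = now

    def allow(self, now: float) -> bool:
        # Refill in proportion to the time elapsed since the last check, capped at capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class KeyedThrottle:
    """Hypothetical per-API-key throttle: one bucket per key; unknown or missing keys are rejected."""

    def __init__(self, valid_keys: Iterable[str], rate: float = 5.0, capacity: float = 10.0) -> None:
        self.valid_keys = set(valid_keys)
        self.rate = rate
        self.capacity = capacity
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, api_key: Optional[str], now: Optional[float] = None) -> bool:
        if api_key not in self.valid_keys:
            return False
        now = time.monotonic() if now is None else now
        bucket = self.buckets.get(api_key)
        if bucket is None:
            bucket = self.buckets[api_key] = TokenBucket(self.rate, self.capacity, now)
        return bucket.allow(now)
```

A client that exceeds its budget is refused until its bucket refills. The rest of the API's capacity stays available to everyone else.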

(Sent by email in reply to Mark T. Holder's message of Wed, Jan 21, 2015 at 10:48 AM.)

mtholder commented 9 years ago

Just playing around at this point, but this commit (https://github.com/OpenTreeOfLife/germinator/commit/956cc9bc75837c9d0df24609d42f5dfa9ccdd3b0) makes me optimistic about caching gzipped versions of the getSyntheticTree responses for the "default" call used by the tree browser. It would take considerable computation to fill the cache, but that could be parallelized. It looks like the storage requirement would be about 20 GB.
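A rough sketch of what storing gzipped responses could look like follows. Each JSON body is compressed once and kept as bytes. It is sent unchanged to clients whose `Accept-Encoding` includes gzip, and decompressed for anyone else. The `GzipResponseCache` class and keying by node ID alone are assumptions: in the default call, the tree ID and depth are fixed. Summing stored sizes is one way to check a storage estimate like the ~20 GB above.

```python
import gzip
import json
from typing import Dict, Tuple


class GzipResponseCache:
    """Hypothetical store of gzip-compressed JSON responses, keyed by node ID."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}

    def put(self, node_id: str, response: dict) -> int:
        # Serialize compactly, compress once, and keep only the compressed bytes.
        body = json.dumps(response, separators=(",", ":")).encode("utf-8")
        compressed = gzip.compress(body)
        self._store[node_id] = compressed
        return len(compressed)

    def get(self, node_id: str, accept_encoding: str) -> Tuple[bytes, Dict[str, str]]:
        compressed = self._store[node_id]
        headers = {"Content-Type": "application/json"}
        if "gzip" in accept_encoding.lower():
            # The client can decode gzip itself, so serve the stored bytes as-is.
            headers["Content-Encoding"] = "gzip"
            return compressed, headers
        return gzip.decompress(compressed), headers

    def total_bytes(self) -> int:
        # Total compressed size across all cached responses.
        return sum(len(v) for v in self._store.values())
```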

It may be worth implementing a web service that checks for the default arguments to the call (arguson format, the correct tree ID, and a depth of 3). It would serve the cached response when a request matches and act as a proxy for all other calls.
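The dispatch logic for such a service could be as small as the sketch below. The parameter names (`treeID`, `format`, `maxDepth`, `subtreeNodeID`) and the tree ID value are assumptions used for illustration, not confirmed API details. `upstream` is a hypothetical stand-in for forwarding the request to the real API.

```python
from typing import Callable, Dict, Mapping, Optional

# Hypothetical values: the real parameter names and current tree ID may differ.
DEFAULT_FORMAT = "arguson"
DEFAULT_DEPTH = 3
CURRENT_TREE_ID = "example-synth-tree-id"
DEFAULT_PARAMS = {"treeID", "format", "maxDepth", "subtreeNodeID"}


def default_call_node(params: Mapping[str, object]) -> Optional[str]:
    """Return the requested node ID if `params` is the browser's default call, else None."""
    if set(params) - DEFAULT_PARAMS:
        return None  # extra arguments mean this is not the default call
    if (params.get("format") == DEFAULT_FORMAT
            and params.get("treeID") == CURRENT_TREE_ID
            and params.get("maxDepth") == DEFAULT_DEPTH):
        node = params.get("subtreeNodeID")
        return str(node) if node is not None else None
    return None


def handle(params: Mapping[str, object],
           cache: Dict[str, bytes],
           upstream: Callable[[Mapping[str, object]], bytes]) -> bytes:
    """Serve default calls from the cache; proxy everything else unchanged."""
    node = default_call_node(params)
    if node is not None and node in cache:
        return cache[node]  # cache hit: no load on the API server
    return upstream(params)
```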

jimallman commented 9 years ago

See the related issue #494 and the wiki page describing our initial caching solution. It does not currently gzip responses, but we can probably add that via Apache configuration. Nor does it pre-fill the cache. Instead, it relies on the first caller for a given clade to load the cache, and that caller experiences a delay. (Alternatively, we could request the most popular clades via cURL to preload the cache.)
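Preloading popular clades (and the parallel cache fill mentioned earlier) could be sketched as below. `fetch` is a hypothetical stand-in for the HTTP request, for example the same call one would make with cURL. Where the list of popular clades comes from (access logs, say) is left open. The worker count is kept small on purpose, so that warming the cache does not itself overload the API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def warm_cache(node_ids: Iterable[str],
               fetch: Callable[[str], bytes],
               cache: Dict[str, bytes],
               workers: int = 4) -> int:
    """Fetch each not-yet-cached clade once, in parallel; return how many were fetched."""
    # Deduplicate while preserving order, and skip clades already in the cache.
    todo = [n for n in dict.fromkeys(node_ids) if n not in cache]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, so zip pairs each ID with its body.
        for node_id, body in zip(todo, pool.map(fetch, todo)):
            cache[node_id] = body
    return len(todo)
```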

On the plus side, it will cache the arguson for any clade requested by a user, so any areas of common interest should be cached before long. If this becomes overwhelming for a RAM-based cache, we can easily switch to a disk cache instead.
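As a generic illustration of "cache whatever users request" with a RAM limit, here is a read-through cache that evicts least-recently-used entries past a byte budget. This is not the project's actual caching solution (the wiki page describes that). `load` is a hypothetical callable that asks the API for a clade's arguson. A disk-backed store could sit behind the same `get` interface if RAM becomes the bottleneck.

```python
from collections import OrderedDict
from typing import Callable


class ReadThroughLRU:
    """Caches responses on first request; evicts least-recently-used entries past `max_bytes`."""

    def __init__(self, load: Callable[[str], bytes], max_bytes: int) -> None:
        self.load = load
        self.max_bytes = max_bytes
        self.used = 0
        self.entries: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, node_id: str) -> bytes:
        if node_id in self.entries:
            self.entries.move_to_end(node_id)  # mark as most recently used
            return self.entries[node_id]
        body = self.load(node_id)  # first caller for this clade pays the delay
        self.entries[node_id] = body
        self.used += len(body)
        # Evict oldest entries until under budget, always keeping the newest one.
        while self.used > self.max_bytes and len(self.entries) > 1:
            _, evicted = self.entries.popitem(last=False)
            self.used -= len(evicted)
        return body
```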