OpenTreeOfLife / germinator

miscellaneous scripts and data for concerns that span more than one of the Open Tree code repositories: integration tests, system statistics, etc.
BSD 2-Clause "Simplified" License

Move files.opentreeoflife.org to S3 #126

Open jar398 opened 7 years ago

jar398 commented 7 years ago

The site is currently on a surplus server (varela.csail.mit.edu) that works well for the time being and is free, but if it fails in any way, recovery is not guaranteed.

I estimated S3 hosting costs at the highest service level, and it comes to about $0.28 per month for 12 GB.

Information on hosting web sites on S3: https://aws.amazon.com/blogs/aws/host-your-static-website-on-amazon-s3/

S3 is not the only choice, but we already have an Amazon account so it should be straightforward.

jar398 commented 7 years ago

It looks like s3-hosted web sites do not support directory indexes. I find the indexes to be quite useful because they allow one to browse to see what's available on the server. E.g.

http://files.opentreeoflife.org/ott/
http://files.opentreeoflife.org/synthesis/
http://files.opentreeoflife.org/ncbi/
http://files.opentreeoflife.org/fung/
etc.

One might use this to answer questions like:

Going to S3 would require either forgoing this feature or implementing it ourselves via some kind of script (one that would write index.html files, etc.). The latter sounds fragile to me, but maybe it could be made to work.
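For illustration, here is a minimal sketch of the kind of script mentioned above: it walks a local copy of the site and writes a plain index.html into every directory before upload. The function name `write_indexes` and the HTML layout are assumptions made for this example, not anything that exists in the repository.

```python
import html
import os
from urllib.parse import quote


def write_indexes(root):
    """Write an index.html listing into every directory under root.

    Returns the number of index files written.
    """
    written = 0
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sort in place so the walk order is deterministic
        entries = [d + "/" for d in dirnames]
        # Leave out any index.html produced by an earlier run.
        entries += sorted(f for f in filenames if f != "index.html")

        rel = os.path.relpath(dirpath, root)
        title = "/" if rel == "." else "/" + rel.replace(os.sep, "/") + "/"

        lines = [
            "<!DOCTYPE html>",
            '<html><head><meta charset="utf-8">',
            f"<title>Index of {html.escape(title)}</title></head><body>",
            f"<h1>Index of {html.escape(title)}</h1>",
            "<ul>",
        ]
        if rel != ".":
            lines.append('<li><a href="../">../</a></li>')
        for name in entries:
            lines.append(
                f'<li><a href="{quote(name)}">{html.escape(name)}</a></li>'
            )
        lines += ["</ul>", "</body></html>"]

        with open(os.path.join(dirpath, "index.html"), "w", encoding="utf-8") as f:
            f.write("\n".join(lines) + "\n")
        written += 1
    return written
```

The fragility concern still applies: the listings are only as fresh as the last run, so a script like this would have to be rerun before every upload.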

jar398 commented 7 years ago

Another issue is that there is no server for running 'tar' on. You have to unpack locally and then send the files using 'aws s3 sync' or equivalent. For synthesis version 8.0 this is more than 50,000 files. I have no idea how long that would take. Maybe I will try it later today...
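Before unpacking and syncing, it may help to know how many objects a release would turn into. The sketch below counts the regular files in a tarball without extracting it; the function name `count_regular_files` is an assumption made for this example.

```python
import tarfile


def count_regular_files(tar_path):
    """Return the number of regular files stored in the archive at tar_path."""
    # The default mode "r" detects gzip/bz2/xz compression automatically.
    with tarfile.open(tar_path) as archive:
        return sum(1 for member in archive if member.isfile())
```

Each regular file would become one uploaded object, so this count is a rough proxy for the number of upload requests a sync would make.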

jar398 commented 7 years ago

Also, there seems to be no way to make symbolic links, which breaks our 'current' links for synth and taxonomy (they would have to be maintained on a different server).

jar398 commented 7 years ago

spoke too soon, as usual. http://docs.aws.amazon.com/AmazonS3/latest/dev/how-to-page-redirect.html
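As a rough sketch of how the redirect mechanism on that page could stand in for a 'current' symlink: S3 website hosting lets an object carry a redirect target, which boto3 exposes as the `WebsiteRedirectLocation` parameter of `put_object`. The helper names and the keys used in the usage note are hypothetical, and boto3 is a third-party dependency imported only when the upload actually runs.

```python
def redirect_object_args(bucket, alias_key, target):
    """Build put_object arguments for an empty object that redirects to target."""
    # S3 requires the redirect target to be a site-relative path or a full URL.
    if not target.startswith(("/", "http://", "https://")):
        raise ValueError("target must start with '/', 'http://' or 'https://'")
    return {
        "Bucket": bucket,
        "Key": alias_key,
        "Body": b"",
        "WebsiteRedirectLocation": target,
    }


def publish_redirect(bucket, alias_key, target):
    """Upload the redirect object (needs boto3 and AWS credentials)."""
    import boto3  # third-party; imported lazily so the helper above runs without it

    boto3.client("s3").put_object(**redirect_object_args(bucket, alias_key, target))
```

For example, `publish_redirect("opentreeoflife-files", "ott/current", "/ott/ott3.0/")` would make the hypothetical key `ott/current` redirect to the ott3.0 directory. The redirect is set per object, so it behaves like a pointer at one URL rather than a true directory alias.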

jar398 commented 7 years ago

spoke too soon about listings, too.
https://github.com/rufuspollock/s3-bucket-listing
http://opentreeoflife-files.s3-website-us-west-2.amazonaws.com/ott/ott3.0/
This solution requires putting a special index.html in every directory.

jar398 commented 7 years ago

S3 pricing: https://aws.amazon.com/s3/pricing/ Data transfer is $0.01 per GB, so basically nothing in our case (the site is 11 GB, so 11 cents to grab the whole site, all versions of everything). There is a per-request fee of $0.005 per 1,000 PUT requests (less for GET). Transferring the 50,000 files belonging to a single synthesis release (if we decide to continue hosting the output/ trees on the files site) would cost about 25 cents. We are not talking large numbers.

This is a simplification - prices vary depending on various plan parameters. But I don't think the numbers would be much higher than this.
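The arithmetic behind those two figures can be written out as a small sketch. The rates are the ones quoted in the comment above, not current AWS prices, and the function names are made up for this example.

```python
# Rates as quoted in the comment above (USD); real prices vary by plan and region.
TRANSFER_USD_PER_GB = 0.01
PUT_USD_PER_1000_REQUESTS = 0.005


def transfer_cost(gigabytes):
    """Cost in USD of downloading the given number of gigabytes."""
    return gigabytes * TRANSFER_USD_PER_GB


def put_cost(num_requests):
    """Cost in USD of issuing the given number of PUT requests."""
    return num_requests / 1000 * PUT_USD_PER_1000_REQUESTS


print(f"whole site download: ${transfer_cost(11):.2f}")   # $0.11
print(f"one synthesis upload: ${put_cost(50_000):.2f}")   # $0.25
```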