girder / covalic

Application for hosting challenges
http://challenge.kitware.com

evaluate total cost of ownership for AWS assetstore implementations #45

Open mgrauer opened 9 years ago

mgrauer commented 9 years ago

Let's evaluate in a bit more detail all the attributes of the possible assetstore implementations, so we can see what effort is needed where, and provide better guidance for choosing an implementation when deploying in AWS.

S3/local(EBS)/gridFS

Here's a start of things to think about:

mgrauer commented 9 years ago

As I'm looking into the pricing more, it seems like S3 is going to be better than our current local filestore/EBS.

We are using 512 GB of EBS (not Provisioned IOPS) now, which is ~$51.20 per month, plus another $50 to $100 for backups. We could lower this to use only what we need, but there is maintenance involved (really no matter what size our assetstore is):

Plus, the local assetstore doesn't scale well across app servers and has a maximum (though very large) size: the max number of EBS volumes you can mount multiplied by the max size of an EBS volume.

S3 would scale far better, in terms of working with multiple app servers/celery nodes, connecting to CDN systems, and keeping uploads/downloads from adding load to the app servers. S3 also lets us pay for exactly what we use and not worry about backups (this only covers data being lost, not recovering deleted files, which is a different issue).

I believe the functionality that we lose with the S3 assetstore compared to the local assetstore is:

Are there others that I'm missing?

Uploading from the Python client should be just a matter of implementation--I would be happy to implement this if @zachmullen and @cpatrick agree we should move to S3.

For the hashing, is it possible to have MD5 be the hash algorithm tied to the S3 assetstore? If not, perhaps something like AWS Lambda would be an option to compute the correct hash upon upload.
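To make the MD5 idea concrete, here is a minimal sketch of computing a file's MD5 locally and comparing it with an S3 ETag. It assumes the object was uploaded in a single PUT without SSE-KMS or SSE-C encryption, which is the case where S3's ETag is the hex MD5 of the content; multipart uploads produce a different ETag and would not match. The function names `md5_hexdigest` and `matches_etag` are hypothetical and not part of Girder or any AWS SDK.

```python
import hashlib
import io


def md5_hexdigest(stream, chunk_size=1024 * 1024):
    """Return the hex MD5 of a binary stream, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    # iter() with a sentinel keeps calling read() until it returns b"" (EOF).
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def matches_etag(stream, etag):
    """Compare a local MD5 with an ETag (assumed single-PUT, see caveats above).

    S3 returns the ETag header wrapped in double quotes, so strip them first.
    """
    return md5_hexdigest(stream) == etag.strip('"').lower()


if __name__ == "__main__":
    # In-memory stand-in for a file; a real caller would pass open(path, "rb").
    payload = io.BytesIO(b"example assetstore file contents")
    print(md5_hexdigest(payload))
```

Reading in chunks keeps memory use flat for large files, which matters for the multi-GB uploads an assetstore tends to see.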

The downloading of zipped archives seems the most troublesome to work around.

S3 costs $0.03 per GB-month, as opposed to $0.10 per GB-month for EBS. S3 also doesn't need paid backups, and once backups are included EBS is more like $0.20-0.25 per GB-month. There are per-request costs with S3, but my guess is that for most workloads these won't amount to much. Data transfer costs are roughly the same, but there is an additional $0.01 per GB for transferring in or out through a public IP in EC2.
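For anyone who wants to plug in their own numbers, here is a rough sketch of the storage arithmetic using the prices quoted in this thread. The prices are inputs, not current AWS rates, and the function names and the `used_gb` parameter are hypothetical. Per-request and data transfer charges are left out, so this only compares storage plus backups.

```python
# Figures quoted in this thread; AWS prices change, so treat them as inputs.
EBS_PRICE_PER_GB_MONTH = 0.10
S3_PRICE_PER_GB_MONTH = 0.03
PROVISIONED_GB = 512
BACKUP_COST_RANGE = (50.0, 100.0)  # monthly backup spend, low and high


def ebs_monthly_cost(provisioned_gb, backup_cost):
    """EBS bills for provisioned capacity, plus whatever backups cost."""
    return provisioned_gb * EBS_PRICE_PER_GB_MONTH + backup_cost


def s3_monthly_cost(used_gb):
    """S3 bills for stored data only; request and transfer fees are ignored."""
    return used_gb * S3_PRICE_PER_GB_MONTH


if __name__ == "__main__":
    for backup in BACKUP_COST_RANGE:
        total = ebs_monthly_cost(PROVISIONED_GB, backup)
        per_gb = total / PROVISIONED_GB
        print(f"EBS with ${backup:.0f} backups: ${total:.2f}/month "
              f"(${per_gb:.3f} per GB-month)")
    # Worst case for S3: assume the full provisioned size is actually in use.
    s3_total = s3_monthly_cost(PROVISIONED_GB)
    print(f"S3 storing {PROVISIONED_GB} GB: ${s3_total:.2f}/month")
```

The effective EBS per-GB figure depends on where in the backup range the real spend lands, and the S3 figure drops further if the assetstore uses less than the provisioned 512 GB.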

cpatrick commented 9 years ago

:+1: for using S3.