GetValkyrie / hosting_s3

Allows Aegir to provision and manage S3 buckets for hosted sites.

Put backup files in a shared bucket #5

Open · ergonlogic opened this issue 9 years ago

ergonlogic commented 9 years ago

AWS limits the number of buckets an account can have (soft limit of 50, hard limit of 150). Considering how many backups get created in normal Aegir operations, we'll inevitably bump up against these limits.

While each site should still have a separate bucket, we can use namespaced folders in a single common backup bucket per client.
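For illustration, a shared per-client backup bucket could be laid out along these lines (bucket names, site names and dates here are hypothetical; the "folders" are just key prefixes):

```
aegir-clientname-backups/                              <- one shared bucket per client
  example.com/example.com-20150601.tar.gz
  example.com/example.com-20150615.tar.gz
  shop.example.org/shop.example.org-20150601.tar.gz
```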

ergonlogic commented 9 years ago

We might want to support namespaced folders within buckets for production sites too: http://blogs.aws.amazon.com/security/post/Tx1P2T3LFXXCNB5/Writing-IAM-policies-Grant-access-to-user-specific-folders-in-an-Amazon-S3-bucke

ergonlogic commented 9 years ago

As a first step, let's look at one bucket per site, with both production and backups in the same bucket. Otherwise, there are security considerations to take into account to keep different sites from having access to each other's files. The IAM policies required for that would presumably have to be managed by Aegir as well, which would, in turn, require more privileges for the Aegir user. Let's avoid that issue for the time being.

ergonlogic commented 9 years ago

The S3FS module provides a "Root Folder" field, which should simplify things significantly. I'm not sure about amazon_s3, and since we aren't using it in production, we may want to withdraw support for it until we can maintain it responsibly.

We'll also need to account for converting a site from using the root of the bucket to a directory.

ergonlogic commented 9 years ago

One way we might accomplish this would be to create a directory in the bucket root named after the site (i.e., <site_name>), along with <site_name>-backups. Maybe <site_name>-site would be better... To convert an existing bucket that contains files at the bucket root, we could simply move any existing files and directories under the newly created <site_name>-site directory.
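Since S3 has no real directories or rename operation, that conversion would amount to copying each object at the bucket root to a new key under the <site_name>-site/ prefix and deleting the original. A rough sketch using AWS SDK for PHP v2-style calls follows; the bucket name, prefix and credentials are placeholders, and in practice this would presumably be driven from the provision service class rather than a standalone script:

```php
<?php
// Sketch only: move every object at the bucket root under a new
// "<site_name>-site/" prefix. All names and credentials are placeholders.
use Aws\S3\S3Client;

$client = S3Client::factory(array(
  'key'    => 'YOUR-ACCESS-KEY',   // placeholder credentials
  'secret' => 'YOUR-SECRET-KEY',
));

$bucket = 'examplecom';            // hypothetical per-site bucket
$prefix = 'examplecom-site/';      // new namespace for the site's files

foreach ($client->getIterator('ListObjects', array('Bucket' => $bucket)) as $object) {
  $key = $object['Key'];
  // Skip objects that are already under the new prefix.
  if (strpos($key, $prefix) === 0) {
    continue;
  }
  // S3 has no rename: copy to the new key, then delete the original.
  // (Keys containing special characters may need URL encoding in CopySource.)
  $client->copyObject(array(
    'Bucket'     => $bucket,
    'Key'        => $prefix . $key,
    'CopySource' => "{$bucket}/{$key}",
  ));
  $client->deleteObject(array('Bucket' => $bucket, 'Key' => $key));
}
```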

We'll need to confirm that '.', '-' and other characters valid in URLs are also valid in bucket directory names. They should be, since these "directories" are really just part of the object key, and I don't recall any limitations on key names. Still, worth checking...

ergonlogic commented 9 years ago

Generally, we don't delete site backups when we delete a site. So we'd have to delete only the <site_name>-site directory, and not the bucket itself, when we delete a site. This'd still lead to an accumulation of buckets... Perhaps we should provide an option to delete the bucket and all backups too. As it stands, Aegir will also back up the site prior to deleting it...
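If backups and site files do end up sharing one bucket, deleting a site would then mean removing only the keys under its <site_name>-site/ prefix and leaving the rest (and the bucket) alone. With the AWS SDK for PHP v2 that looks something like the sketch below; names and credentials are again placeholders:

```php
<?php
// Sketch: on site deletion, delete only the site's own prefix,
// leaving the backups (and the bucket itself) in place.
use Aws\S3\S3Client;

$client = S3Client::factory(array(
  'key'    => 'YOUR-ACCESS-KEY',
  'secret' => 'YOUR-SECRET-KEY',
));

// deleteMatchingObjects() removes every object whose key starts with the prefix.
$client->deleteMatchingObjects('examplecom', 'examplecom-site/');
```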

ergonlogic commented 9 years ago

While I'd entertained the thought of keeping the existing functionality in parallel with this planned change, I don't know how realistic this'll be. I suspect that I'll just branch off from here, to keep things stable, and see about re-integrating the current functionality if there's any call for it.

ergonlogic commented 9 years ago

http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingMetadata.html states that keys (filename/path) can be up to 1024 characters long. Since most GNU/Linux filesystems limit filenames to 255 characters, we should be fine here.

Also, only characters with special meaning in paths ('?', '\', etc.) or that look like an attempt to hack ('<', '>', etc.) need special care. Since these aren't normally allowed in filenames anyway, we should be safe on this front.
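If we want a belt-and-suspenders check anyway, validating a generated prefix against the characters AWS documents as safe for object keys is cheap. A sketch (the function name is made up, and the allow-list is my reading of the S3 key-naming guidelines):

```php
<?php
// Sketch: accept only characters documented as safe in S3 object keys
// (alphanumerics plus a few punctuation marks and the '/' delimiter).
function hosting_s3_valid_prefix($prefix) {
  return (bool) preg_match("@^[A-Za-z0-9!_.*'()/-]+$@", $prefix);
}
```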

ergonlogic commented 9 years ago

With the current mechanism, we end up with too many buckets, eventually running up against AWS's limits. To manually delete a bucket, I've run the following command: `drush @hm eval "\$s3 = new Provision_Service_s3; \$s3->delete_bucket('examplecom');"` I think it'd be worthwhile to wrap this in a utility Drush command. We could then do some validation as well, such as ensuring the bucket to be deleted isn't in use by a current site.

A similar utility Drush command might be the way forward for converting sites that currently use the bucket root over to the new layout.
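For the record, a hedged sketch of what such a wrapper command could look like; the command name, callback and the in-use check are all hypothetical, and only Provision_Service_s3::delete_bucket() comes from the command quoted above:

```php
<?php
/**
 * Implements hook_drush_command().
 *
 * Sketch of a hypothetical utility command wrapping Provision_Service_s3.
 */
function hosting_s3_drush_command() {
  $items['s3-delete-bucket'] = array(
    'description' => 'Delete an S3 bucket that is no longer used by any site.',
    'arguments' => array('bucket' => 'Name of the bucket to delete.'),
  );
  return $items;
}

/**
 * Command callback: validate, then delete the bucket.
 */
function drush_hosting_s3_s3_delete_bucket($bucket) {
  // Hypothetical helper: refuse to delete a bucket still referenced by a site.
  if (hosting_s3_bucket_in_use($bucket)) {
    return drush_set_error('S3_BUCKET_IN_USE', dt('Bucket @bucket is still in use.', array('@bucket' => $bucket)));
  }
  $s3 = new Provision_Service_s3;
  $s3->delete_bucket($bucket);
  drush_log(dt('Deleted bucket @bucket.', array('@bucket' => $bucket)), 'ok');
}
```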

ergonlogic commented 9 years ago

Perhaps we should have 2 buckets per site: one for file contents, and the other for backups. This'd make it simpler to later move to a single backup bucket (per user or per client, perhaps) shared across multiple sites. Then we could still just delete the site bucket when we delete the site, and not have to worry about keeping it around for the backups it contains.

ergonlogic commented 9 years ago

The more I think about a separate bucket for backups, the more it makes sense. If we save a backup bucket name when credentials are added, we needn't create the bucket itself until it's needed. That said, it'd probably be best to create the backup bucket regardless, to avoid possible issues down the road. For now, the Drush command to delete buckets should be sufficient for cleaning up such backup buckets.

ergonlogic commented 9 years ago

Backup buckets would be linked to credentials, and so map to either the client, the user or the site. This'd mean, at most, 2 buckets per site, if each site uses its own credentials. But when used with client or user credentials, we'd only have the overhead of one backup bucket for all of a user's or client's sites. Since, in this model, we delete site buckets upon site deletion, we shouldn't accumulate an undue number of buckets.

MatthewHager commented 9 years ago

I think we should do one "bucket" per client or Aegir instance. For us this would be the same, since we only use one client in Aegir. I'm OK with one bucket for backups and another for files. Beyond that, I don't see a reason to have more than 2 buckets total.

It seems like we need to do more with IAM roles. If you need true separation, you could generate IAM users to match clients and then only give access to files in "bucketname/clientname/*". If those IAM creds were used, this would isolate access.
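For example, the per-client isolation described here maps onto a folder-scoped IAM policy along these lines (bucket and client names are placeholders, following the pattern in the AWS post linked earlier in this thread):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ListOnlyTheClientFolder",
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": ["arn:aws:s3:::bucketname"],
      "Condition": {"StringLike": {"s3:prefix": ["clientname/*"]}}
    },
    {
      "Sid": "ReadWriteInsideTheClientFolder",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": ["arn:aws:s3:::bucketname/clientname/*"]
    }
  ]
}
```

Attached to a per-client IAM user, a policy like this would let every client share one bucket while only ever seeing keys under their own prefix.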