chad3814 opened this issue 9 years ago
Buttersink is segmented so the different backends are pretty independent. I think you'd essentially be re-writing the "S3Store.py" file for ACD. The main class derives from Store in Store.py, where most of the documentation comments are.
Unless ACD presents a "copy-on-write" interface, you'd probably be storing volume differences on ACD, the same way I do with S3. You wouldn't be able to browse the files directly, but you could sync to and from your local btrfs.
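To make the shape of that concrete, here is a rough sketch of what an ACD backend might look like. This is only illustrative: the class and method names below (`ACDStore`, `listVolumes`, `receive`, `send`) are guesses modeled loosely on S3Store.py, not the actual abstract interface — the real methods and their docstrings are in Store.py.

```python
# Illustrative sketch only: the method names are placeholders modeled
# loosely on S3Store.py, not the actual interface documented in Store.py.
import Store  # Store.py inside the buttersink package


class ACDStore(Store.Store):
    """ Hypothetical Amazon Cloud Drive backend. """

    # __init__ would take whatever Store.Store expects and authenticate
    # against the Cloud Drive API.

    def listVolumes(self):
        # Enumerate stored diff files and map them back to snapshot UUIDs.
        raise NotImplementedError

    def receive(self, diff):
        # Upload one "btrfs send" diff stream as a Cloud Drive file.
        raise NotImplementedError

    def send(self, diff):
        # Download a stored diff stream so it can be piped to "btrfs receive".
        raise NotImplementedError
```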
Out of curiosity, why not just use the S3 backend? S3 is inexpensive and might give you even more functionality than you'd get out of an ACD backend. If you want to "browse" your files and backups from S3, you can spin up a server in EC2, and sync to that -- which is very quick and essentially free.
Let me know if you need any pointers.
I have 4T in btrfs that I want to back up off site, which would cost about $120/mo on S3. Even with reduced redundancy it's still about $100/mo. I also have a Prime account, so for $60/yr I can get unlimited storage.
Imagine if I tried to back up my entire 40T btrfs array :) (although I imagine Amazon would contact me if I tried storing 40T in ACD)
I see your logic. Glacier would be cheaper than S3, but still more than "unlimited". Compression might help a bit too...
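For reference, a rough back-of-the-envelope comparison. The per-GB figures below are approximate mid-2015 list prices and are assumptions, not quotes:

```python
# Back-of-the-envelope monthly cost for 4 TB of backups.  Prices are
# approximate mid-2015 list prices (assumptions, not quotes).
SIZE_GB = 4 * 1024

per_gb_month = [
    ("S3 standard", 0.030),
    ("S3 reduced redundancy", 0.024),
    ("Glacier", 0.010),
]

for name, price in per_gb_month:
    print("%-22s ~$%3.0f/mo" % (name, SIZE_GB * price))

# Amazon Cloud Drive "unlimited" plan: $60/yr regardless of size.
print("%-22s ~$%3.0f/mo" % ("ACD unlimited", 60 / 12.0))
```

That comes out to roughly $123, $98, and $41 per month, versus about $5/mo for the unlimited plan, which matches the figures above and shows why "unlimited" wins for multi-terabyte archives.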
I am looking at a somewhat similar use case, although using Glacier instead of ACD. A couple of questions:
1) Does buttersink work (effectively) with Glacier? In particular, does it keep enough information locally that it doesn't have to retrieve anything back out of Glacier in order to work out how to do the sync (which snapshots to use for diffs, etc.)?
2) Is there a way to upload the initial snapshot using an out-of-band channel -- in particular, by importing a physical disk? There is no way I can upload several TB of data over the Internet. I could, however, do a btrfs send to a file on an external disk and ship that to Amazon to import into my Glacier storage. My daily changes to the data are much more reasonable to upload, as long as the base snapshot is already there.
Graham,
1 - Theoretically buttersink should work effectively with S3 objects that have been migrated to Glacier. It does not need to read the snapshots back out of S3 to sync new snapshots up to it. If you try it, I would be very interested in any bug reports or feature requests for it to work better.
One note is that buttersink might get confused if the ".bs" files are in Glacier and not retrievable. They are very small and are not essential, however, so I would recommend either not archiving them in Glacier or just deleting them when you archive the associated snapshots.
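If you do archive whole prefixes with a lifecycle rule, a small cleanup script could delete the ".bs" sidecar keys afterwards. A minimal sketch (the bucket and prefix names are hypothetical, and it uses boto3 for brevity; buttersink itself does not do this):

```python
# Minimal sketch: delete the small ".bs" sidecar objects under a prefix
# after the associated snapshots have been archived to Glacier.
# Bucket and prefix names here are hypothetical examples.
import boto3

s3 = boto3.client("s3")
BUCKET = "my-backup-bucket"  # hypothetical
PREFIX = "snapshots/"        # hypothetical

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        if obj["Key"].endswith(".bs"):
            s3.delete_object(Bucket=BUCKET, Key=obj["Key"])
            print("deleted " + obj["Key"])
```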
2 - AWS does have an out-of-band service. IIRC, you send them media, and they copy and mount it in EBS. You could use this to send them your snapshots on a btrfs system, then mount that to an EC2 instance, and then use buttersink to sync from the virtual instance to S3. Later syncs could be done from a local machine. Again, I have not tried this, and would be interested in hearing whether/how that worked for you.
HTH!
@GrahamCobb Did you ever have any luck using buttersink with Glacier? I am also interested in using it this way, since it is much cheaper.
There's a set of Python command-line tools (in conjunction with an appspot service) providing CLI support for Cloud Drive here: yadayada/acd_cli@aed16fc9affbf40db98544105d28e7a21bc0313c. I haven't used Python in a few years (but have done a ton of AWS work in bash and node). Is it straightforward to make a new backend store for Cloud Drive?