andrewchambers / bupstash

Easy and efficient encrypted backups.
https://bupstash.io
MIT License
886 stars 31 forks

s3 storage #75

Open andrewchambers opened 3 years ago

andrewchambers commented 3 years ago

This is something that has been a private WIP.

andrewchambers commented 3 years ago

I'm on my 3rd implementation of this now, and I can never quite get it right. A big problem is that I want to strongly resist pulling async into bupstash.

robbat2 commented 2 years ago

A question about the S3 backend design and its implications for using Glacier/DEEP_ARCHIVE:

Is there strong separation between all data and metadata files in the storage engine?

The repo layout file at https://bupstash.io/doc/man/bupstash-repository.html doesn't make it clear if the tar content listing is in items/ or data/.

This separation would be crucial to make it possible to put the metadata into storage w/ low per-access costs & low latency, while pushing the data to much cheaper storage (if I need my backup restored, I can wait 12 hours for the S3 RestoreObject request to complete).
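
A minimal sketch of the kind of split I mean, assuming a layout where metadata and bulk data sit under distinct key prefixes. The prefix-to-class mapping and the helper name `storage_class_for_key` are hypothetical, not anything bupstash implements; `STANDARD` and `DEEP_ARCHIVE` are the S3 storage class names.

```python
# Hypothetical mapping from key prefix to S3 storage class.
# The prefixes are assumptions about how a repository might map onto keys.
TIER_BY_PREFIX = {
    "items/": "STANDARD",      # metadata: low latency, cheap per-access
    "data/": "DEEP_ARCHIVE",   # bulk data: cheapest storage, slow restore
}

# Anything unrecognized stays in the hot tier so it is always readable.
DEFAULT_TIER = "STANDARD"


def storage_class_for_key(key: str) -> str:
    """Return the storage class an object should be uploaded with."""
    for prefix, tier in TIER_BY_PREFIX.items():
        if key.startswith(prefix):
            return tier
    return DEFAULT_TIER
```

This only works if everything needed to list and select backups lives under the hot prefix; anything under the cold prefix needs a restore request before it can be read.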

andrewchambers commented 2 years ago

The content listing is stored in data/. Splitting the tiers is something I have considered and may add in a future release, though S3 also supports automatic intelligent access tiers, which are another alternative.

robbat2 commented 2 years ago

S3 Intelligent-Tiering ends up with the worst possible pricing for backup media w/ known workloads. It doesn't immediately put most content into the DEEP_ARCHIVE storage class, where it could be.

Splitting the listing would absolutely be needed then, since content listings are in data/. An alternative would be making it possible to have multiple repos that don't each hold all of the data: e.g. a local store that keeps only the last 7 days, plus Glacier storage that holds years of backups.
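
A rough sketch of the multi-repo idea, assuming each backup has a creation timestamp and that retention is decided purely by age. The repo names `local` and `archive` and the function `repos_for_backup` are hypothetical and only illustrate the policy, not any existing bupstash feature.

```python
from datetime import datetime, timedelta, timezone

# Assumed policy: the local store keeps only the last 7 days.
LOCAL_RETENTION = timedelta(days=7)


def repos_for_backup(created_at: datetime, now: datetime) -> list[str]:
    """Return which repos should hold a backup of the given age."""
    # The archive (e.g. Glacier-backed) repo keeps every backup.
    targets = ["archive"]
    # The local repo additionally keeps anything inside the retention window.
    if now - created_at <= LOCAL_RETENTION:
        targets.append("local")
    return targets
```

A restore would then try the local repo first and fall back to the archive only for backups older than the retention window.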