
andrewchambers / bupstash

Easy and efficient encrypted backups.

https://bupstash.io

MIT License


s3 storage #75

Open andrewchambers opened 3 years ago

andrewchambers commented 3 years ago

This is something that has been a private WIP.

Performance so far is good on some s3 providers, absolutely horrible on others.
The fix seems like it will be extremely deep parallel fetch pipelining.
Want something that we can provide as a service on bupstash.io.
Want to allow people to run it themselves if they have their own cloud setup.

andrewchambers commented 3 years ago

I'm on my third implementation of this now; I can never quite get it right. A big problem is that I want to strongly resist pulling async into bupstash.
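
For readers following along, here is a minimal sketch of what deep parallel fetch pipelining without async could look like, using only `std` threads and a channel. It is not bupstash's actual implementation: the function name `fetch_pipelined`, the `depth` parameter, and the `fetch` callback (standing in for a blocking S3 GET) are all hypothetical.

```rust
use std::collections::BTreeMap;
use std::sync::mpsc;
use std::sync::Mutex;
use std::thread;

/// Fetch every key with up to `depth` blocking requests in flight,
/// returning the results in the same order as `keys`.
/// `fetch` is a hypothetical stand-in for a blocking object GET.
pub fn fetch_pipelined<T, F>(keys: &[String], depth: usize, fetch: F) -> Vec<T>
where
    T: Send,
    F: Fn(&str) -> T + Sync,
{
    // Index of the next key that has not been claimed by a worker yet.
    let next = Mutex::new(0usize);
    let (tx, rx) = mpsc::channel::<(usize, T)>();

    thread::scope(|s| {
        for _ in 0..depth.max(1) {
            let tx = tx.clone();
            let next = &next;
            let fetch = &fetch;
            s.spawn(move || loop {
                // Claim the next index; the lock is released before fetching.
                let i = {
                    let mut n = next.lock().unwrap();
                    let i = *n;
                    *n += 1;
                    i
                };
                if i >= keys.len() {
                    break;
                }
                let value = fetch(&keys[i]);
                if tx.send((i, value)).is_err() {
                    break;
                }
            });
        }
        // Drop the original sender so `rx` ends once all workers finish.
        drop(tx);

        // Results arrive out of order; buffer them until the next
        // expected index shows up, then emit in key order.
        let mut pending = BTreeMap::new();
        let mut out = Vec::with_capacity(keys.len());
        for (i, value) in rx {
            pending.insert(i, value);
            while let Some(v) = pending.remove(&out.len()) {
                out.push(v);
            }
        }
        out
    })
}
```

With a high-latency provider, raising `depth` keeps more requests in flight at once. Note that this sketch does not bound the reorder buffer, so one slow request can let later results pile up in memory; a real implementation would likely need backpressure.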

robbat2 commented 2 years ago

A question about the S3 backend design and its implications for using Glacier/DEEP_ARCHIVE:

Is there strong separation between all data and metadata files in the storage engine?

The repo layout file at https://bupstash.io/doc/man/bupstash-repository.html doesn't make it clear if the tar content listing is in items/ or data/.

This would be crucial for putting the metadata into storage with low per-access costs and low latency, while pushing the data to much cheaper storage (if I need my backup restored, I can wait 12 hours for the S3 RestoreObject request to complete).

andrewchambers commented 2 years ago

The content listing is stored in data/. Splitting the tiers is something I have considered and may add in a future release, though S3 also supports automatic intelligent access tiers, which are another alternative.

robbat2 commented 2 years ago

S3 Intelligent-Tiering ends up with the worst possible pricing for backup media with known workloads. It doesn't immediately put most content into the DEEP_ARCHIVE storage class, where it could be.

Splitting the listing would absolutely be needed then, since content listings are in data/. An alternative would be to make it possible to have multiple repos that don't each hold all of the data: e.g. a local store that keeps only the last 7 days, plus Glacier storage that holds years of backups.
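
The multi-repo alternative can be sketched as a simple retention rule. This is only an illustration of the idea, not an existing bupstash feature: the `Repo` enum and both functions are hypothetical, and "last 7 days" is assumed to mean an age strictly below the window.

```rust
/// Hypothetical repository tiers from the proposal above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Repo {
    /// Fast local store that keeps only recent backups.
    Local,
    /// Cheap archival store (e.g. DEEP_ARCHIVE) that keeps everything.
    Glacier,
}

/// Which repos are expected to hold a backup of the given age.
/// Assumption: the archive holds every backup, while the local store
/// holds only those younger than `local_window_days`.
pub fn repos_holding(age_days: u32, local_window_days: u32) -> Vec<Repo> {
    let mut repos = Vec::new();
    if age_days < local_window_days {
        repos.push(Repo::Local);
    }
    repos.push(Repo::Glacier);
    repos
}

/// Preferred restore source: the local store when it still has the
/// backup, otherwise the archive (which may need a slow restore first).
pub fn restore_source(age_days: u32, local_window_days: u32) -> Repo {
    repos_holding(age_days, local_window_days)[0]
}
```

Under these assumptions a 3-day-old backup restores from the local store, while a 30-day-old one falls back to the archive and pays the restore delay.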