deajan / backup-bench

Quick and dirty backup tool benchmark with reproducible results
BSD 3-Clause "New" or "Revised" License

Backup to cloud storage? #12

Open basldfalksjdf opened 1 year ago

basldfalksjdf commented 1 year ago

Would it be possible to do a test when backing up to cloud storage? As long as your internet provider does not charge you for bandwidth, you can use Oracle Cloud Infrastructure as the target to get 10GB of free storage, so that you don't have to pay for the tests, although Oracle may charge you for API calls. Scaleway offers 75GB of free storage, which can be used too. Scaleway also does not charge for API calls.

I am happy to put some money in the pot to help fund this test.

andrewchambers commented 1 year ago

bupstash currently doesn't have cloud storage other than what is offered at https://bupstash.io/managed.html - though I don't mind crediting an account to do tests.

basldfalksjdf commented 1 year ago

In that case, bupstash can be excluded from the cloud storage tests, or it can just use bupstash.io. Cloud storage tests seem important to me, as most of these programs are intended for backing up to the cloud (aside from bupstash and, to a limited extent, borg).

deajan commented 1 year ago

@basldfalksjdf Would a selfhosted minio S3 server do the job ?

deajan commented 1 year ago

AFAIK, bupstash and borg both only support their own server implementation over SSH. Indeed rclone makes it possible to replicate those repos into the cloud with various providers, but that's not really fair when it comes to speed benchmarking.

I also somehow assume that mounting an rclone target and creating a repo in such a mountpoint to simulate "local filesystem" backups could be quite slow.

@ThomasWaldmann Is my assumption above right ? If not, I'd mount a minio S3 server via rclone on the source server, and let borg back up into it.

@andrewchambers If the above statement doesn't work, I'd happily use a managed bupstash account.

basldfalksjdf commented 1 year ago

> @basldfalksjdf Would a selfhosted minio S3 server do the job ?

I ask only because object storage works differently from local or SFTP storage, so some software may perform differently. I think a self-hosted minio will emulate object storage just fine.

ThomasWaldmann commented 1 year ago

borg only supports a directory as repo storage or a remote borg process reachable via ssh (the latter is offered by some providers like borgbase.com, hetzner storage box, rsync.net).

one can use all sorts of stuff for the "directory", including directories on network or cloud filesystems, but it is then the user's responsibility to choose something that actually works reliably.

i don't use rclone myself, so i can't comment on that.

ntolia commented 1 year ago

Having previously benchmarked these systems (see this post on Restic vs. Kopia), self-hosted Minio doesn't always work because one can be limited by local IO bandwidth at times. Using a real scale-out storage system helps projects like Kopia that can efficiently utilize this.

I really do believe we should try and test systems against the best backend storage system they support. From many, many years of experience building backup systems, I can tell you that SFTP/SSH sucks for this use case. Otherwise, we might unconsciously be testing a lowest common denominator, or not actually testing the backup system but the backend storage system instead. I am not saying that these problems cannot be worked around, but they do need a lot more thought.
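One rough way to see whether a run measures the storage backend rather than the backup tool is to time plain writes to the target directory first. The following is only a sketch under assumptions: `raw_write_throughput` and its parameters are hypothetical names, it only covers targets reachable as a directory (a local disk or a mounted filesystem), and it is not part of backup-bench.

```python
import os
import tempfile
import time


def raw_write_throughput(target_dir: str, size_mib: int = 64, chunk_mib: int = 4) -> float:
    """Write roughly size_mib of random data into target_dir and return MiB/s.

    The size is rounded down to a whole number of chunks (at least one).
    Random data is used so that transparent compression cannot inflate the result.
    """
    chunk = os.urandom(chunk_mib * 1024 * 1024)
    chunks = max(1, size_mib // chunk_mib)
    fd, path = tempfile.mkstemp(dir=target_dir, prefix="rawbench-")
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as fh:
            for _ in range(chunks):
                fh.write(chunk)
            fh.flush()
            os.fsync(fh.fileno())  # make sure the data actually left the page cache
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)  # do not leave the scratch file on the target
    return (chunks * chunk_mib) / max(elapsed, 1e-9)
```

If a backup tool's throughput ends up close to this raw figure, the backend is probably the limiting factor and the comparison between tools says little.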

deajan commented 1 year ago

I already switched the backend for restic and kopia to their specific HTTPS implementations. So far, kopia's speed over HTTPS has been worse than over SFTP; I'm still investigating, see https://github.com/kopia/kopia/issues/2372

So basically I think I'll stick with what the author thinks is the best, being AFAIK:

The idea is to keep self hosted backends.

I can add optional S3 benchmarks for restic, kopia and duplicacy. I will indeed be limited by my local hardware when using minio. Then again, once the script supports S3 storage buckets, I can redo the test with some cloud provider, as long as I don't pay for doing benchmarks ;)
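For illustration, an optional S3 run for restic against a self-hosted minio could be wired up roughly like this. It is a minimal sketch, not the actual backup-bench script: the helper names, endpoint, bucket and credentials are hypothetical, and it relies only on restic's documented `s3:` repository syntax and its `RESTIC_REPOSITORY`, `RESTIC_PASSWORD`, `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables.

```python
import os
import subprocess
import time


def restic_s3_target(endpoint, bucket, access_key, secret_key, password):
    """Build the restic repository string and environment for an S3-compatible server."""
    repo = f"s3:{endpoint.rstrip('/')}/{bucket}"
    env = {
        "RESTIC_REPOSITORY": repo,
        "RESTIC_PASSWORD": password,
        "AWS_ACCESS_KEY_ID": access_key,
        "AWS_SECRET_ACCESS_KEY": secret_key,
    }
    return repo, env


def timed_run(cmd, extra_env):
    """Run cmd with extra_env merged into the current environment; return seconds elapsed."""
    env = {**os.environ, **extra_env}
    start = time.perf_counter()
    subprocess.run(cmd, env=env, check=True)
    return time.perf_counter() - start


# Hypothetical usage against a local minio instance (values are placeholders):
# repo, env = restic_s3_target("http://127.0.0.1:9000", "bench", "KEY", "SECRET", "pw")
# timed_run(["restic", "init"], env)
# print(timed_run(["restic", "backup", "/path/to/dataset"], env))
```

Pointing the same helper at a cloud provider later would only mean changing the endpoint, bucket and credentials, which keeps the self-hosted and hosted runs comparable.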