gilbertchen / duplicacy

A new generation cloud backup tool
https://duplicacy.com

Recommended way to backup both on local drive and cloud #360

Open Ithilion opened 6 years ago

Ithilion commented 6 years ago

I've been successfully using Duplicacy while backing up to a local removable drive (and restoring from it has already proven necessary during two catastrophic drive failures, many thanks!). I would like to up my game and add a third, off-site copy of the data (as standard practice dictates), in the cloud this time. Since I also have access to unlimited Google Drive storage, that's a no-brainer too. I have two drives that I back up/purge on a daily schedule. Given that it would take several weeks (or even a couple of months) to upload all the data to Google Drive with my internet connection, and that I would like to be able to interrupt/resume the upload process while continuing to back up/purge my drives locally on the same schedule as now, what would be the most efficient way to set this up?

jonreeves commented 6 years ago

The three ways that spring to mind to get your data to another storage are:

  1. Duplicacy Backups
  2. Duplicacy Copy
  3. Manual Sync/Transfer with 3rd Party Tool (RClone)

Speaking from experience and various conversations here, Copy is probably the best way to go. Creating a new Backup/Snapshot to a different location means iterating through the Source Drive to Process, Hash and Encrypt all the Files again. It also means that Snapshot 1 in one Storage Destination is not the same as Snapshot 1 in another (you effectively have two different branches).

Copy, on the other hand, lets you Upload the already created Chunks from one Storage (local) to another (cloud). Doing so means you don't have to redo any of the computations, or even need access to the original Source of Data. This is especially useful for me because I have a slower home line, but access to a faster one elsewhere. Many will argue that using a third party tool like RClone or RSync to push a copy of your local Storage to the cloud is the simplest option, but for me using Duplicacy Copy has proven more powerful for one simple reason: you can Pick and Choose which Snapshot ID to Copy, even which Tag or Revision. This means you don't have to push your entire Backup to the cloud, only the pieces you want. It can save you money, or just allow you to keep a different frequency of updates offsite.

In terms of resumability, Copy wins out over Backup again. Files can change between backup jobs, and that can mean the job changes each time you run it. Some chunks can become orphaned (but can be cleaned up later). With Copy, you are uploading 'known', already 'created' chunks, so it's just a matter of resuming where you left off. Copy skips chunks that have already been uploaded.

The only time I might consider using Backup over Copy would be if I really needed a different Chunk Size setting for my cloud provider than for the local storage. As far as Encryption and Passwords are concerned, Duplicacy will allow you to Copy between Storages that have different Passwords, or even between ones with and without Encryption.

Using a 3rd Party Tool means you have to Sync everything (the whole Storage) to the cloud, and the cloud copy necessarily ends up with the same Password and Encryption settings as the local one.
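
As a rough sketch of what that looks like on the command line (the storage name "offsite", snapshot ID "mydocs", revision number and the gcd:// path are just placeholders; my understanding is that the -copy option on add is what makes the new storage copy-compatible):

    # add a second, copy-compatible storage with its own password (-e prompts for one)
    duplicacy add -e -copy default offsite mydocs gcd://Backups/mydocs
    # copy only snapshot ID "mydocs" at revision 99; re-running resumes and skips chunks already uploaded
    duplicacy copy -id mydocs -r 99 -from default -to offsite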

TowerBR commented 6 years ago

@jonreeves, your arguments are very interesting; I hadn't thought of some of these aspects.

So if I understand you correctly, you're suggesting to set up the local storage (init repository local_storage), set up another storage (add repository cloud_storage), back up to the local storage, and then execute a copy command (running from the original machine) to the cloud storage (copy -from local_storage -to cloud_storage). Would that be the way?
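
Something like this, I suppose (sketching with placeholder paths and the snapshot ID "mydocs"; I gather init's -storage-name option names the first storage, and add's -copy option makes the second one copy-compatible):

    duplicacy init -storage-name local_storage mydocs /mnt/local_drive
    duplicacy add -copy local_storage cloud_storage mydocs gcd://Backups/mydocs
    duplicacy backup -storage local_storage
    duplicacy copy -from local_storage -to cloud_storage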

jonreeves commented 6 years ago

@TowerBR I can't really take credit; as I said, I picked up a lot of this from other conversations (#333 and others I came across on https://duplicacy.com/issues).

What you describe is essentially what I've been doing so far (to a USB HDD), although I only intended to do so during my 'seeding' stage, where I need to get the bulk of my files online. Once everything is up there, I can use my normal line to keep it up to date.

My long term plan would be the following:

  1. Backup to NAS regularly
  2. Copy from NAS to Cloud semi-regularly

You can think of the NAS as being equivalent to a Local Storage (although it's not portable).
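
As a sketch, that schedule could be driven by a couple of cron entries (the repository path and storage names here are placeholders):

    # nightly backup to the NAS storage; weekly copy from NAS to the cloud
    0 2 * * *  cd /path/to/repository && duplicacy backup -storage default
    0 4 * * 0  cd /path/to/repository && duplicacy copy -from default -to cloud_storage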

I had thought about using USB as an interim step down the line (NAS -> USB -> Cloud), in case a Revision becomes quite big and I need to use a faster line again. The problem is that you need a big enough USB drive, and you either have to keep it up to date, or end up copying almost everything each time to make that revision complete.

I haven't figured out a way to make this work yet. I basically need a way to copy only the difference between one revision and another, knowing that this copy on its own is incomplete/partial, and then a way to upload that incomplete/partial copy to the cloud. It's a weird case, so I'm still trying to figure out a better way.

In short, the Local/USB route is good for an initial seed, or if your data is small enough to fit on a USB drive and you want that to be a primary destination (to save the money/bandwidth of recovering from the cloud every time).

On a separate but related note... I actually have an interesting portable setup right now, running Duplicacy from USB. I had to jump through some hoops to make this work the way I wanted, though. One key was to use samba://../Backups/Destination to make the path relative, so that when you plug the USB drive in elsewhere it always refers to the same Storage.
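
In other words, the storage was initialized against the relative URL, roughly like this (the snapshot ID "portable" is just a placeholder):

    duplicacy init portable samba://../Backups/Destination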

TowerBR commented 6 years ago

I thought of one thing: if you run a prune on the local job (repository -> local_storage), the pruned chunks will be erased from the local storage, but the copy command will not delete them from the remote storage, right?

williamgs commented 6 years ago

@TowerBR that's correct, I recently added a footnote to this wiki page: https://github.com/gilbertchen/duplicacy/wiki/Back-up-to-multiple-storages#pruning

TowerBR commented 6 years ago

Then I would have to:

    duplicacy backup -storage local_storage
    duplicacy copy -from local_storage -to cloud_storage
    duplicacy prune -storage local_storage -keep 0:360 -keep 30:180 -keep 7:30 -keep 1:7     (for example)
    duplicacy prune -storage cloud_storage -keep 0:360 -keep 30:180 -keep 7:30 -keep 1:7     (for example)

I think the copy command will keep the revisions aligned between the local and the cloud storage, even if I run the backup several times between each copy, right?

williamgs commented 6 years ago

Correct, the snapshot revisions will be the same as long as backup is only run against local_storage; you can copy to as many places as you like. Duplicacy doesn't shuffle revision names/numbers around (like rsnapshot does, for example). Revision number 99 on local_storage means the same thing as revision number 99 on cloud_storage, even if they are pruned differently.
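
If you ever want to confirm that, you can list the revisions on each storage side by side (using the storage names from your example):

    duplicacy list -storage local_storage
    duplicacy list -storage cloud_storage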