gilbertchen / duplicacy

A new generation cloud backup tool
https://duplicacy.com

Best practice for scheduling? #171

Open bassebaba opened 7 years ago

bassebaba commented 7 years ago

I really like this piece of software, but nowhere in the documentation does it say anything about how to actually set up a good backup routine.

I mainly use this on Windows; I installed it in c:\program files\duplicacy\. But if I run the exe there, it cannot find any repos. It seems that I have to explicitly run the .exe in every location where I initialised a repo in order to do "stuff"?

Are there any scripts available that takes care of "wrapping" these tasks?

I guess what I'm looking for is a "system" that makes this feel more like polished software, such as the now game-over Crashplan.

gilbertchen commented 7 years ago

You'll need to cd to the directory to be backed up (the repository) and then run duplicacy init repository_id storage_url to initialize the repository first (assuming duplicacy.exe is in your path).
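As a concrete sketch of those two steps (the path, repository id, and storage URL below are placeholders, not from this thread):

```shell
# Hypothetical names throughout: "my-docs" is the repository id and the
# SFTP URL is an example destination; substitute your own.
cd d:\Documents
duplicacy init my-docs "sftp://user@backup.example.com/duplicacy-storage"
duplicacy backup    # subsequent backups are run from this same directory
```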

This command line version is designed to run interactively to perform a basic set of backup-related operations. A lot of features you want can be found in the GUI version available from https://duplicacy.com. Unfortunately the GUI version right now only supports one repository so you'll have to use symlinks to link in other directories if you have multiple directories to back up.

There is no need to back up the "master" config. To restore a repository on a different computer, you just run the same duplicacy init repository_id storage_url command under a new repository (the path of the new repository doesn't matter). The repository id can be found under the snapshots directory on the storage, since it is just the name of the subdirectory where all snapshot files for that repository are stored.

jsreynolds commented 7 years ago

I just did a lot of experimenting last weekend on Windows. A few things I did that might help:

Is there no way to have a "master location" from where I can maintain all repos I have?

Put the executable in a folder, then add it to your system path so that Duplicacy can be executed anywhere you wish. Under that folder, create preferences folders for each storage path you're going to back up.

For each source, change directory to the source root folder and perform the init, but pass the -pref-dir option pointing back to one of the folders you created earlier under the executable directory. Apparently this has to be done on the command line and can't currently be done in the GUI... or so I think.

It will basically then only create a single file at the root of the storage folder which points to your pref-dir area; it is human-readable (and changeable).

In this manner, all your prefs, etc. are all stored in one location next to the executable.
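The setup described above can be sketched like this (all paths, ids, and the storage URL are hypothetical):

```shell
# Keep this repository's preferences in a central folder next to the exe,
# so d:\Repo1 itself only gets a small .duplicacy pointer file.
cd d:\Repo1
duplicacy init -pref-dir "c:\program files\duplicacy\prefs\repo1" repo1 "b2://my-bucket"
```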

How do I keep an "active" backup of my repos?

In Windows, you can use something as simple as the task scheduler... it's just fine for such scenarios. If you need something fancier, there are thousands of schedulers out there to choose from.

Do I need to use crontab/task scheduler to run the backup-command regularly?

Yes. Remember to run it as a user with local administrator rights in order to get the VSS (Volume Shadow Copy) abilities.

If so, is it viable to run it like every 15 minutes? Wont that create a lot of revisions?

That entirely depends on how many files are being changed and / or added. If nothing has changed, then you won't have any additional overhead. Revisions without changes or additions don't take up any additional space.

One caution: you might want to first check that no other Duplicacy process is already running, otherwise they will happily compete with one another and you'll have several going on the same repo at the same time. If you're using the Windows Task Scheduler, there is a setting to not start another instance if the first one hasn't completed.
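On a Unix-like system (or Git Bash on Windows), a minimal guard against overlapping scheduled runs can be sketched with an atomic mkdir lock; the actual duplicacy call is commented out here so the sketch stands alone:

```shell
#!/bin/sh
# Atomic lock: mkdir either creates the directory (we hold the lock) or
# fails because a previous run still holds it. This is the rough analogue
# of Task Scheduler's "do not start a new instance" setting.
LOCKDIR="${TMPDIR:-/tmp}/duplicacy-backup.lock"
if mkdir "$LOCKDIR" 2>/dev/null; then
    trap 'rmdir "$LOCKDIR"' EXIT
    echo "lock acquired, running backup"
    # duplicacy backup   # the real work would go here
else
    echo "another backup appears to be running; skipping"
fi
```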

Common logging? How do I log the output from backup? Do I need to pipe the output to logfiles myself?

Side note: my power went out for an hour - nice of github to have saved my notes!

I'd like to know more about logging options myself. Other than piping the output to a file and then grepping the file, there doesn't seem to be a way to capture the errors cleanly that I'm aware of.

Let's say my computer/drives totally die....

I emulated that very thing. The good news is you simply select a new source folder on your machine, and init it to the storage, just like you did originally. Then simply perform a restore. Much simpler than I had imagined.
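The recovery steps described above can be sketched as follows (the folder, repository id, storage URL, and revision number are all placeholders):

```shell
# On the new machine: make an empty folder, attach it to the same storage
# using the SAME repository id as before, then pull back a revision.
mkdir d:\Restored
cd d:\Restored
duplicacy init my-docs "sftp://user@backup.example.com/duplicacy-storage"
duplicacy restore -r 1    # -r selects which revision to restore
```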

I hope this helps.

Best, --J

bassebaba commented 7 years ago

Thanks, I really appreciate the feedback. I'll try to write powershell/bash scripts for automation and error reporting, will post back if I have good success.

The pref-dir was what I was looking for. Am I out of luck now that I already init and backed up my repos?

gilbertchen commented 7 years ago

Just rm the .duplicacy directory and run the init command again with the -pref-dir option (remember to use the same repository id).

@jsreynolds that is indeed very helpful! For logging, the command line version simply prints logs to stdout and you'll have to redirect it to a log file. There is a global option -log which can add the timestamp and the message id to each log message.
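A minimal logging wrapper along those lines (the log path is just an example):

```shell
# -log is a global option, so it goes before the command; stderr is
# redirected too so errors land in the same file as the normal output.
duplicacy -log backup >> d:\logs\duplicacy.log 2>&1
```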

bassebaba commented 7 years ago

So I tried the -pref-dir, but that didn't help much.

What I would like to do:

- d:\MasterDir
- d:\Repo1
- e:\Repo2
- f:\Repo3

I would then like to use -pref-dir and place the three repos' prefs in MasterDir. Then I could script something like this (PowerShell-ish):

foreach ($dir in Get-ChildItem d:\MasterDir -Directory) {
    Set-Location $dir.FullName
    duplicacy.exe backup   # ...plus options
    Set-Location d:\MasterDir
}

Because that's easy to script: one script to iterate over all repos. No matter how many repos I init in the future, the one single script will "find" them all.

The problem is this does not seem to work. Even though I init with -pref-dir, it seems like I still need to issue the commands from the repo dir, not the pref dir...?


I guess the pref-dir is more about security, so that you can keep the keyring/known-hosts files etc. away from the target filesystem.

Is there any (easy) way to add what I'm looking for, a reverse lookup of pref dir -> repo dir? I.e. the .duplicacy file that's in the target repo dir and points to the pref dir should instead be inside the pref dir and point to the backup location...

tbain98 commented 7 years ago

The not-yet-implemented #149 would give you part of what you're looking for, by allowing you to run the backup command from anywhere. That's not necessarily the same as having the pref-dir store the path to the repo dir (and I think that this request should be implemented, because I think it's far more natural to have the metadata point to the real data rather than the real data point to the metadata), but it might still be an improvement for you if it were implemented.

bassebaba commented 7 years ago

@tbain98 Yeah that issue is a bit towards what I want.

But my proposed solution also serves as "self documentation". I did run Crashplan on -multiple- locations, ranging from my laptop, workstation, servers, cloud servers, friends laptops, friends servers, you name it.

With the current system, the user needs to "remember" every repository in some way, either by scripting every repo or keeping some sort of documentation.

If I give you the login to a windows machine and tell you "please show me all duplicacy repos", how do you achieve that? Search for ".duplicacy"?

So with all metadata in one single location, instead pointing to the repository directories, it would work "the same" on all installs. A command like "list-repositories" would show me a complete picture of the backup situation on the machine.

I guess that this is what the GUI aims to do tho .... :)

My PowerShell script turned out nicely. It logs every run, has a config file in which I can enter all the repo dirs and do various config (e.g. backup threads and so on), and it has Pushover/email support for notifications. I'll try to clean up the code and upload it.

jwatte commented 7 years ago

@bassebaba Everything for Windows (from voidtools, free download) will find all folders named ".duplicacy" in half a second. Another option is to init your repository at C:\ and use exclude/include filters to select the paths you actually want to include (I use this method.) If you have multiple drives, on Windows, you have to either "subst" them in, or create multiple repositories. The command line version can be scripted with scheduled tasks.
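On a Unix-like shell (including Git Bash or WSL), a rough equivalent of that search is find; the throwaway scratch tree below exists only to make the example self-contained, since in practice you would point find at your real drive roots:

```shell
# Build a scratch tree with two fake repositories, then locate them the
# same way you would scan a real drive for .duplicacy entries.
ROOT=$(mktemp -d)
mkdir -p "$ROOT/Repo1/.duplicacy" "$ROOT/Repo2/.duplicacy"
find "$ROOT" -name ".duplicacy"
```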

stevenhorner commented 7 years ago

@bassebaba Any chance of uploading your Powershell script, saves me re-inventing the wheel for my Windows machines.

bassebaba commented 7 years ago

@stevenhorner I'm by no means a Powershell master, use it as a starting point :)

https://github.com/bassebaba/DuplicacyPowershell