PetaByet / cdp

Simple & Open Source Server Backups
https://cdp.me
GNU General Public License v2.0

Backup replication #17

Open alexandreteles opened 9 years ago

alexandreteles commented 9 years ago

I know this is on your to-do list for CDP, so I want to present a really good way to handle backup replication using Unison ( http://www.cis.upenn.edu/~bcpierce/unison/ ). Unison is available for Windows and is in the repositories of many flavors of Unix (Solaris, Linux, OS X, etc.).

So, I will assume that you already have Unison installed (or know how to install it).

First, you need to set the UNISON environment variable so that Unison looks for *.prf (profile) files in the correct location, preferably one that isn't world-readable. So we will create the unison directory under /etc/ and add the UNISON variable to the environment:

# mkdir -p /etc/unison/
# touch /etc/profile.d/unison.sh
# echo '#!/usr/bin/env bash' >> /etc/profile.d/unison.sh && echo 'export UNISON=/etc/unison/' >> /etc/profile.d/unison.sh
# chmod +x /etc/profile.d/unison.sh
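
Since mkdir -p normally leaves the directory readable by everyone, here is a minimal sketch of how the "not world-readable" part could be enforced. The helper name make_private_profile_dir is hypothetical, not part of Unison or CDP, and the default path is just the one used above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create a Unison profile directory that only
# its owner can read, write, or enter.
make_private_profile_dir() {
    local dir="${1:-/etc/unison}"
    mkdir -p "$dir" || return 1
    # 700 = rwx for the owner, nothing for group and others
    chmod 700 "$dir"
}

# Example (run as root for the real location):
# make_private_profile_dir /etc/unison
```

The user that runs unison has to own this directory, otherwise it will not be able to read the profiles.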

After that, we need to create a *.prf file to tell Unison what we want it to do. These files are pretty simple to write. Note that a profile takes exactly two roots, so to replicate the backups to more than one server you create one profile per destination. Let me show you an example. First we create the .prf file inside the /etc/unison/ directory:

# The two directories (roots) that will be kept in sync
root = /var/www/html/files/my_server_backup_dir
root = ssh://user@my-other-server.info//directory/at/other/server
# Unison accepts exactly two roots per profile, so a further destination goes in its own profile:
# root = ssh://user@my-third-server.info//directory/at/other/server

# Obviously, to use SSH without passwords you need to give Unison the SSH key and any other SSH arguments needed to connect to your remote servers.
# If your key has a passphrase, you will need to set up ssh-agent or keychain to provide it.
sshargs = -i /home/user/.ssh/id_rsa -p 2222

# As we want just one-way mirroring from this server to the other servers, specify the source replica using "force" as follows.
# Note that the directory you write here must also be listed as a root above:
force = /var/www/html/files/my_server_backup_dir

# We want Unison to run without any user input so we will use "batch" mode.
batch = true

# We don't want to be prompted and will just accept Unison's recommendation:
auto = true

# Propagate file modification times (but not directory modtimes).
times = true

Save this file as /etc/unison/myfirstserver.com.prf, and now you can sync the files in this directory to every remote root listed in the profile using:

unison myfirstserver.com

This should produce output similar to this:

Contacting server...
Connected [//local//home/alice/sync_folder -> //remote_host//home/alice/sync_folder]
Looking for changes
    Waiting for changes from server
Reconciling changes
new file -->            document1.pdf
    <-- new file   my.jpg  
Propagating updates
UNISON 2.40.63 started propagating changes at 21:19:13.65 on 20 Sep 2013
[BGN] Copying document1.pdf from /var/www/html/files/my_server_backup_dir to my-other-server.info//directory/at/other/server
[BGN] Copying my.jpg from //remote_host//home/alice/sync_folder to /home/alice/sync_folder
[END] Copying my.jpg
[END] Copying document1.pdf
UNISON 2.40.63 finished propagating changes at 21:19:13.68 on 20 Sep 2013
Saving synchronizer state
Synchronization complete at 21:19:13  (2 items transferred, 0 skipped, 0 failed)

Unison syncs file permissions by default; ownership (owner and group) is propagated only if the owner and group preferences are enabled. So, if you are transferring files into a protected directory, the user you log in as over SSH needs to have the right permissions. You can also pass all of these options directly on one long command line, but this isn't really recommended.

This could be used for backup restoration as well. You just need to change the root options to match what you need.
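
If every destination server gets its own profile, a small wrapper can run them one after another, for example from cron. This is only a sketch: the function name sync_profiles and the UNISON_BIN variable are made up for illustration, and the profile names in the example are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run one Unison profile per destination, in sequence.
# UNISON_BIN is an assumption so the command can be swapped out (e.g. for testing).
UNISON_BIN="${UNISON_BIN:-unison}"

sync_profiles() {
    local failed=0 profile
    for profile in "$@"; do
        # Keep going after a failure so one bad server does not block the rest
        if ! "$UNISON_BIN" "$profile"; then
            echo "sync failed for profile: $profile" >&2
            failed=$((failed + 1))
        fi
    done
    # Succeed only if every profile synchronized without error
    [ "$failed" -eq 0 ]
}

# Example (placeholder profile names):
# sync_profiles myfirstserver.com mysecondserver.com
```

The wrapper returns non-zero if any profile failed, so cron or a monitoring script can notice it.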

PetaByet commented 9 years ago

Does this have any benefits compared to the current tar gzip / sftp stack?

alexandreteles commented 9 years ago

Since you are supposed to use this only for data replication (copying the backup files to a bunch of servers), Unison can provide some benefits over that stack:

  1. Unison is faster than sftp;
  2. Since Unison can merge files, with incremental backups it can send only what was modified to the server (this requires Unison to be installed on all machines);
  3. Unison can be made faster when transferring very large files (10GB+);
  4. Unison is error-resilient. If a transfer is interrupted, Unison will verify what was sent and send only what is needed to complete the file on the other server (this requires Unison to be installed on all machines);
  5. When transferring more than one file, Unison will automatically parallelize the transfers to finish faster.

For the points that require Unison to be installed on all machines: if it isn't installed on the machines that will receive the backup files, Unison will fall back to a less efficient mode that doesn't provide those features.

PetaByet commented 9 years ago

if it isn't installed on the machines that will receive the backup files, Unison will fall back to a less efficient mode that doesn't provide those features.

Which is?

alexandreteles commented 9 years ago

Since Unison can merge files, with incremental backups it can send only what was modified to the server (this requires Unison to be installed on all machines);

In this case Unison will send the entire file and overwrite the old copy on the destination server instead of merging files.

Unison is error-resilient. If a transfer is interrupted, Unison will verify what was sent and send only what is needed to complete the file on the other server (this requires Unison to be installed on all machines);

In this case, Unison will delete the incomplete remote copy and start over. If you are using Unison in such a scenario, you should add the number of retries to the profile:

retry = 10

The retry option can also be used when Unison is installed on the other machines, but if it isn't, setting this option in the profile is mandatory.
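
If you also want retries at the script level (for example when the SSH connection itself drops), here is a rough sketch. It relies on Unison's documented exit codes: 0 for success, 1 when some files were skipped but all transfers succeeded, 2 for non-fatal failures, and 3 for a fatal error or interruption. The function name run_with_retries and the RETRY_DELAY variable are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rerun a command up to MAX times while it exits
# with status 2 or higher (failure, going by Unison's exit codes).
run_with_retries() {
    local max="$1" attempt=1 status
    shift
    while :; do
        "$@" && status=0 || status=$?
        # 0 and 1 both mean that every attempted transfer succeeded
        if [ "$status" -le 1 ]; then return "$status"; fi
        # Give up once the attempt budget is used, keeping the last status
        if [ "$attempt" -ge "$max" ]; then return "$status"; fi
        attempt=$((attempt + 1))
        # RETRY_DELAY (seconds between attempts) is an assumed knob
        sleep "${RETRY_DELAY:-30}"
    done
}

# Example:
# run_with_retries 10 unison myfirstserver.com
```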

alexandreteles commented 9 years ago

When transferring more than one file, Unison will automatically parallelize the transfers to finish faster.

It will also parallelize when you are transferring to more than one server, even if just one file is being synchronized.