geerlingguy / arm-nas

Arm NAS configuration with ZFS.
GNU General Public License v3.0

Implement backup / replication #3

Closed · geerlingguy closed this 4 months ago

geerlingguy commented 6 months ago

Right now my backups are taken manually, at the end of the week, by attaching a hard drive to one of my Linux boxes and running a script that dumps all the data off the ZFS shares down to the hard drive (I'm lucky it all fits on a 20TB HDD right now... that will not be the case for more than another year).

I would like to have an onsite replica, maybe on a server running RAIDZ1 so I can get more capacity with weaker drive-failure guarantees, but just to have it available.

Then I would like to have an offsite replica—more details on where and how later ;)

I haven't ever set up ZFS replication before, but it looks like a couple of options are zrepl and zrep?

I've also had offers from rsync.net to host a backup there; I may ask about that, mostly in the interest of trying out their service at some point.

My home setup currently backs up to Amazon Glacier Deep Archive. I could have a copy there too (for very little cost) using rclone, but I would love to see how ZFS snapshots + replication can work across geographies.
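
For reference, the rclone side could be as simple as this sketch (the remote name s3glacier, the bucket nas-backup, and the source path are hypothetical placeholders, not my actual setup):

# hypothetical: push a share to S3 using the Deep Archive storage class
rclone sync /path/to/share s3glacier:nas-backup/share --s3-storage-class DEEP_ARCHIVE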

geerlingguy commented 4 months ago

See also: Klara - OpenZFS Data Replication - Replicating Data Quickly and Safely.

geerlingguy commented 4 months ago

And see: syncoid, which looks like a very simple tool that can be configured quite deeply and run on cron! See this Reddit example: https://www.reddit.com/r/zfs/comments/rsx78z/comment/hqpttmg/

You'd run this command on the Pi to pull a backup from the server (I believe this would back up the whole dataset including all snapshots by default?):

root@backup:~# syncoid -r root@truenas:tank/dataset tank/dataset
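
To run it on cron as that Reddit example suggests, the root crontab entry could be as simple as this sketch (the nightly 2 a.m. schedule is an assumption; the paths match the command above):

# hypothetical schedule: pull a recursive backup every night at 2 a.m.
0 2 * * * /usr/sbin/syncoid -r root@truenas:tank/dataset tank/dataset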

See also: https://opensource.com/life/16/7/sanoid

geerlingguy commented 4 months ago

Setting up snapshots with sanoid, before:

jgeerling@nas01:/hddpool/jupiter$ zfs list -t snapshot
no datasets available

And after:

jgeerling@nas01:/hddpool/jupiter$ zfs list -t snapshot
NAME                                                   USED  AVAIL     REFER  MOUNTPOINT
hddpool/jupiter@autosnap_2024-04-27_03:15:00_monthly     0B      -     11.2T  -
hddpool/jupiter@autosnap_2024-04-27_03:15:00_daily       0B      -     11.2T  -
hddpool/jupiter@autosnap_2024-04-27_03:15:00_hourly      0B      -     11.2T  -
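
Those autosnap_* snapshots come from sanoid's policy config; a minimal sketch of /etc/sanoid/sanoid.conf that would produce hourly/daily/monthly snapshots like the above (the retention counts here are assumptions, not my actual settings):

# /etc/sanoid/sanoid.conf (sketch; retention counts are assumptions)
[hddpool/jupiter]
        use_template = production

[template_production]
        hourly = 36
        daily = 30
        monthly = 3
        autosnap = yes
        autoprune = yes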

Also... while I was writing up a sanoid role (geerlingguy.sanoid, to be submitted to Galaxy soon™), I found https://github.com/exterrestris/ansible-role-sanoid/blob/main/tasks/main.yaml, which seems to do most of what I'd like to do. Might switch to that role, might not, we'll see.

geerlingguy commented 4 months ago

I have everything going, but I'm having trouble getting the replication to occur (probably a sudo / zfs permissions issue with the pi user):

pi@nas02:~ $ syncoid --sshkey=.ssh/id_rsa_zfs --recursive backup/jupiter pi@nas01.mmoffice.net:hddpool/jupiter
WARN: ZFS resume feature not available on source and target machine - sync will continue without resume support.
cannot open 'backup/jupiter': dataset does not exist
CRITICAL ERROR: no datasets found at /usr/sbin/syncoid line 155.

geerlingguy commented 4 months ago

lol it would be better if I tried syncing in the right direction, too:

pi@nas02:~ $ syncoid --sshkey=.ssh/id_rsa_zfs --recursive --no-privilege-elevation pi@nas01.mmoffice.net:hddpool/jupiter backup/jupiter
WARN: ZFS resume feature not available on target machine - sync will continue without resume support.
cannot create snapshots : permission denied
CRITICAL ERROR: ssh    -i .ssh/id_rsa_zfs -S /tmp/syncoid-pi@nas01.mmoffice.net-1714192573 pi@nas01.mmoffice.net  zfs snapshot ''"'"'hddpool/jupiter'"'"''@syncoid_nas02_2024-04-26:23:36:14-GMT-05:00
 failed: 256 at /usr/sbin/syncoid line 1415.

It looks like I also need to add some ZFS permissions...

And I published a sanoid role: https://galaxy.ansible.com/ui/standalone/roles/geerlingguy/sanoid/

geerlingguy commented 4 months ago

On the HL15:

jgeerling@nas01:/var/log$ sudo zfs allow -u pi send,hold,mount,snapshot,destroy hddpool
jgeerling@nas01:/var/log$ zfs allow hddpool
---- Permissions on hddpool ------------------------------------------
Local+Descendent permissions:
    user pi destroy,hold,mount,send,snapshot

On the Pi:

pi@nas02:~ $ sudo zfs allow -u pi compression,mountpoint,create,mount,receive,rollback,destroy ssdpool/backup
pi@nas02:~ $ zfs allow ssdpool/backup
---- Permissions on ssdpool/backup -----------------------------------
Local+Descendent permissions:
    user pi compression,create,destroy,mount,mountpoint,receive,rollback

IT'S WORKING!

pi@nas02:~ $ syncoid --sshkey=~/.ssh/id_rsa_zfs --recursive --no-privilege-elevation pi@nas01.mmoffice.net:hddpool/jupiter ssdpool/backup/jupiter
INFO: Sending oldest full snapshot hddpool/jupiter@autosnap_2024-04-27_03:15:00_monthly (~ 11457.6 GB) to new target filesystem:
1.68GiB 0:00:14 [ 112MiB/s] [>                                                                      ]  0% ETA 1:02:35:47
geerlingguy commented 4 months ago

Yay, looks like everything's running smoothly. I'll have to check on it next week once the initial 11 TB sync is complete (it's running around 120 MB/sec right now, with an estimated 1 day and 2 hours remaining).

Moving the rest of the work to #10.

martin-schlossarek commented 4 months ago

> I haven't ever set up ZFS replication before, but it looks like a couple of options are zrepl and zrep?

Although you have already decided to use sanoid/syncoid, I highly recommend zrepl!

I've been using Proxmox + ZFS + zrepl on two NAS boxes (prod/backup) for over a year now and it's rock solid. The big plus is the integrated Prometheus exporter, which makes monitoring backups really easy.
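
For reference, the exporter is enabled in the global section of zrepl's YAML config; a minimal sketch (port 9811 is the default the zrepl docs use for the Prometheus listener):

global:
  monitoring:
    - type: prometheus
      listen: ':9811'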

geerlingguy commented 4 months ago

@martin-schlossarek - Definitely worth a look, then. Also, if it makes it a little easier to synchronize removed snapshots, that would be nice. Syncoid does that in the latest version, but that version isn't in the Debian repo yet :D

martin-schlossarek commented 4 months ago

zrepl has very powerful built-in prune policies. Basically, you define rules for which snapshots should be kept; all snapshots that do not match these rules are pruned.
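
A sketch of what that looks like in a job's pruning section (the job name and grid values here are illustrative assumptions, not a recommendation):

jobs:
  - name: backup_pull
    type: pull
    # connect / root_fs / interval omitted for brevity
    pruning:
      keep_sender:
        - type: not_replicated
        - type: grid
          grid: 1x1h(keep=all) | 24x1h | 30x1d | 6x30d
          regex: '^zrepl_'
      keep_receiver:
        - type: grid
          grid: 24x1h | 30x1d | 12x30d
          regex: '^zrepl_'

zrepl keeps every snapshot these rules match and destroys the rest on its next prune run, so the regex matters if other tools also create snapshots.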