geerlingguy / arm-nas

Arm NAS configuration with ZFS.
GNU General Public License v3.0

Implement backup / replication #3

Closed · geerlingguy closed this 4 months ago

geerlingguy commented 6 months ago

Right now my backups are taken manually, at the end of the week, by attaching a hard drive to one of my Linux boxes and running a script that dumps all the data off the ZFS shares down to the hard drive (I'm lucky it all fits on a 20TB HDD right now... that will not be the case for more than another year).

I would like to have an onsite replica, maybe on a server running RAIDZ1 so I can get more capacity with weaker drive-failure guarantees, but just to have it available.

Then I would like to have an offsite replica—more details on where and how later ;)

I haven't ever set up ZFS replication before, but it looks like a couple of options are zrepl and zrep?

I've also had offers from rsync.net to host a backup there; I may ask about that, mostly in the interest of trying out their service at some point.

My home setup currently backs up to Amazon Glacier Deep Archive. I could have a copy there too (for very little cost) using rclone, but I would love to see how ZFS snapshots + replication can work across geographies.
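
For reference, the rclone side could be as simple as this sketch (the remote name s3glacier, the bucket nas-backup, and the source path are hypothetical placeholders, not my actual setup):

# hypothetical: push a share to S3 using the Deep Archive storage class
rclone sync /path/to/share s3glacier:nas-backup/share --s3-storage-class DEEP_ARCHIVE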

geerlingguy commented 4 months ago

See also: Klara - OpenZFS Data Replication - Replicating Data Quickly and Safely.

geerlingguy commented 4 months ago

And see: syncoid, which looks like a very simple tool that can be configured quite deeply and run on cron! See this Reddit example: https://www.reddit.com/r/zfs/comments/rsx78z/comment/hqpttmg/

You'd run this command on the Pi to pull a backup from the server (I believe this would back up the whole dataset including all snapshots by default?):

root@backup:~# syncoid -r root@truenas:tank/dataset tank/dataset
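
To run it on cron as that Reddit example suggests, the root crontab entry could be as simple as this sketch (the nightly 2 a.m. schedule is an assumption; the paths match the command above):

# hypothetical schedule: pull a recursive backup every night at 2 a.m.
0 2 * * * /usr/sbin/syncoid -r root@truenas:tank/dataset tank/dataset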

See also: https://opensource.com/life/16/7/sanoid

geerlingguy commented 4 months ago

Setting up snapshots with sanoid, before:

jgeerling@nas01:/hddpool/jupiter$ zfs list -t snapshot
no datasets available

And after:

jgeerling@nas01:/hddpool/jupiter$ zfs list -t snapshot
NAME                                                   USED  AVAIL     REFER  MOUNTPOINT
hddpool/jupiter@autosnap_2024-04-27_03:15:00_monthly     0B      -     11.2T  -
hddpool/jupiter@autosnap_2024-04-27_03:15:00_daily       0B      -     11.2T  -
hddpool/jupiter@autosnap_2024-04-27_03:15:00_hourly      0B      -     11.2T  -
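
Those autosnap_* snapshots come from sanoid's policy config; a minimal sketch of /etc/sanoid/sanoid.conf that would produce hourly/daily/monthly snapshots like the above (the retention counts here are assumptions, not my actual settings):

# /etc/sanoid/sanoid.conf (sketch; retention counts are assumptions)
[hddpool/jupiter]
        use_template = production

[template_production]
        hourly = 36
        daily = 30
        monthly = 3
        autosnap = yes
        autoprune = yes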

Also... while I was writing up a sanoid role (geerlingguy.sanoid, to be submitted to Galaxy soon™), I found https://github.com/exterrestris/ansible-role-sanoid/blob/main/tasks/main.yaml, which seems to do most of what I'd like to do. Might switch to that role, might not, we'll see.

geerlingguy commented 4 months ago

I have everything going, but I'm having trouble getting the replication to occur (probably a sudo / zfs permissions issue with the pi user):

pi@nas02:~ $ syncoid --sshkey=.ssh/id_rsa_zfs --recursive backup/jupiter pi@nas01.mmoffice.net:hddpool/jupiter
WARN: ZFS resume feature not available on source and target machine - sync will continue without resume support.
cannot open 'backup/jupiter': dataset does not exist
CRITICAL ERROR: no datasets found at /usr/sbin/syncoid line 155.

geerlingguy commented 4 months ago

lol it would be better if I tried syncing in the right direction, too:

pi@nas02:~ $ syncoid --sshkey=.ssh/id_rsa_zfs --recursive --no-privilege-elevation pi@nas01.mmoffice.net:hddpool/jupiter backup/jupiter
WARN: ZFS resume feature not available on target machine - sync will continue without resume support.
cannot create snapshots : permission denied
CRITICAL ERROR: ssh    -i .ssh/id_rsa_zfs -S /tmp/syncoid-pi@nas01.mmoffice.net-1714192573 pi@nas01.mmoffice.net  zfs snapshot ''"'"'hddpool/jupiter'"'"''@syncoid_nas02_2024-04-26:23:36:14-GMT-05:00
 failed: 256 at /usr/sbin/syncoid line 1415.

It looks like I also need to add some ZFS permissions...

And I published a sanoid role: https://galaxy.ansible.com/ui/standalone/roles/geerlingguy/sanoid/

geerlingguy commented 4 months ago

On the HL15:

jgeerling@nas01:/var/log$ sudo zfs allow -u pi send,hold,mount,snapshot,destroy hddpool
jgeerling@nas01:/var/log$ zfs allow hddpool
---- Permissions on hddpool ------------------------------------------
Local+Descendent permissions:
    user pi destroy,hold,mount,send,snapshot

On the Pi:

pi@nas02:~ $ sudo zfs allow -u pi compression,mountpoint,create,mount,receive,rollback,destroy ssdpool/backup
pi@nas02:~ $ zfs allow ssdpool/backup
---- Permissions on ssdpool/backup -----------------------------------
Local+Descendent permissions:
    user pi compression,create,destroy,mount,mountpoint,receive,rollback

IT'S WORKING!

pi@nas02:~ $ syncoid --sshkey=~/.ssh/id_rsa_zfs --recursive --no-privilege-elevation pi@nas01.mmoffice.net:hddpool/jupiter ssdpool/backup/jupiter
INFO: Sending oldest full snapshot hddpool/jupiter@autosnap_2024-04-27_03:15:00_monthly (~ 11457.6 GB) to new target filesystem:
1.68GiB 0:00:14 [ 112MiB/s] [>                                                                      ]  0% ETA 1:02:35:47
geerlingguy commented 4 months ago

Yay, looks like everything's running smoothly. I'll have to check on it next week once the initial 11 TB sync is complete (it's running around 120 MB/sec right now, with an estimated 1 day and 2 hours remaining).

Moving the rest of the work to #10.

martin-schlossarek commented 4 months ago

> I haven't ever set up ZFS replication before, but it looks like a couple of options are zrepl and zrep?

Although you have already decided to use sanoid/syncoid, I highly recommend zrepl!

I've been using Proxmox + ZFS + zrepl on two NAS boxes (prod/backup) for over a year now and it's rock solid. The big plus is the integrated Prometheus exporter, which makes monitoring backups really easy.
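
For reference, the exporter is enabled in the global section of zrepl's YAML config; a minimal sketch (port 9811 is the default the zrepl docs use for the Prometheus listener):

global:
  monitoring:
    - type: prometheus
      listen: ':9811'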

geerlingguy commented 4 months ago

@martin-schlossarek - Definitely worth a look, then. Also, if it makes it a little easier to synchronize removed snapshots, that would be nice. Syncoid does that in the latest version, but that version isn't in the Debian repo yet :D

martin-schlossarek commented 4 months ago

zrepl has very powerful built-in prune policies. Basically, you define rules for which snapshots should be kept; all snapshots that do not match these rules are pruned.
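
A sketch of what that looks like in a job's pruning section (the job name and grid values here are illustrative assumptions, not a recommendation):

jobs:
  - name: backup_pull
    type: pull
    # connect / root_fs / interval omitted for brevity
    pruning:
      keep_sender:
        - type: not_replicated
        - type: grid
          grid: 1x1h(keep=all) | 24x1h | 30x1d | 6x30d
          regex: '^zrepl_'
      keep_receiver:
        - type: grid
          grid: 24x1h | 30x1d | 12x30d
          regex: '^zrepl_'

zrepl keeps every snapshot these rules match and destroys the rest on its next prune run, so the regex matters if other tools also create snapshots.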