abbbi / virtnbdbackup

Backup utility for Libvirt / qemu / kvm supporting incremental and differential backups + instant recovery (agentless).
http://libvirtbackup.grinser.de/
GNU General Public License v3.0

Feature/PR proposal: Merge incremental backups into full (synthetic full backup) #149

Open draggeta opened 11 months ago

draggeta commented 11 months ago

I don't know if this is possible at all, but it would be nice if incrementals could be merged back into a full backup.

For our use case, a month of backups is enough. Being able to have a rolling full backup, where the oldest incremental is merged into the full, would save the disk space of keeping two months of data just to cover one month of restore points.

Again, not something we require, but it would be cool.

abbbi commented 11 months ago

I think it might be easier to implement if #57 is solved first; then you could rebase the images as required using the regular qemu utils. Currently I don't intend to implement such a feature with the backup format in use.
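
To sketch what that could look like once #57 is in place (file names are made up; this assumes the backups are plain qcow2 files forming a backing chain full.qcow2 ← inc1.qcow2 ← inc2.qcow2):

```sh
# Fold the oldest incremental into its backing file (the full backup);
# afterwards full.qcow2 contains the merged state.
qemu-img commit inc1.qcow2

# Re-point the next incremental at the updated full image; -u only
# rewrites the backing file reference because the data is already there.
qemu-img rebase -u -b full.qcow2 -F qcow2 inc2.qcow2

# The merged incremental is no longer needed.
rm inc1.qcow2
```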

draggeta commented 11 months ago

Hmmm, that seems interesting. I know Python well enough, but I'm a noob when it comes to KVM/QEMU. I'll try to look into the other issue.

abbbi commented 11 months ago

In issue #100 something similar was discussed: using virtnbdrestore to create a "synthetic full backup" which then serves as the base for further incremental backups. With the current stream-based backup format it's a bit more complicated than it would be if the qcow-based backup format (https://github.com/abbbi/virtnbdbackup/issues/57) were in place.
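
For reference, the #100 approach amounts to restoring the existing chain and treating the result as the new base (paths are illustrative):

```sh
# Rebuild the disk images from the existing backup chain; the restored
# qcow2 files can then serve as a synthetic full for further backups.
virtnbdrestore -i /backup/vm1 -o /restore/vm1
```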

Basically the workflow with the current backup format could look like:
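
(The command lines below are illustrative, not part of the original comment; `--create-synthetic-full` is the proposed option, so its exact argument shape is a guess.)

```sh
# regular chain: one full backup plus incrementals in a single folder
virtnbdbackup -d vm1 -l full -o /backup/vm1.monthA
virtnbdbackup -d vm1 -l inc  -o /backup/vm1.monthA

# proposed option (does not exist yet): merge the whole chain into a
# new standalone full backup in a fresh folder
virtnbdbackup --create-synthetic-full /backup/vm1.monthA -o /backup/vm1.monthB
```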

The --create-synthetic-full function would then:

1) merge the content of all backup .data files from the current backup directory into a new .data full backup file (and also merge the compression trailers accordingly, remove non-required meta headers and so on)
2) copy the latest vm config/kernel/initrd files into the new target directory

From this point on the user can schedule the next backup runs:
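
(Illustrative again, assuming the synthetic full is accepted as the start of a new chain:)

```sh
# incrementals now build on top of the synthetic full backup
virtnbdbackup -d vm1 -l inc -o /backup/vm1.monthB
```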

The incremental backups then continue into this new folder, and the old folder can be removed to free up some disk space. This way users with very big disks don't have to create a new, time-consuming full backup.

It could also be considered to have --create-synthetic-full operate on the same directory (doing an in-place merge into the existing full backup file), but then, if something goes wrong during this non-atomic operation, you're likely to nuke your current full backup.
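
A generic way to defuse that risk (not from the issue; `merge_data_files` stands in for whatever does the actual merge) is to write the merged result to a temporary file and atomically rename it over the old one:

```sh
# merge into a temp file first; a crash here leaves the original intact
merge_data_files vm1.full.data vm1.inc.*.data > vm1.full.data.tmp

# rename is atomic within one filesystem, so the full backup is either
# the old complete file or the new complete file, never a half-merge
mv vm1.full.data.tmp vm1.full.data
```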

I understand the point of having synthetic full backups if your full backup takes very long and is a pain in the ass. If it's merely about saving disk space, I would consider using a backup storage filesystem which supports deduplication. This way you can store more full backups without running out of disk space too soon (having multiple full backups is always a good idea anyway, see #134).