aruhier / virt-backup

Backup your kvm guests managed by libvirt

Full grown remote backup solution using virt-backup with host-side dedup & compression #37

Open deajan opened 4 years ago

deajan commented 4 years ago

Hello Anthony,

I've been searching for a good KVM backup solution like the holy grail, without success. Even a big commercial solution turned out to be a basic "virsh snapshot and copy" script in disguise with a polished GUI, but without compression, deduplication, or even good WAN support. The solution even executed virsh command lines instead of using libvirt directly, making it more error-prone.

Anyway, I came to the conclusion that there aren't a lot of good options (the options being vProtect, Acronis, SEP and Netbackup, the latter ones being quite pricy).

A couple of months ago I reviewed your solution as a potential problem solver, the only missing part being deduplication and WAN support.

Fast forward: I think a complete, remote-friendly KVM backup solution can be achieved by combining your virt-backup solution with borg backup, which would handle on-site dedup and compression before sending the backups over the network. It also supports resuming file transfers after errors, and of course backup retention. Other similar backup solutions include duplicity, bup, burp, etc., but borg has the advantage of being written in Python, so it could be interfaced with virt-backup without too much hassle.
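As a rough illustration of the combination described above, here is a minimal sketch that archives one domain's backup directory into a borg repository. It assumes Borg 1.x command-line syntax (`borg create --compression ... REPO::ARCHIVE PATH`); the function names, the archive naming scheme, and the repository URL are hypothetical and not part of virt-backup.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def build_borg_create(repo: str, domain: str, backup_dir: Path,
                      compression: str = "zstd,3") -> list[str]:
    """Build a `borg create` command for one domain's backup directory.

    The "<domain>-<UTC timestamp>" archive name is an assumption made for
    this sketch, not a convention used by virt-backup or borg.
    """
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S")
    archive = f"{repo}::{domain}-{stamp}"
    return ["borg", "create", "--compression", compression,
            archive, str(backup_dir)]


def push_to_borg(repo: str, domain: str, backup_dir: Path) -> None:
    """Run borg; deduplication and compression happen before data leaves the host."""
    # check=True raises CalledProcessError on any non-zero exit code,
    # which for borg includes warnings (exit code 1).
    subprocess.run(build_borg_create(repo, domain, backup_dir), check=True)
```

For example, `push_to_borg("ssh://backup@example.org/./repo", "vm1", Path("/backup/vm1"))` would archive everything virt-backup wrote under `/backup/vm1`.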

With the directory packager and no retention, the only thing virt-backup would lack to achieve this is pre- and post-backup hooks.
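To make the hook idea concrete, here is a minimal sketch of what pre and post backup hooks could look like. Nothing here exists in virt-backup: `run_with_hooks`, `pre_cmd`, and `post_cmd` are hypothetical names, and the choice to skip the post hook when the backup fails is an assumption.

```python
import shlex
import subprocess


def run_with_hooks(backup, pre_cmd=None, post_cmd=None, run=subprocess.run):
    """Call `backup()` between an optional pre hook and post hook.

    `backup` is any callable performing the actual backup. Hooks are shell-like
    command strings, split with shlex and executed without a shell.
    """
    if pre_cmd:
        run(shlex.split(pre_cmd), check=True)
    # If backup() raises, the post hook is skipped, so a failed backup
    # is never shipped to the remote repository.
    result = backup()
    if post_cmd:
        run(shlex.split(post_cmd), check=True)
    return result
```

The post hook is where a command such as `borg create` would go, so that only completed backups are sent over the network.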

What would you think of such a solution? Also, I guess your use case for virt-backup might include remote storage? What do you use to move your virt-backup generated files?

Best regards.

Btw, congrats on having coded such a nice tool. The code also seems quite polished, even if some comments would have been nice to help other devs produce PRs more easily. Anyway, very nice tool.

shunghsiyu commented 4 years ago

> A couple of months ago I reviewed your solution as a potential problem solver, the only missing part being deduplication and WAN support.

I came to the same conclusion after a fair bit of searching, and really want to thank Anthony for this tool!

> What would you think of such a solution? Also, I guess your use case for virt-backup might include remote storage? What do you use to move your virt-backup generated files?

While I might not be the person best placed to answer these questions, I would like to chime in on this issue.

For my use case (at my last job) I have an NFS server on the LAN, and a backup folder exported by the NFS server is mounted on the VM host running virt-backup under /backup/, where virt-backup stores the backup files. This way no extra storage space is required on the VM host and I can be sure that the files are safe even if the VM host fails.

However, no de-duplication is done in my setup, so I can only keep a few backups for each VM before the NFS server runs out of space.

For other remote targets such as SSH, S3, GCS, B2, etc. there may be FUSE counterparts (e.g. sshfs, s3fs) that can be used instead.

Again, these do not take de-duplication into account.
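One practical caveat with a network or FUSE mount as the target: if the mount is missing, backups silently land on the VM host's local disk. A small guard like the following sketch could be run before each backup. The function name is hypothetical, and it assumes the target path (such as /backup/) is itself the mount point.

```python
import os
from pathlib import Path


def require_mounted(target: str) -> Path:
    """Return `target` as a Path, or raise if nothing is mounted there."""
    path = Path(target)
    if not os.path.ismount(path):
        raise RuntimeError(
            f"{path} is not a mount point; refusing to write backups to local disk"
        )
    return path
```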

aruhier commented 4 years ago

Hey @deajan and @shunghsiyu, sorry for my late answer. Thanks a lot for your interest in virt-backup, that's really great to see!

The thing about virt-backup is that I use it for my personal infrastructure, with just 2 hypervisors (which are not in a cluster). The issue with what you want is that my needs are pretty simple, so I avoided writing an overkill solution for that.

However, now that I've split how the storage is handled, it's pretty easy to add a new way to store the backups: https://github.com/aruhier/virt-backup/tree/master/virt_backup/backups/packagers
It just needs to expose a few API calls. Now for S3 for example, I don't have anything to easily test it with, and I don't have the need for it, so I don't think I will write it myself (knowing that a FUSE mountpoint can still be used as a target). But if someone wants to add it, I can review and merge.

A packager needs to implement these abstract classes: https://github.com/aruhier/virt-backup/blob/master/virt_backup/backups/packagers/__init__.py#L30
Again, I should write some explicit documentation for that, but the packagers are split into a Read and a Write class (for more safety and to avoid modifying a backup when no writing is needed). This one is pretty simple: https://github.com/aruhier/virt-backup/blob/master/virt_backup/backups/packagers/directory.py
Also, all long operations (remove and/or add) should be stoppable via an event.
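The Read/Write split and the stoppable operations could look roughly like the sketch below. This is not the actual virt-backup API (see the linked files for that): every class and method name here is hypothetical, and the in-memory storage is only there to keep the example self-contained.

```python
import threading
from abc import ABC, abstractmethod


class OperationStopped(Exception):
    """Raised when a long operation is interrupted through its stop event."""


class ReadPackager(ABC):
    """Read-only view: cannot modify a backup by construction."""

    @abstractmethod
    def list_names(self): ...

    @abstractmethod
    def read(self, name): ...


class WritePackager(ReadPackager):
    """Adds the mutating operations; each accepts an optional stop event."""

    @abstractmethod
    def add(self, name, chunks, stop_event=None): ...

    @abstractmethod
    def remove(self, name, stop_event=None): ...


class InMemoryWritePackager(WritePackager):
    def __init__(self):
        self._items = {}

    def list_names(self):
        return sorted(self._items)

    def read(self, name):
        return self._items[name]

    def add(self, name, chunks, stop_event=None):
        buf = []
        for chunk in chunks:
            # Check the event between chunks so a long copy can be aborted.
            if stop_event is not None and stop_event.is_set():
                raise OperationStopped(f"add of {name!r} was stopped")
            buf.append(chunk)
        # Only commit once every chunk was copied: no partial backups.
        self._items[name] = b"".join(buf)

    def remove(self, name, stop_event=None):
        if stop_event is not None and stop_event.is_set():
            raise OperationStopped(f"remove of {name!r} was stopped")
        del self._items[name]
```

A caller would create a `threading.Event()`, pass it as `stop_event`, and call `set()` on it from a signal handler or another thread to cancel the operation.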

> Also, I guess your use case for virt-backup might include remote storage? What do you use to move your virt-backup generated files?

Indeed, I use glusterfs as a target. Well, my disks are stored locally, but my backups are on glusterfs. I just use a FUSE mountpoint as a target for virt-backup.

About the deduplication + compression, for some targets I started to rely on my filesystem to do it, instead of handling it via virt-backup. I don't know if I can use borg to store the backups. It seems to be a good idea to reuse their deduplication stack, compression and every storage target they already cover. However, I'm a bit worried that I couldn't simply "add" borg as a packager without making it a mandatory dependency of virt-backup, and that it would make virt-backup way less flexible.

And handling the deduplication myself… well I could, at least per domain, but it would need a good rework of the packagers (again :p). I couldn't keep one archive file per backup, and would need to move to a structure that allows the backups to be modified without rewriting multiple GB to remove a backup, etc. Well, what borg is already doing, in fact. And borg has the advantage of being more mature.
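To illustrate why per-domain deduplication rules out "one archive file per backup", here is a toy chunk store: each backup is only a list of chunk digests, so removing a backup drops references instead of rewriting gigabytes. It is a simplified sketch with hypothetical names; it uses fixed-size chunks held in memory, whereas borg uses content-defined chunking and an on-disk repository.

```python
import hashlib
from collections import Counter


class ChunkStore:
    def __init__(self, chunk_size=4 * 1024 * 1024):
        self.chunk_size = chunk_size
        self.chunks = {}        # sha256 hex digest -> chunk bytes
        self.backups = {}       # backup name -> ordered list of digests
        self.refs = Counter()   # digest -> number of references

    def add(self, name, data):
        digests = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            # A chunk already present is stored once and only referenced again.
            self.chunks.setdefault(digest, chunk)
            self.refs[digest] += 1
            digests.append(digest)
        self.backups[name] = digests

    def get(self, name):
        return b"".join(self.chunks[d] for d in self.backups[name])

    def remove(self, name):
        # Only chunks no other backup references are actually deleted.
        for digest in self.backups.pop(name):
            self.refs[digest] -= 1
            if self.refs[digest] == 0:
                del self.refs[digest]
                del self.chunks[digest]
```

Two backups of the same disk that differ in one region share every other chunk, and removing the older backup leaves the newer one fully restorable.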

What I can do on the other hand is a "relay" packager: a packager that does nothing but call some given commands to correctly respond to the API. It could be configured like this from the config file:

  packager_options:
    list: /opt/foo.sh list 
    remove: /opt/remove.sh
    add: /opt/foo.sh add
    get: /opt/foo.sh get

These scripts could return something in JSON that the packager could parse. This way you could for example rely on virt-backup to handle the external snapshots and everything, and instead of handling the copy itself, it would trigger the backup of a specific image. Then once it's done, it could continue as normal. Now I don't know if it would really help for what you need, compared to directly writing in virt-backup the packagers you need. And also, for borg, I don't really know how it would handle the locks…
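A relay packager along those lines might be sketched as follows. This is only an illustration of the proposal, not existing virt-backup code: the class name, the way arguments are appended to the configured command, and the "JSON on stdout" protocol are all assumptions.

```python
import json
import shlex
import subprocess


class RelayPackager:
    """Delegate every packager operation to an externally configured command."""

    def __init__(self, options):
        # `options` mirrors the packager_options mapping from the config file,
        # e.g. {"list": "/opt/foo.sh list", "remove": "/opt/remove.sh", ...}
        self.options = options

    def _call(self, action, *args):
        argv = shlex.split(self.options[action]) + [*args]
        proc = subprocess.run(argv, check=True, capture_output=True, text=True)
        # Assumed protocol: the script prints one JSON document, or nothing.
        return json.loads(proc.stdout) if proc.stdout.strip() else None

    def list(self):
        return self._call("list")

    def add(self, path, name):
        return self._call("add", path, name)

    def get(self, name, target):
        return self._call("get", name, target)

    def remove(self, name):
        return self._call("remove", name)
```

With this shape, a script wrapping `borg create` could be plugged in as the `add` command while virt-backup keeps handling the snapshots.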

xx25 commented 4 years ago

@aruhier, thank you for the most advanced KVM backup solution available and also for your detailed explanation of the internal virt-backup structure. I am also in a situation where I have to use borg to back up multiple VMs with backup encryption and de-duplication, and writing a "universal" packager looks like the right thing to me: borg usage can be very specific in each case, and there is nothing that could actually be integrated into virt-backup, despite the fact that they are both written in Python.