JoaoJandre opened this issue 3 months ago
Hi @JoaoJandre, there is similar functionality for VM snapshots without memory, introduced in this PR, and this PR allows it for NFS/local storage. It doesn't support VM snapshots of stopped VMs, but I think that would be a small change.

What I got from the libvirt docs and a few forums is that using the flag `VIR_DOMAIN_SNAPSHOT_CREATE_QUIESCE` is discouraged:

> If flags includes VIR_DOMAIN_SNAPSHOT_CREATE_QUIESCE, then the libvirt will attempt to use guest agent to freeze and thaw all file systems in use within domain OS. However, if the guest agent is not present, an error is thrown. Moreover, this flag requires VIR_DOMAIN_SNAPSHOT_CREATE_DISK_ONLY to be passed as well. For better control and error recovery users should invoke virDomainFSFreeze manually before taking the snapshot and then virDomainFSThaw to restore the VM rather than using VIR_DOMAIN_SNAPSHOT_CREATE_QUIESCE.

Probably you could leave the usage of virDomainFSFreeze/virDomainFSThaw to be decided by the value of the quiesceVm parameter and by the state of the VM (running/stopped).
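For illustration, a minimal sketch of that manual freeze/thaw flow, assuming the `org.libvirt` Java binding; the `fsFreeze`/`fsThaw` wrappers for `virDomainFSFreeze`/`virDomainFSThaw` are hypothetical here (the binding in use may not expose them):

```java
import org.libvirt.Domain;
import org.libvirt.LibvirtException;

public class ManualQuiesceSketch {
    // Values from libvirt's virDomainSnapshotCreateFlags enum.
    private static final int DISK_ONLY = 1 << 4;
    private static final int ATOMIC = 1 << 7;

    public static void snapshotWithManualFreeze(Domain domain, String snapshotXml)
            throws LibvirtException {
        domain.fsFreeze(null); // hypothetical wrapper for virDomainFSFreeze (null = all mountpoints)
        try {
            domain.snapshotCreateXML(snapshotXml, DISK_ONLY | ATOMIC);
        } finally {
            // The thaw runs even when the snapshot fails, which is the error
            // recovery that VIR_DOMAIN_SNAPSHOT_CREATE_QUIESCE cannot offer.
            domain.fsThaw(null); // hypothetical wrapper for virDomainFSThaw
        }
    }
}
```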
Hello, @slavkap
I'm aware of the current functionality, but I was not aware that it had been made to support NFS/local storage. Regardless, I have listed (in the spec) a few other issues with it.
In any case, this feature will only be used for NFS/SMP/local storage; for the other types of storage (such as RBD or iSCSI), the implementation introduced in #3724 will still be used.
Regarding the domain freeze/thaw, the quote you posted says "For better control and error recovery users should invoke virDomainFSFreeze manually before taking the snapshot and then virDomainFSThaw to restore the VM rather than using VIR_DOMAIN_SNAPSHOT_CREATE_QUIESCE." I looked at the implementation of the freeze/thaw, and there doesn't seem to be any error-recovery attempt, so using it instead of the quiesce parameter does not seem any better. To be fair, I'm not sure what type of error recovery we could do; so again, I don't see a point in using the freeze/thaw.
ISSUE TYPE
COMPONENT NAME
CLOUDSTACK VERSION
CONFIGURATION
OS / ENVIRONMENT
KVM, file storage (NFS, Shared mountpoint, local storage)
SUMMARY
This spec addresses an update to the disk-only VM snapshot feature on the KVM hypervisor.
1. Problem Description
Currently, when using KVM as the hypervisor, CloudStack does not support disk-only snapshots of VMs with volumes on NFS or local storage; CloudStack also does not support VM snapshots of stopped VMs. This means that if users need some sort of snapshot of their volumes, they must use the volume snapshot/backup feature. Furthermore, the current implementation relies on the same workflows as volume snapshots/backups:
However, this approach is flawed: as we not only create the snapshots but also copy all of them to another directory, there is a lot of downtime, since the VM is frozen during this whole process. This downtime might be extremely long if the volumes are big.
Moreover, as the snapshots are copied to another directory in the primary storage, the revert takes some time, as we need to copy the snapshot back.
1.1. Basic Definitions
Here are some basic definitions that will be used throughout this spec:
2. Proposed Changes
To address the described problems, we propose extending the VM snapshot feature on KVM to allow disk-only VM snapshots for NFS and local storage; other types of storage, such as shared mount point, already support disk-only VM snapshots. Furthermore, we intend to change the disk-only VM snapshot process for all other file-based storages (local, NFS and shared mount point): the snapshots will be kept in the volumes' backing chains, and the VM will only be frozen when the `quiesceVM` parameter is `true`.

2.0.2. Limitations
Incremental volume snapshots (enabled via `kvm.incremental.snapshot`) are not compatible with `virDomainSnapshotCreateXML` (the API used for this feature). With this in mind, allowing volume and disk-only VM snapshots to coexist would create edge cases for failure; for example, if the user has a snapshot policy and `kvm.incremental.snapshot` is changed to `true`, the volume snapshots will suddenly begin to fail.

2.1. Disk-only VM Snapshot Creation
The proposed disk-only VM snapshot creation workflow is summarized in the following diagram.
<img src="https://res.cloudinary.com/sc-clouds/image/upload/v1715878023/specs/cloudstack/disk-only-vm-snapshot/vm_snapshot_creation_2_tftl1d.png" alt="create-snapshot" style="width: 100%; height: auto;">
If the VM is running, the snapshot will be taken through the `virDomainSnapshotCreateXML` API, informing all the VM's volumes with the `snapshot` key and the `external` value, and using the flags:

- `VIR_DOMAIN_SNAPSHOT_CREATE_ATOMIC`: to make the snapshot atomic across all the volumes;
- `VIR_DOMAIN_SNAPSHOT_CREATE_DISK_ONLY`: to make the snapshot disk-only;
- `VIR_DOMAIN_SNAPSHOT_CREATE_NO_METADATA`: to tell Libvirt not to save any metadata for the snapshot. This flag will be informed because we do not need Libvirt to save any metadata; all the other processes regarding the VM snapshots will be done manually using qemu-img;
- `VIR_DOMAIN_SNAPSHOT_CREATE_QUIESCE`: if `quiesceVM` is `true`, this flag will be informed as well to keep the VM frozen during the snapshot process; once the snapshot is done, the VM will already be thawed.

If the VM is stopped, `qemu-img create` will be run for every volume of the VM, to create a delta on top of the current file, which will become our snapshot.

Unlike the volume snapshots, the disk-only VM snapshots are not designed to be backups; thus, we will not copy the disk-only VM snapshots to another directory or storage. We want the disk-only snapshots to be fast to revert whenever needed, and keeping them in the volumes' backing chains is the best way to achieve this.
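To make the creation workflow concrete, here is a minimal sketch of both paths, assuming the `org.libvirt` Java binding's `snapshotCreateXML(xml, flags)` overload; the class, disk names and paths are illustrative, not the actual CloudStack implementation:

```java
import org.libvirt.Domain;
import org.libvirt.LibvirtException;

public class DiskOnlySnapshotSketch {
    // Flag values from libvirt's virDomainSnapshotCreateFlags enum.
    private static final int NO_METADATA = 1 << 2;
    private static final int DISK_ONLY   = 1 << 4;
    private static final int QUIESCE     = 1 << 6;
    private static final int ATOMIC      = 1 << 7;

    /** Running VM: one atomic, external, disk-only snapshot across every volume. */
    public static void snapshotRunningVm(Domain domain, boolean quiesceVm) throws LibvirtException {
        // Every volume is listed with snapshot='external'; the new delta path is
        // given in <source>. Disk names and paths are illustrative.
        String xml = "<domainsnapshot><disks>"
                + "<disk name='vda' snapshot='external'>"
                + "<source file='/mnt/primary/volume-uuid.snap-1'/>"
                + "</disk>"
                + "</disks></domainsnapshot>";
        int flags = ATOMIC | DISK_ONLY | NO_METADATA;
        if (quiesceVm) {
            flags |= QUIESCE; // guest agent freezes/thaws file systems around the snapshot
        }
        domain.snapshotCreateXML(xml, flags);
    }

    /** Stopped VM: create the new delta with qemu-img; the old top file becomes the snapshot. */
    public static void snapshotStoppedVolume(String currentPath, String newDeltaPath)
            throws java.io.IOException, InterruptedException {
        new ProcessBuilder("qemu-img", "create", "-f", "qcow2",
                "-b", currentPath, "-F", "qcow2", newDeltaPath)
                .inheritIO().start().waitFor();
    }
}
```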
Currently, the VM is always frozen and resumed during the snapshot process, regardless of what is informed in the `quiesceVM` parameter. This process will be changed: the VM will only be frozen if the `quiesceVM` parameter is informed. Furthermore, the downtime of the proposed process will be orders of magnitude smaller than in the current implementation, as there will not be any copying while the VM is frozen.

During the VM snapshot process, the snapshot job is queued alongside the other VM jobs; therefore, we do not have to worry about the VM being stopped/started during the snapshot, as each job is processed sequentially for each given VM. Furthermore, after creating a VM snapshot, ACS already forbids detaching volumes from the VM, so we do not need to worry about this case either.
2.2. VM Snapshot Reversion
The proposed disk-only VM snapshot restore process is summarized in the diagram below. The process will be repeated for all the VM's volumes.
<img src="https://res.cloudinary.com/sc-clouds/image/upload/v1715707430/specs/cloudstack/disk-only-vm-snapshot/vm_snapshot_reversion_1_iw5dej.png" alt="revert-snapshot" style="width: 100%; height: auto;">
The proposed process will allow us to go back and forth on snapshots if need be. Furthermore, this process will be much faster than reverting a volume snapshot, as the bottleneck here is deleting the top delta that will no longer be used, which should be much faster than copying a volume snapshot from another storage and replacing the old volume.
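Under the same assumptions as the earlier sketch (illustrative paths, qemu-img driven through `ProcessBuilder`), the reversion mechanics would look roughly like this:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class RevertSketch {
    /** Revert a volume: discard the active delta, start a new one on the target snapshot. */
    public static void revert(String activeDelta, String targetSnapshot, String newDelta)
            throws IOException, InterruptedException {
        // 1. The writes accumulated since the snapshot live only in the top
        //    delta, so deleting it is the whole "undo".
        Files.deleteIfExists(Paths.get(activeDelta));
        // 2. Create a fresh, empty delta backed by the snapshot we reverted to;
        //    the VM definition is then pointed at this new file.
        new ProcessBuilder("qemu-img", "create", "-f", "qcow2",
                "-b", targetSnapshot, "-F", "qcow2", newDelta)
                .inheritIO().start().waitFor();
    }
}
```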
The process done in step 2 was added to cover an edge case where dead snapshots would be left in the storage until the VM was expunged. Here's a simple example of why it's needed:
Suppose we have a volume with `Current` being the current delta that is being written to, and `Snap 1` the parent of `Current` and `Snap 2`. If we delete `Snap 1`, following the diagram in the snapshot deletion section, we can see that it will be marked as destroyed, but will not be deleted nor merged, as none of these operations can be done in this situation.

<img src="https://res.cloudinary.com/sc-clouds/image/upload/v1715784150/specs/cloudstack/disk-only-vm-snapshot/Drawing_1_gsu9vr.png" alt="revert-snapshot-ex1" style="width: 10%; display: block; margin-left: auto; margin-right: auto; height: auto;">
Then, suppose we revert to `Snap 2`; thus, the old `Current` will be deleted and a new delta will be created on top of `Snap 2`. The problem is that now we have a dead snapshot that will not be removed by any other process: the user will not see it, and none of the snapshot deletion workflows will delete it:

<img src="https://res.cloudinary.com/sc-clouds/image/upload/v1715784167/specs/cloudstack/disk-only-vm-snapshot/Drawing_2_vmmjhl.png" alt="revert-snapshot-ex2" style="width: 5%; display: block; margin-left: auto; margin-right: auto; height: auto;">
With the process done in step 2, when reverting we will merge `Snap 1` and `Snap 2` (using `qemu-img commit`) and be left with only `Snap 2` and our current delta:

<img src="https://res.cloudinary.com/sc-clouds/image/upload/v1715784171/specs/cloudstack/disk-only-vm-snapshot/Drawing_3_jlf7xx.png" alt="revert-snapshot-ex3" style="width: 5%; display: block; margin-left: auto; margin-right: auto; height: auto;">
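Assuming standard qemu-img semantics, the merge described above could be sketched as follows; `qemu-img commit` folds a file's data into its backing file, and an unsafe rebase then repoints the remaining delta (class name and paths are illustrative):

```java
import java.io.IOException;

public class MergeDeadNodeSketch {
    /** Merge the dead `Snap 1` with its only child `Snap 2`; paths are illustrative. */
    public static void mergeDeadNode(String snap2Path, String snap1Path, String activeDelta)
            throws IOException, InterruptedException {
        // Fold Snap 2 down into its backing file (Snap 1); the merged data
        // now lives in snap1Path, which logically represents Snap 2.
        run("qemu-img", "commit", snap2Path);
        // Unsafe rebase (-u) only rewrites the backing pointer of the active
        // delta; it is safe here because the merged file holds exactly the
        // content the delta was originally written on top of.
        run("qemu-img", "rebase", "-u", "-b", snap1Path, "-F", "qcow2", activeDelta);
        // snap2Path is now redundant and can be deleted.
    }

    private static void run(String... cmd) throws IOException, InterruptedException {
        new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```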
2.3. VM Snapshot Deletion
In order to keep the snapshot tree consistent and with the least amount of dead nodes, the snapshot deletion process will always try to manipulate the snapshot tree to remove any unneeded nodes while keeping the ones that are still needed, even if they were removed by the user; in these cases, they will be marked as deleted in the DB, but will remain on the primary storage until they can be merged with another snapshot. The diagram below summarizes the snapshot deletion process; this process will be repeated for all the VM's volumes:
<img src="https://res.cloudinary.com/sc-clouds/image/upload/v1715869488/specs/cloudstack/disk-only-vm-snapshot/vm_snapshot_deletion_6_iobpxf.png" alt="snapshot-deletion" style="width: 100%; height: auto;">
As this diagram has several branches, each branch will be explained separately:
- In the branch where the deleted snapshot is merged with its child: if the VM is running, use `virDomainBlockCommit`; else use `qemu-img commit`, to commit the child to it. When using `qemu-img commit`, the backing chain must be adjusted manually afterwards; the `virDomainBlockCommit` API already does this for us.
- In the branch where a sibling is merged into the parent: if the VM is running, use `virDomainBlockCommit`; else use `qemu-img commit`, to commit the sibling to the parent. Again, `qemu-img commit` requires adjusting the backing chain manually; the `virDomainBlockCommit` API already does this for us.

The proposed deletion process leaves room for one edge case, which can lead to a dead node that would only be removed when the volume was deleted: if we revert to a snapshot that has one other child and then delete it, using the above algorithm, the deleted snapshot will end up only marked as removed in the DB. If we then revert to another snapshot, this will leave a dead node on the tree that would not be removed (the snapshot that was previously deleted). To solve this edge case, when this specific situation happens, we will do as explained in the snapshot reversion section and merge the dead node with its child.
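A sketch of the two merge paths; the `blockCommit` method is an assumed wrapper over the `virDomainBlockCommit` C API (signature mirrored from C, block-job completion handling omitted), while the offline path uses plain qemu-img:

```java
import org.libvirt.Domain;
import org.libvirt.LibvirtException;

public class DeleteSnapshotSketch {
    /** Running VM: let libvirt merge `top` into `base` and fix the chain. */
    public static void commitLive(Domain domain, String disk, String base, String top)
            throws LibvirtException {
        // Assumed wrapper over virDomainBlockCommit(dom, disk, base, top, bandwidth, flags).
        // A real implementation must also wait for the block job to complete.
        domain.blockCommit(disk, base, top, 0, 0);
    }

    /** Stopped VM: same merge with qemu-img; the chain is fixed up manually. */
    public static void commitOffline(String topPath, String basePath, String childOfTop)
            throws java.io.IOException, InterruptedException {
        // Fold the file being removed into its backing file...
        new ProcessBuilder("qemu-img", "commit", topPath).inheritIO().start().waitFor();
        // ...then repoint the remaining child at the merged base, the step
        // virDomainBlockCommit performs automatically on a live chain.
        new ProcessBuilder("qemu-img", "rebase", "-u", "-b", basePath, "-F", "qcow2", childOfTop)
                .inheritIO().start().waitFor();
    }
}
```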
2.4. Template Creation from Volume
The current process of creating a template from a volume does not need to be changed: we already convert the volume when creating a template, so the volume's backing chain is collapsed as part of the conversion.
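As an illustration (not the actual CloudStack code): `qemu-img convert` reads through the whole backing chain and writes a single standalone image, which is why the snapshot deltas never leak into the template:

```java
import java.io.IOException;

public class TemplateFromVolumeSketch {
    /** Convert the active delta (and, implicitly, its whole backing chain) into a flat template. */
    public static void convertToTemplate(String activeDelta, String templatePath)
            throws IOException, InterruptedException {
        // The output has no backing file: all data from the chain is copied in.
        new ProcessBuilder("qemu-img", "convert", "-O", "qcow2", activeDelta, templatePath)
                .inheritIO().start().waitFor();
    }
}
```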