abbbi / virtnbdbackup

Backup utility for Libvirt / qemu / kvm supporting incremental and differential backups + instant recovery (agentless).
http://libvirtbackup.grinser.de/
GNU General Public License v3.0
330 stars 46 forks source link

Fails to restore missing checkpoints after the 11th backup #17

Closed adrianparilli closed 3 years ago

adrianparilli commented 3 years ago

Hello there,

Under a server crash event (with multiple VMs running,) I've found it's able to recover missing checkpoints correctly. For example:

INFO common - printVersion: Version: 0.23 Arguments: /usr/local/bin/virtnbdbackup -d Server1 -l inc -o /mnt/user/Backup/vm-backups/Server1 --compress INFO virtnbdbackup - main: Store checkpoints in: /mnt/user/Backup/vm-backups/Server1/checkpoints WARNING libvirthelper - getDomainDisks: Raw disk hdb excluded by default, use option --raw to include. INFO virtnbdbackup - main: Domain has 1 disks attached which support changed block tracking. INFO virtnbdbackup - main: Attempting to redefine checkpoints from: /mnt/user/Backup/vm-backups/Server1/checkpoints INFO libvirthelper - redefineCheckpoints: Loading checkpoint list from: /mnt/user/Backup/vm-backups/Server1/checkpoints INFO libvirthelper - redefineCheckpoints: Redefine missing checkpoint virtnbdbackup.0 INFO libvirthelper - redefineCheckpoints: Redefine missing checkpoint virtnbdbackup.1 INFO libvirthelper - redefineCheckpoints: Redefine missing checkpoint virtnbdbackup.2 INFO libvirthelper - redefineCheckpoints: Redefine missing checkpoint virtnbdbackup.3 INFO libvirthelper - redefineCheckpoints: Redefine missing checkpoint virtnbdbackup.4 INFO libvirthelper - redefineCheckpoints: Redefine missing checkpoint virtnbdbackup.5 INFO virtnbdbackup - main: Looking for checkpoints INFO virtnbdbackup - main: Found checkpoints, next checkpoint id: 6 INFO virtnbdbackup - main: Parent checkpoint name virtnbdbackup.5 INFO virtnbdbackup - main: Using checkpoint name: virtnbdbackup.6 INFO virtnbdbackup - main: Temporary scratch file target directory: /var/tmp INFO virtnbdbackup - main: Starting backup job. INFO libvirthelper - fsFreeze: Freeze filesystems. INFO libvirthelper - fsThaw: Thawed filesystems. INFO virtnbdbackup - main: Started backup job with checkpoint, saving information. INFO libvirthelper - backupCheckpoint: Saving checkpoint config to /mnt/user/Backup/vm-backups/Server1/checkpoints/virtnbdbackup.6.xml INFO virtnbdbackup - backupDisk: INC backup: set context to qemu:dirty-bitmap:backup-hdc INFO nbdhelper - version: 1.6.1 INFO virtnbdbackup - backupDisk: NDB Endpoint socket: /var/tmp/virtnbdbackup.1 INFO nbdhelper - getBlockInfo: Using Maximum Block size supported by nbd server: 33554432 INFO virtnbdbackup - backupDisk: Using nbd to query extents INFO virtnbdbackup - backupDisk: Got 12965 extents INFO virtnbdbackup - backupDisk: 214748364800 bytes disk size INFO virtnbdbackup - backupDisk: 1349189632 bytes of data extents to backup INFO virtnbdbackup - backupDisk: Write data to target file: /mnt/user/Backup/vm-backups/Server1/hdc.inc.virtnbdbackup.6.data.partial INFO virtnbdbackup - backupDisk: Creating thin provisioned stream backup image INFO virtnbdbackup - backupDisk: Compression enabled INFO virtnbdbackup - backupDisk: Backup of disk hdc finished, file: /mnt/user/Backup/vm-backups/Server1/hdc.inc.virtnbdbackup.6.data INFO virtnbdbackup - backupConfig: Saving VM config to: /mnt/user/Backup/vm-backups/Server1/vmconfig.virtnbdbackup.6.xml INFO virtnbdbackup - main: Stopping backup task INFO virtnbdbackup - main: Finished

In this particular case, sudden incremental backups didn't show missing checkpoints again.

But in other cases I found that chance to recover missing checkpoints is zero when the 11th checkpoint (virtnbdbackup.10) exists, failing almost invariably in this way:

INFO common - printVersion: Version: 0.23 Arguments: /usr/local/bin/virtnbdbackup -d Server2 -l inc -o /mnt/user/Backup/vm-backups/Server2 --compress INFO virtnbdbackup - main: Store checkpoints in: /mnt/user/Backup/vm-backups/Server2/checkpoints WARNING libvirthelper - getDomainDisks: Raw disk hdb excluded by default, use option --raw to include. INFO virtnbdbackup - main: Domain has 1 disks attached which support changed block tracking. INFO virtnbdbackup - main: Attempting to redefine checkpoints from: /mnt/user/Backup/vm-backups/Server2/checkpoints INFO libvirthelper - redefineCheckpoints: Loading checkpoint list from: /mnt/user/Backup/vm-backups/Server2/checkpoints INFO libvirthelper - redefineCheckpoints: Redefine missing checkpoint virtnbdbackup.0 INFO libvirthelper - redefineCheckpoints: Redefine missing checkpoint virtnbdbackup.1 INFO libvirthelper - redefineCheckpoints: Redefine missing checkpoint virtnbdbackup.10 ERROR libvirthelper - redefineCheckpoints: Unable to redefine checkpoint virtnbdbackup.10: invalid argument: parent virtnbdbackup.9 for moment virtnbdbackup.10 not found WARNING virtnbdbackup - main: Unable to redefine checkpoints INFO virtnbdbackup - main: Looking for checkpoints INFO virtnbdbackup - main: Found checkpoints, next checkpoint id: 11 INFO virtnbdbackup - main: Parent checkpoint name virtnbdbackup.10 INFO virtnbdbackup - main: Using checkpoint name: virtnbdbackup.11 INFO virtnbdbackup - main: Temporary scratch file target directory: /var/tmp INFO virtnbdbackup - main: Starting backup job. INFO libvirthelper - fsFreeze: Freeze filesystems. INFO libvirthelper - fsThaw: Thawed filesystems. ERROR virtnbdbackup - main: invalid argument: missing or broken bitmap 'virtnbdbackup.10' for disk 'hdc' Traceback (most recent call last): File "/usr/local/lib/python3.9/dist-packages/virtnbdbackup-0.23-py3.9.egg/EGG-INFO/scripts/virtnbdbackup", line 271, in main File "/usr/local/lib/python3.9/dist-packages/virtnbdbackup-0.23-py3.9.egg/libvirtnbdbackup/libvirthelper/libvirthelper.py", line 260, in startBackup domObj.backupBegin(backupXml, checkpointXml) File "/usr/lib/python3/dist-packages/libvirt.py", line 797, in backupBegin raise libvirtError('virDomainBackupBegin() failed') libvirt.libvirtError: invalid argument: missing or broken bitmap 'virtnbdbackup.10' for disk 'hdc'

Where the sequence:

INFO libvirthelper - redefineCheckpoints: Redefine missing checkpoint virtnbdbackup.0 INFO libvirthelper - redefineCheckpoints: Redefine missing checkpoint virtnbdbackup.1 INFO libvirthelper - redefineCheckpoints: Redefine missing checkpoint virtnbdbackup.10 ERROR libvirthelper - redefineCheckpoints: Unable to redefine checkpoint virtnbdbackup.10: invalid argument: parent virtnbdbackup.9 for moment virtnbdbackup.10 not found

Repeats invariably, regardless the number of checkpoints and if the backup chain grew up without issues to a high number (e.g. virtnbdbackup.35.) It's like if it gets stuck because it's restoring them in a bad sequence.

virsh --version Compiled against library: libvirt 6.5.0 Using library: libvirt 6.5.0 Using API: QEMU 6.5.0 Running hypervisor: QEMU 5.1.0


I'm not good enough at python like to check the code and point out, but as I've been dealing with libvirt for a while, this looks similar as how 'virsh checkpoint-list ' throws by default:

Name Creation Time ----------------------------------------------- virtnbdbackup.0 2021-09-02 21:14:57 +0200 virtnbdbackup.1 2021-09-02 21:31:31 +0200 virtnbdbackup.10 2021-09-21 03:00:18 +0200 virtnbdbackup.11 2021-09-22 03:00:18 +0200 virtnbdbackup.2 2021-09-03 18:37:05 +0200 virtnbdbackup.3 2021-09-04 04:40:04 +0200 virtnbdbackup.4 2021-09-05 04:40:09 +0200 virtnbdbackup.5 2021-09-06 04:40:14 +0200 virtnbdbackup.6 2021-09-14 04:40:29 +0200 virtnbdbackup.7 2021-09-15 03:00:19 +0200 virtnbdbackup.8 2021-09-16 03:00:29 +0200 virtnbdbackup.9 2021-09-20 03:00:22 +0200

If this is related with the issue, '--topological' flag shows a correct sequence, where no child is listed before its ancestors (https://libvirt.org/manpages/virsh.html#id261)

Name Creation Time ----------------------------------------------- virtnbdbackup.0 2021-09-02 21:14:57 +0200 virtnbdbackup.1 2021-09-02 21:31:31 +0200 virtnbdbackup.2 2021-09-03 18:37:05 +0200 virtnbdbackup.3 2021-09-04 04:40:04 +0200 virtnbdbackup.4 2021-09-05 04:40:09 +0200 virtnbdbackup.5 2021-09-06 04:40:14 +0200 virtnbdbackup.6 2021-09-14 04:40:29 +0200 virtnbdbackup.7 2021-09-15 03:00:19 +0200 virtnbdbackup.8 2021-09-16 03:00:29 +0200 virtnbdbackup.9 2021-09-20 03:00:22 +0200 virtnbdbackup.10 2021-09-21 03:00:18 +0200 virtnbdbackup.11 2021-09-22 03:00:18 +0200

Along with the flag '--name', generates a clean list of checkpoints as they should be (apparently) restored.

Thanks for reading. Open to help with further info as much as I can.

abbbi commented 3 years ago

hi,

looks like a sorting issue, if more than 10 checkpoints exist! Seems as it starts with virtnbdbackup.1 and the sorting algorithm continues with virtnbdbackup.10 instead of virtnbdbackup.2 :)

INFO libvirthelper - redefineCheckpoints: Loading checkpoint list from: /mnt/user/Backup/vm-backups/Server2/checkpoints
INFO libvirthelper - redefineCheckpoints: Redefine missing checkpoint virtnbdbackup.0
INFO libvirthelper - redefineCheckpoints: Redefine missing checkpoint virtnbdbackup.1
INFO libvirthelper - redefineCheckpoints: Redefine missing checkpoint virtnbdbackup.10
ERROR libvirthelper - redefineCheckpoints: Unable to redefine checkpoint virtnbdbackup.10: invalid argument: parent

what are the contents of:

/mnt/user/Backup/vm-backups/Server2/checkpoints

in this situation? Guess the contents of the directory are not sorted correctly within the python code.

abbbi commented 3 years ago

Can reproduce with:

abi@cefix:~$ mkdir checkpoints
abi@cefix:~$ for I in $(seq 1 12); do touch checkpoints/virtnbdbackup.$I.xml; done
>>> import glob
>>> l = glob.glob('checkpoints/*.xml')
>>> sorted(l)
['checkpoints/virtnbdbackup.1.xml', 'checkpoints/virtnbdbackup.10.xml', 'checkpoints/virtnbdbackup.11.xml', 'checkpoints/virtnbdbackup.12.xml', 'checkpoints/virtnbdbackup.2.xml', 'checkpoints/virtnbdbackup.3.xml', 'checkpoints/virtnbdbackup.4.xml', 'checkpoints/virtnbdbackup.5.xml', 'checkpoints/virtnbdbackup.6.xml', 'checkpoints/virtnbdbackup.7.xml', 'checkpoints/virtnbdbackup.8.xml', 'checkpoints/virtnbdbackup.9.xml']
abbbi commented 3 years ago

Can you try with master checkout? It now attempts to sort the checkpoint files by file modification time, thus this problem should not happen now anymore.

adrianparilli commented 3 years ago

Hello, thanks for your quick response. In fact looks as a sorting problem. I will rebuild (runs from inside docker, server gives no choice to install all dependencies) and will be testing as soon as I can.

adrianparilli commented 3 years ago

Hello, I've been unable to reproduce the scenario to verify if sorting issue has gone, but I found another issue that for sure it's related.

I re-built the backup chain for 2 VMs which had this issue before (using the latest commit in master branch):

Full backup: [2021-09-24 00:11:46] INFO common - printVersion: Version: 0.24 Arguments: /usr/local/bin/virtnbdbackup -d Anne -l full -o /mnt/user/Backup/vm-backups/Anne --compress [2021-09-24 00:11:46] INFO virtnbdbackup - main: Store checkpoints in: /mnt/user/Backup/vm-backups/Anne/checkpoints [2021-09-24 00:11:46] WARNING libvirthelper - getDomainDisks: Raw disk hdb excluded by default, use option --raw to include. [2021-09-24 00:11:46] INFO virtnbdbackup - main: Domain has 1 disks attached which support changed block tracking. [2021-09-24 00:11:46] INFO virtnbdbackup - main: Attempting to redefine checkpoints from: /mnt/user/Backup/vm-backups/Anne/checkpoints [2021-09-24 00:11:46] INFO libvirthelper - redefineCheckpoints: Loading checkpoint list from: /mnt/user/Backup/vm-backups/Anne/checkpoints [2021-09-24 00:11:46] INFO virtnbdbackup - main: Looking for checkpoints [2021-09-24 00:11:47] INFO virtnbdbackup - main: Using checkpoint name: virtnbdbackup.0 [2021-09-24 00:11:47] INFO virtnbdbackup - main: Temporary scratch file target directory: /var/tmp [2021-09-24 00:11:47] INFO virtnbdbackup - main: Starting backup job. [2021-09-24 00:11:48] INFO libvirthelper - fsFreeze: Freeze filesystems. [2021-09-24 00:11:49] INFO libvirthelper - fsThaw: Thawed filesystems. [2021-09-24 00:11:49] INFO virtnbdbackup - main: Started backup job with checkpoint, saving information. [2021-09-24 00:11:49] INFO libvirthelper - backupCheckpoint: Saving checkpoint config to /mnt/user/Backup/vm-backups/Anne/checkpoints/virtnbdbackup.0.xml [2021-09-24 00:11:49] INFO nbdhelper - version: 1.6.1 [2021-09-24 00:11:49] INFO virtnbdbackup - backupDisk: NDB Endpoint socket: /var/tmp/virtnbdbackup.1 [2021-09-24 00:11:49] INFO nbdhelper - getBlockInfo: Using Maximum Block size supported by nbd server: 33554432 [2021-09-24 00:11:49] INFO virtnbdbackup - backupDisk: Using nbd to query extents [2021-09-24 00:11:50] INFO virtnbdbackup - backupDisk: Got 1037 extents [2021-09-24 00:11:50] INFO virtnbdbackup - backupDisk: 214748364800 bytes disk size [2021-09-24 00:11:50] INFO virtnbdbackup - backupDisk: 63589384192 bytes of data extents to backup [2021-09-24 00:11:50] INFO virtnbdbackup - backupDisk: Write data to target file: /mnt/user/Backup/vm-backups/Anne/hdc.full.data.partial [2021-09-24 00:11:50] INFO virtnbdbackup - backupDisk: Creating thin provisioned stream backup image [2021-09-24 00:11:50] INFO virtnbdbackup - backupDisk: Compression enabled [2021-09-24 00:14:37] INFO virtnbdbackup - backupDisk: Backup of disk hdc finished, file: /mnt/user/Backup/vm-backups/Anne/hdc.full.data [2021-09-24 00:14:37] INFO virtnbdbackup - backupConfig: Saving VM config to: /mnt/user/Backup/vm-backups/Anne/vmconfig.virtnbdbackup.0.xml [2021-09-24 00:14:37] INFO virtnbdbackup - main: Stopping backup task [2021-09-24 00:14:37] INFO virtnbdbackup - main: Finished

1st inc backup: [2021-09-24 00:41:36] INFO common - printVersion: Version: 0.24 Arguments: /usr/local/bin/virtnbdbackup -d Anne -l inc -o /mnt/user/Backup/vm-backups/Anne --compress [2021-09-24 00:41:36] INFO virtnbdbackup - main: Store checkpoints in: /mnt/user/Backup/vm-backups/Anne/checkpoints [2021-09-24 00:41:36] WARNING libvirthelper - getDomainDisks: Raw disk hdb excluded by default, use option --raw to include. [2021-09-24 00:41:36] INFO virtnbdbackup - main: Domain has 1 disks attached which support changed block tracking. [2021-09-24 00:41:36] INFO virtnbdbackup - main: Attempting to redefine checkpoints from: /mnt/user/Backup/vm-backups/Anne/checkpoints [2021-09-24 00:41:36] INFO libvirthelper - redefineCheckpoints: Loading checkpoint list from: /mnt/user/Backup/vm-backups/Anne/checkpoints [2021-09-24 00:41:36] INFO virtnbdbackup - main: Looking for checkpoints [2021-09-24 00:41:36] INFO virtnbdbackup - main: Found checkpoints, next checkpoint id: 1 [2021-09-24 00:41:36] INFO virtnbdbackup - main: Parent checkpoint name virtnbdbackup.0 [2021-09-24 00:41:36] INFO virtnbdbackup - main: Using checkpoint name: virtnbdbackup.1 [2021-09-24 00:41:36] INFO virtnbdbackup - main: Temporary scratch file target directory: /var/tmp [2021-09-24 00:41:36] INFO virtnbdbackup - main: Starting backup job. [2021-09-24 00:41:37] INFO libvirthelper - fsFreeze: Freeze filesystems. [2021-09-24 00:43:07] INFO libvirthelper - fsThaw: Thawed filesystems. [2021-09-24 00:43:07] INFO virtnbdbackup - main: Started backup job with checkpoint, saving information. [2021-09-24 00:43:07] INFO libvirthelper - backupCheckpoint: Saving checkpoint config to /mnt/user/Backup/vm-backups/Anne/checkpoints/virtnbdbackup.1.xml [2021-09-24 00:43:07] INFO virtnbdbackup - backupDisk: INC backup: set context to qemu:dirty-bitmap:backup-hdc [2021-09-24 00:43:07] INFO nbdhelper - version: 1.6.1 [2021-09-24 00:43:07] INFO virtnbdbackup - backupDisk: NDB Endpoint socket: /var/tmp/virtnbdbackup.1 [2021-09-24 00:43:07] INFO nbdhelper - getBlockInfo: Using Maximum Block size supported by nbd server: 33554432 [2021-09-24 00:43:07] INFO virtnbdbackup - backupDisk: Using nbd to query extents [2021-09-24 00:43:07] INFO virtnbdbackup - backupDisk: Got 17893 extents [2021-09-24 00:43:07] INFO virtnbdbackup - backupDisk: 214748364800 bytes disk size [2021-09-24 00:43:07] INFO virtnbdbackup - backupDisk: 3036479488 bytes of data extents to backup [2021-09-24 00:43:07] INFO virtnbdbackup - backupDisk: Write data to target file: /mnt/user/Backup/vm-backups/Anne/hdc.inc.virtnbdbackup.1.data.partial [2021-09-24 00:43:07] INFO virtnbdbackup - backupDisk: Creating thin provisioned stream backup image [2021-09-24 00:43:07] INFO virtnbdbackup - backupDisk: Compression enabled [2021-09-24 00:43:14] INFO virtnbdbackup - backupDisk: Backup of disk hdc finished, file: /mnt/user/Backup/vm-backups/Anne/hdc.inc.virtnbdbackup.1.data [2021-09-24 00:43:14] INFO virtnbdbackup - backupConfig: Saving VM config to: /mnt/user/Backup/vm-backups/Anne/vmconfig.virtnbdbackup.1.xml [2021-09-24 00:43:14] INFO virtnbdbackup - main: Stopping backup task [2021-09-24 00:43:14] INFO virtnbdbackup - main: Finished

2nd inc backup: [2021-09-24 00:43:25] INFO common - printVersion: Version: 0.24 Arguments: /usr/local/bin/virtnbdbackup -d Anne -l inc -o /mnt/user/Backup/vm-backups/Anne --compress [2021-09-24 00:43:25] INFO virtnbdbackup - main: Store checkpoints in: /mnt/user/Backup/vm-backups/Anne/checkpoints [2021-09-24 00:43:25] WARNING libvirthelper - getDomainDisks: Raw disk hdb excluded by default, use option --raw to include. [2021-09-24 00:43:25] INFO virtnbdbackup - main: Domain has 1 disks attached which support changed block tracking. [2021-09-24 00:43:25] INFO virtnbdbackup - main: Attempting to redefine checkpoints from: /mnt/user/Backup/vm-backups/Anne/checkpoints [2021-09-24 00:43:25] INFO libvirthelper - redefineCheckpoints: Loading checkpoint list from: /mnt/user/Backup/vm-backups/Anne/checkpoints [2021-09-24 00:43:25] INFO virtnbdbackup - main: Looking for checkpoints [2021-09-24 00:43:25] INFO virtnbdbackup - main: Found checkpoints, next checkpoint id: 2 [2021-09-24 00:43:25] INFO virtnbdbackup - main: Parent checkpoint name virtnbdbackup.1 [2021-09-24 00:43:25] INFO virtnbdbackup - main: Using checkpoint name: virtnbdbackup.2 [2021-09-24 00:43:25] INFO virtnbdbackup - main: Temporary scratch file target directory: /var/tmp [2021-09-24 00:43:25] INFO virtnbdbackup - main: Starting backup job. [2021-09-24 00:43:26] INFO libvirthelper - fsFreeze: Freeze filesystems. [2021-09-24 00:43:27] INFO libvirthelper - fsThaw: Thawed filesystems. [2021-09-24 00:43:27] ERROR virtnbdbackup - main: internal error: unable to execute QEMU command 'transaction': Bitmap already exists: virtnbdbackup.2 Traceback (most recent call last): File "/usr/local/lib/python3.9/dist-packages/virtnbdbackup-0.24-py3.9.egg/EGG-INFO/scripts/virtnbdbackup", line 315, in main File "/usr/local/lib/python3.9/dist-packages/virtnbdbackup-0.24-py3.9.egg/libvirtnbdbackup/libvirthelper/libvirthelper.py", line 262, in startBackup domObj.backupBegin(backupXml, checkpointXml) File "/usr/lib/python3/dist-packages/libvirt.py", line 797, in backupBegin raise libvirtError('virDomainBackupBegin() failed') libvirt.libvirtError: internal error: unable to execute QEMU command 'transaction': Bitmap already exists: virtnbdbackup.2

When I check libvirt's checkpoints in server, everything is fine: ls -lhsa /var/lib/libvirt/qemu/checkpoint/Anne total 16K 0 drwxrwxrwx 2 root root 80 Sep 24 02:41 ./ 0 drwxrwxrwx 6 root users 120 Sep 21 06:40 ../ 8.0K -rw------- 1 root root 5.6K Sep 24 02:11 virtnbdbackup.0.xml 8.0K -rw------- 1 root root 5.6K Sep 24 02:41 virtnbdbackup.1.xml

But, it's failing because virtnbdbackup.2 already exists, so I checked the disk image and found this:

qemu-img info vdisk1.img image: vdisk1.img file format: qcow2 virtual size: 200 GiB (214748364800 bytes) disk size: 59.5 GiB cluster_size: 65536 Format specific information: compat: 1.1 compression type: zlib lazy refcounts: false bitmaps: 0: flags:

        name: virtnbdbackup.2
        granularity: 65536
    [1]:
        flags:
            [0]: auto
        name: virtnbdbackup.3
        granularity: 65536
    [2]:
        flags:
            [0]: auto
        name: virtnbdbackup.4
        granularity: 65536
    [3]:
        flags:
            [0]: auto
        name: virtnbdbackup.5
        granularity: 65536
    [4]:
        flags:
            [0]: auto
        name: virtnbdbackup.6
        granularity: 65536
    [5]:
        flags:
            [0]: auto
        name: virtnbdbackup.7
        granularity: 65536
    [6]:
        flags:
            [0]: auto
        name: virtnbdbackup.8
        granularity: 65536
    [7]:
        flags:
            [0]: auto
        name: virtnbdbackup.9
        granularity: 65536
    [8]:
        flags:
            [0]: auto
        name: virtnbdbackup.10
        granularity: 65536
    [9]:
        flags:
            [0]: auto
        name: virtnbdbackup.11
        granularity: 65536
    [10]:
        flags:
            [0]: auto
        name: virtnbdbackup.12
        granularity: 65536
    [11]:
        flags:
            [0]: auto
        name: virtnbdbackup.13
        granularity: 65536
    [12]:
        flags:
            [0]: auto
        name: virtnbdbackup.14
        granularity: 65536
    [13]:
        flags:
            [0]: auto
        name: virtnbdbackup.15
        granularity: 65536
    [14]:
        flags:
            [0]: auto
        name: virtnbdbackup.16
        granularity: 65536
    [15]:
        flags:
            [0]: auto
        name: virtnbdbackup.17
        granularity: 65536
    [16]:
        flags:
            [0]: auto
        name: virtnbdbackup.18
        granularity: 65536
    [17]:
        flags:
            [0]: auto
        name: virtnbdbackup.1
        granularity: 65536
    [18]:
        flags:
            [0]: auto
        name: virtnbdbackup.0
        granularity: 65536
refcount bits: 16
corrupt: false

To get this list, I had to shutdown the VM because the newly created bitmaps aren't visible while VM is running (Note the inverse order of [18] and [19])

If you look at the order, the checkpoints that should not be there are those which wasn't recovered because the sorting issue (starting from 'virtnbdbackup.2')

Deleting by hand each old bitmap with 'qemu-img bitmap --remove Anne virtnbdbackup.x' worked for me:

[2021-09-24 01:10:23] INFO common - printVersion: Version: 0.24 Arguments: /usr/local/bin/virtnbdbackup -d Anne -l inc -o /mnt/user/Backup/vm-backups/Anne --compress [2021-09-24 01:10:23] INFO virtnbdbackup - main: Store checkpoints in: /mnt/user/Backup/vm-backups/Anne/checkpoints [2021-09-24 01:10:23] WARNING libvirthelper - getDomainDisks: Raw disk hdb excluded by default, use option --raw to include. [2021-09-24 01:10:23] INFO virtnbdbackup - main: Domain has 1 disks attached which support changed block tracking. [2021-09-24 01:10:23] INFO virtnbdbackup - main: Attempting to redefine checkpoints from: /mnt/user/Backup/vm-backups/Anne/checkpoints [2021-09-24 01:10:23] INFO libvirthelper - redefineCheckpoints: Loading checkpoint list from: /mnt/user/Backup/vm-backups/Anne/checkpoints [2021-09-24 01:10:23] INFO virtnbdbackup - main: Looking for checkpoints [2021-09-24 01:10:23] INFO virtnbdbackup - main: Found checkpoints, next checkpoint id: 2 [2021-09-24 01:10:23] INFO virtnbdbackup - main: Parent checkpoint name virtnbdbackup.1 [2021-09-24 01:10:23] INFO virtnbdbackup - main: Using checkpoint name: virtnbdbackup.2 [2021-09-24 01:10:23] INFO virtnbdbackup - main: Temporary scratch file target directory: /var/tmp [2021-09-24 01:10:23] INFO virtnbdbackup - main: Starting backup job. [2021-09-24 01:10:24] INFO libvirthelper - fsFreeze: Freeze filesystems. [2021-09-24 01:10:25] INFO libvirthelper - fsThaw: Thawed filesystems. [2021-09-24 01:10:25] INFO virtnbdbackup - main: Started backup job with checkpoint, saving information. [2021-09-24 01:10:25] INFO libvirthelper - backupCheckpoint: Saving checkpoint config to /mnt/user/Backup/vm-backups/Anne/checkpoints/virtnbdbackup.2.xml [2021-09-24 01:10:25] INFO virtnbdbackup - backupDisk: INC backup: set context to qemu:dirty-bitmap:backup-hdc [2021-09-24 01:10:25] INFO nbdhelper - version: 1.6.1 [2021-09-24 01:10:25] INFO virtnbdbackup - backupDisk: NDB Endpoint socket: /var/tmp/virtnbdbackup.1 [2021-09-24 01:10:25] INFO nbdhelper - getBlockInfo: Using Maximum Block size supported by nbd server: 33554432 [2021-09-24 01:10:25] INFO virtnbdbackup - backupDisk: Using nbd to query extents [2021-09-24 01:10:25] INFO virtnbdbackup - backupDisk: Got 1877 extents [2021-09-24 01:10:25] INFO virtnbdbackup - backupDisk: 214748364800 bytes disk size [2021-09-24 01:10:25] INFO virtnbdbackup - backupDisk: 177733632 bytes of data extents to backup [2021-09-24 01:10:25] INFO virtnbdbackup - backupDisk: Write data to target file: /mnt/user/Backup/vm-backups/Anne/hdc.inc.virtnbdbackup.2.data.partial [2021-09-24 01:10:25] INFO virtnbdbackup - backupDisk: Creating thin provisioned stream backup image [2021-09-24 01:10:25] INFO virtnbdbackup - backupDisk: Compression enabled [2021-09-24 01:10:26] INFO virtnbdbackup - backupDisk: Backup of disk hdc finished, file: /mnt/user/Backup/vm-backups/Anne/hdc.inc.virtnbdbackup.2.data [2021-09-24 01:10:26] INFO virtnbdbackup - backupConfig: Saving VM config to: /mnt/user/Backup/vm-backups/Anne/vmconfig.virtnbdbackup.2.xml [2021-09-24 01:10:26] INFO virtnbdbackup - main: Stopping backup task [2021-09-24 01:10:26] INFO virtnbdbackup - main: Finished

After shutdown the VM, the bitmaps are shown like this:

qemu-img info vdisk1.img image: vdisk1.img file format: qcow2 virtual size: 200 GiB (214748364800 bytes) disk size: 59.5 GiB cluster_size: 65536 Format specific information: compat: 1.1 compression type: zlib lazy refcounts: false bitmaps: 0: flags:

        name: virtnbdbackup.1
        granularity: 65536
    [1]:
        flags:
            [0]: auto
        name: virtnbdbackup.0
        granularity: 65536
    [2]:
        flags:
            [0]: auto
        name: virtnbdbackup.2
        granularity: 65536
refcount bits: 16
corrupt: false

Which seems almost fine, but I'm not sure if the order above may be an issue during a sudden restoration, or if I have to rebuild the backup chain by any reason (including disk space).

With another VM, I got the expected order 0 > virtnbdbackup.0, [1] > virtnbdbackup.1 ... and so on. This is on the same server and I can't explain why the above one did in a different order.

(As these are important VMs, I will rebuild the disk images to get rid of bitmaps info and start over again)

In any case, when a full backup is performed it should also scan for old virtnbdbackup related bitmaps on disk image (after, and independently of the current scan) and delete them. Not necessarily for this specific case, but in prevention of any situation which could left bitmaps in a different order than expected.

I will look forward for the original issue to see if it's solved, and report asap to close this thread.

And thank you for this tool. Please don't stop improving it!

Kind Regards, Adrian.

abbbi commented 3 years ago

hi,

the problem you describe now is different from the original issue. I dont think its an issue related to sorting.

It attempts to re-create a checkpoint that isnt listed via libvirt api but is still existant in the qcow image. And here is the point: removing the checkpoint in the qcow2 image would loose track of changed data!

Unfortunately in certain szenarios all kind of different inconsistencies between libvirt checkpoint information and image checkpoint information can happen, its hard to get them all right and i think there might be some real bad corner cases where its not possible for the backup utility to sort everything. Im not a fan of the persistent checkpoint featue in general because i really dont use virtnbdbackup in any cluster environments where it is required.

While it is possible to remove the checkpoints in the qemu image using "qemu-img" it doesnt make sense in this case because then your backup chain is broken ..

adrianparilli commented 3 years ago

Hello,

Regarding checkpoint persistence, I have confirmed that the 'order' they appear on disk image it's caused by QEMU itself, since updates on checkpoints are updated when the image file is closed (https://qemu-project.gitlab.io/qemu/interop/bitmaps.html#bitmap-persistence) The only don't mention that all checkpoints are updated in the inverse order they were created while disk image is open, and this leaded me into a confusion.

As I'm still unable to reproduce the exact conditions to test the latest changes, I'm closing this issue.

Regarding your advice, point taken. Several approaches were analyzed, but the solution that fits better involved project, is this one, since not all VMs are running 24/7.

Summarizing (for the record): A server crash while a VM is running, breaks the backup chain in most of the cases, since QEMU has no time to update bitmaps info onto the disk image. Cases like which originated this issue, are indeed exceptional.

Thank you for everything!