abbbi / virtnbdbackup

Backup utility for Libvirt / qemu / kvm supporting incremental and differential backups + instant recovery (agentless).
http://libvirtbackup.grinser.de/
GNU General Public License v3.0
330 stars 46 forks source link

Remote incremental backup fails: checkpoint virtnbdbackup.1 already exists #89

Closed mortenmathiasen closed 1 year ago

mortenmathiasen commented 1 year ago

I run into an error on running third time running the same auto level backup:

  1. FIRST time: auto level backup => Full backup (just fine)
  2. SECOND time: auto level backup => Incremental backup (just fine)
  3. THIRD time: auto level backup => Incremental backup (ERROR due to reused checkpoint id)

Below you see the output from running the backup command 3 times. Is the error on the third run a software bug in virtnbdbackup or am I doing something wrong here?

[root@host2 cron]# ./backup-virtualmachines

1. FIRST BACKUP RUN

[2022-12-27 06:37:33] INFO common - printVersion [MainThread]: Version: 1.9.9 Arguments: /bin/virtnbdbackup -U qemu+ssh://root@host1/system --ssh-user root -d almalinux -o /data/backup/virtualmachines/almalinux/ -l auto
[2022-12-27 06:37:33] INFO virtnbdbackup - main [MainThread]: Backup level: [auto]
[2022-12-27 06:37:33] INFO virtnbdbackup - main [MainThread]: Backup mode auto, target folder is empty: executing full backup.
[2022-12-27 06:37:34] INFO client - _connect [MainThread]: Connected to remote host: [host1.infomotor.dk], local host: [host2]
[2022-12-27 06:37:34] INFO virtnbdbackup - main [MainThread]: Libvirt library version: [8000000]
[2022-12-27 06:37:34] INFO client - getDomainDisks [MainThread]: Skipping attached [cdrom] device: [sda].
[2022-12-27 06:37:34] INFO virtnbdbackup - main [MainThread]: Domain has [1] disks attached which support changed block tracking.
[2022-12-27 06:37:34] INFO virtnbdbackup - main [MainThread]: Concurrent backup processes: [1]
[2022-12-27 06:37:34] INFO client - redefineCheckpoints [MainThread]: Loading checkpoint list from: [/data/backup/virtualmachines/almalinux//checkpoints]
[2022-12-27 06:37:34] INFO virtnbdbackup - handleCheckpoints [MainThread]: Checkpoint handling.
[2022-12-27 06:37:34] INFO virtnbdbackup - handleCheckpoints [MainThread]: Using checkpoint name: [virtnbdbackup.0].
[2022-12-27 06:37:34] INFO client - connect [MainThread]: Connecting remote system via ssh, username: [root]
/usr/lib/python3.6/site-packages/paramiko/ecdsakey.py:164: CryptographyDeprecationWarning: Support for unsafe construction of public numbers from encoded data will be removed in a future version. Please use EllipticCurvePublicKey.from_encoded_point
  self.ecdsa_curve.curve_class(), pointinfo
[2022-12-27 06:37:34] INFO transport - _log [Thread-1]: Connected (version 2.0, client OpenSSH_8.0)
/usr/lib/python3.6/site-packages/paramiko/kex_ecdh_nist.py:39: CryptographyDeprecationWarning: encode_point has been deprecated on EllipticCurvePublicNumbers and will be removed in a future version. Please use EllipticCurvePublicKey.public_bytes to obtain both compressed and uncompressed point encoding.
  m.add_string(self.Q_C.public_numbers().encode_point())
/usr/lib/python3.6/site-packages/paramiko/kex_ecdh_nist.py:96: CryptographyDeprecationWarning: Support for unsafe construction of public numbers from encoded data will be removed in a future version. Please use EllipticCurvePublicKey.from_encoded_point
  self.curve, Q_S_bytes
/usr/lib/python3.6/site-packages/paramiko/kex_ecdh_nist.py:111: CryptographyDeprecationWarning: encode_point has been deprecated on EllipticCurvePublicNumbers and will be removed in a future version. Please use EllipticCurvePublicKey.public_bytes to obtain both compressed and uncompressed point encoding.
  hm.add_string(self.Q_C.public_numbers().encode_point())
[2022-12-27 06:37:34] INFO transport - _log [Thread-1]: Authentication (publickey) successful!
[2022-12-27 06:37:34] INFO virtnbdbackup - main [MainThread]: Remote: NDB Endpoint socket: [host1.infomotor.dk:/var/tmp/virtnbdbackup.2775504]
[2022-12-27 06:37:34] INFO virtnbdbackup - main [MainThread]: Temporary scratch file target directory: [/var/tmp]
[2022-12-27 06:37:34] INFO virtnbdbackup - startBackupJob [MainThread]: Starting backup job.
[2022-12-27 06:37:34] WARNING client - fsFreeze [MainThread]: Guest agent is not responding: QEMU guest agent is not connected
[2022-12-27 06:37:34] INFO virtnbdbackup - main [MainThread]: Started backup job with checkpoint, saving information.
[2022-12-27 06:37:35] INFO sftp - _log [MainThread]: [chan 0] Opened sftp connection (server version 3)
[2022-12-27 06:37:35] INFO client - backupCheckpoint [MainThread]: Saving checkpoint config to: [/data/backup/virtualmachines/almalinux//checkpoints/virtnbdbackup.0.xml]
[2022-12-27 06:37:35] INFO client - _printVersion [vda]: libnbd version: 1.6.0
[2022-12-27 06:37:35] INFO client - connect [vda]: Waiting until NBD server at [nbd://host1.infomotor.dk:10809/vda] is up.
[2022-12-27 06:37:36] INFO client - _getBlockInfo [vda]: Using Maximum Block size supported by NBD server: [33554432]
[2022-12-27 06:37:36] INFO client - connect [vda]: Connection to NBD backend succeeded.
[2022-12-27 06:37:36] INFO virtnbdbackup - backupDisk [vda]: Got 228 extents to backup.
[2022-12-27 06:37:36] INFO virtnbdbackup - backupDisk [vda]: 10737418240 bytes disk size
[2022-12-27 06:37:36] INFO virtnbdbackup - backupDisk [vda]: 10731061248 bytes of data extents to backup
[2022-12-27 06:37:36] INFO virtnbdbackup - getWriter [vda]: Write data to target file: [/data/backup/virtualmachines/almalinux//vda.full.data.partial].
[2022-12-27 06:37:36] INFO virtnbdbackup - backupDisk [vda]: Creating thin provisioned stream backup image
[2022-12-27 06:38:34] INFO virtnbdbackup - main [MainThread]: Backup jobs finished, stopping backup task.
[2022-12-27 06:38:34] INFO virtnbdbackup - backupConfig [MainThread]: Saving VM config to: [/data/backup/virtualmachines/almalinux//vmconfig.virtnbdbackup.0.xml]
[2022-12-27 06:38:34] INFO virtnbdbackup - backupDiskInfo [MainThread]: Saved QCOW image config to: [/data/backup/virtualmachines/almalinux//vda.virtnbdbackup.0.qcow.json]
[2022-12-27 06:38:34] INFO sftp - _log [MainThread]: [chan 2] Opened sftp connection (server version 3)
[2022-12-27 06:38:34] INFO sftp - _log [MainThread]: [chan 3] Opened sftp connection (server version 3)
[2022-12-27 06:38:34] INFO sftp - _log [MainThread]: [chan 3] sftp session closed.
[2022-12-27 06:38:34] INFO virtnbdbackup - main [MainThread]: Finished successfully

2. SECOND BACKUP RUN

[2022-12-27 06:38:34] INFO common - printVersion [MainThread]: Version: 1.9.9 Arguments: /bin/virtnbdbackup -U qemu+ssh://root@host1/system --ssh-user root -d almalinux -o /data/backup/virtualmachines/almalinux/ -l auto
[2022-12-27 06:38:34] INFO virtnbdbackup - main [MainThread]: Backup level: [auto]
[2022-12-27 06:38:34] INFO virtnbdbackup - main [MainThread]: Backup mode auto: executing incremental backup.
[2022-12-27 06:38:35] INFO client - _connect [MainThread]: Connected to remote host: [host1.infomotor.dk], local host: [host2]
[2022-12-27 06:38:35] INFO virtnbdbackup - main [MainThread]: Libvirt library version: [8000000]
[2022-12-27 06:38:35] INFO client - getDomainDisks [MainThread]: Skipping attached [cdrom] device: [sda].
[2022-12-27 06:38:35] INFO virtnbdbackup - main [MainThread]: Domain has [1] disks attached which support changed block tracking.
[2022-12-27 06:38:35] INFO virtnbdbackup - main [MainThread]: Concurrent backup processes: [1]
[2022-12-27 06:38:35] INFO client - redefineCheckpoints [MainThread]: Loading checkpoint list from: [/data/backup/virtualmachines/almalinux//checkpoints]
[2022-12-27 06:38:35] INFO virtnbdbackup - handleCheckpoints [MainThread]: Checkpoint handling.
[2022-12-27 06:38:35] INFO virtnbdbackup - handleCheckpoints [MainThread]: Next checkpoint id: [1].
[2022-12-27 06:38:35] INFO virtnbdbackup - handleCheckpoints [MainThread]: Parent checkpoint name [virtnbdbackup.0].
[2022-12-27 06:38:35] INFO virtnbdbackup - handleCheckpoints [MainThread]: Using checkpoint name: [virtnbdbackup.1].
[2022-12-27 06:38:35] INFO client - connect [MainThread]: Connecting remote system via ssh, username: [root]
/usr/lib/python3.6/site-packages/paramiko/ecdsakey.py:164: CryptographyDeprecationWarning: Support for unsafe construction of public numbers from encoded data will be removed in a future version. Please use EllipticCurvePublicKey.from_encoded_point
  self.ecdsa_curve.curve_class(), pointinfo
[2022-12-27 06:38:35] INFO transport - _log [Thread-1]: Connected (version 2.0, client OpenSSH_8.0)
/usr/lib/python3.6/site-packages/paramiko/kex_ecdh_nist.py:39: CryptographyDeprecationWarning: encode_point has been deprecated on EllipticCurvePublicNumbers and will be removed in a future version. Please use EllipticCurvePublicKey.public_bytes to obtain both compressed and uncompressed point encoding.
  m.add_string(self.Q_C.public_numbers().encode_point())
/usr/lib/python3.6/site-packages/paramiko/kex_ecdh_nist.py:96: CryptographyDeprecationWarning: Support for unsafe construction of public numbers from encoded data will be removed in a future version. Please use EllipticCurvePublicKey.from_encoded_point
  self.curve, Q_S_bytes
/usr/lib/python3.6/site-packages/paramiko/kex_ecdh_nist.py:111: CryptographyDeprecationWarning: encode_point has been deprecated on EllipticCurvePublicNumbers and will be removed in a future version. Please use EllipticCurvePublicKey.public_bytes to obtain both compressed and uncompressed point encoding.
  hm.add_string(self.Q_C.public_numbers().encode_point())
[2022-12-27 06:38:35] INFO transport - _log [Thread-1]: Authentication (publickey) successful!
[2022-12-27 06:38:35] INFO virtnbdbackup - main [MainThread]: Remote: NDB Endpoint socket: [host1.infomotor.dk:/var/tmp/virtnbdbackup.2775518]
[2022-12-27 06:38:35] INFO virtnbdbackup - main [MainThread]: Temporary scratch file target directory: [/var/tmp]
[2022-12-27 06:38:35] INFO virtnbdbackup - startBackupJob [MainThread]: Starting backup job.
[2022-12-27 06:38:35] WARNING client - fsFreeze [MainThread]: Guest agent is not responding: QEMU guest agent is not connected
[2022-12-27 06:38:35] INFO virtnbdbackup - main [MainThread]: Started backup job with checkpoint, saving information.
[2022-12-27 06:38:35] INFO sftp - _log [MainThread]: [chan 0] Opened sftp connection (server version 3)
[2022-12-27 06:38:35] INFO client - backupCheckpoint [MainThread]: Saving checkpoint config to: [/data/backup/virtualmachines/almalinux//checkpoints/virtnbdbackup.1.xml]
[2022-12-27 06:38:35] INFO virtnbdbackup - setMetaContext [vda]: INC/DIFF backup: set context to [qemu:dirty-bitmap:backup-vda]
[2022-12-27 06:38:35] INFO client - _printVersion [vda]: libnbd version: 1.6.0
[2022-12-27 06:38:35] INFO client - connect [vda]: Waiting until NBD server at [nbd://host1.infomotor.dk:10809/vda] is up.
[2022-12-27 06:38:36] INFO client - _getBlockInfo [vda]: Using Maximum Block size supported by NBD server: [33554432]
[2022-12-27 06:38:36] INFO client - connect [vda]: Connection to NBD backend succeeded.
[2022-12-27 06:38:36] INFO virtnbdbackup - backupDisk [vda]: Got 5 extents to backup.
[2022-12-27 06:38:36] INFO virtnbdbackup - backupDisk [vda]: 10737418240 bytes disk size
[2022-12-27 06:38:36] INFO virtnbdbackup - backupDisk [vda]: 131072 bytes of data extents to backup
[2022-12-27 06:38:36] INFO virtnbdbackup - getWriter [vda]: Write data to target file: [/data/backup/virtualmachines/almalinux//vda.inc.virtnbdbackup.1.data.partial].
[2022-12-27 06:38:36] INFO virtnbdbackup - backupDisk [vda]: Creating thin provisioned stream backup image
[2022-12-27 06:38:36] INFO virtnbdbackup - main [MainThread]: Backup jobs finished, stopping backup task.
[2022-12-27 06:38:36] INFO virtnbdbackup - backupConfig [MainThread]: Saving VM config to: [/data/backup/virtualmachines/almalinux//vmconfig.virtnbdbackup.1.xml]
[2022-12-27 06:38:36] INFO virtnbdbackup - backupDiskInfo [MainThread]: Saved QCOW image config to: [/data/backup/virtualmachines/almalinux//vda.virtnbdbackup.1.qcow.json]
[2022-12-27 06:38:36] INFO sftp - _log [MainThread]: [chan 2] Opened sftp connection (server version 3)
[2022-12-27 06:38:37] INFO sftp - _log [MainThread]: [chan 3] Opened sftp connection (server version 3)
[2022-12-27 06:38:37] INFO sftp - _log [MainThread]: [chan 3] sftp session closed.
[2022-12-27 06:38:37] INFO virtnbdbackup - main [MainThread]: Finished successfully

3. THIRD BACKUP RUN

[2022-12-27 06:38:37] INFO common - printVersion [MainThread]: Version: 1.9.9 Arguments: /bin/virtnbdbackup -U qemu+ssh://root@host1/system --ssh-user root -d almalinux -o /data/backup/virtualmachines/almalinux/ -l auto
[2022-12-27 06:38:37] INFO virtnbdbackup - main [MainThread]: Backup level: [auto]
[2022-12-27 06:38:37] INFO virtnbdbackup - main [MainThread]: Backup mode auto: executing incremental backup.
[2022-12-27 06:38:37] INFO client - _connect [MainThread]: Connected to remote host: [host1.infomotor.dk], local host: [host2]
[2022-12-27 06:38:37] INFO virtnbdbackup - main [MainThread]: Libvirt library version: [8000000]
[2022-12-27 06:38:37] INFO client - getDomainDisks [MainThread]: Skipping attached [cdrom] device: [sda].
[2022-12-27 06:38:37] INFO virtnbdbackup - main [MainThread]: Domain has [1] disks attached which support changed block tracking.
[2022-12-27 06:38:37] INFO virtnbdbackup - main [MainThread]: Concurrent backup processes: [1]
[2022-12-27 06:38:37] INFO client - redefineCheckpoints [MainThread]: Loading checkpoint list from: [/data/backup/virtualmachines/almalinux//checkpoints]
[2022-12-27 06:38:37] INFO virtnbdbackup - handleCheckpoints [MainThread]: Checkpoint handling.
[2022-12-27 06:38:37] INFO virtnbdbackup - handleCheckpoints [MainThread]: Next checkpoint id: [1].
[2022-12-27 06:38:37] INFO virtnbdbackup - handleCheckpoints [MainThread]: Parent checkpoint name [virtnbdbackup.1].
[2022-12-27 06:38:37] INFO virtnbdbackup - handleCheckpoints [MainThread]: Using checkpoint name: [virtnbdbackup.1].
[2022-12-27 06:38:37] INFO client - connect [MainThread]: Connecting remote system via ssh, username: [root]
/usr/lib/python3.6/site-packages/paramiko/ecdsakey.py:164: CryptographyDeprecationWarning: Support for unsafe construction of public numbers from encoded data will be removed in a future version. Please use EllipticCurvePublicKey.from_encoded_point
  self.ecdsa_curve.curve_class(), pointinfo
[2022-12-27 06:38:38] INFO transport - _log [Thread-1]: Connected (version 2.0, client OpenSSH_8.0)
/usr/lib/python3.6/site-packages/paramiko/kex_ecdh_nist.py:39: CryptographyDeprecationWarning: encode_point has been deprecated on EllipticCurvePublicNumbers and will be removed in a future version. Please use EllipticCurvePublicKey.public_bytes to obtain both compressed and uncompressed point encoding.
  m.add_string(self.Q_C.public_numbers().encode_point())
/usr/lib/python3.6/site-packages/paramiko/kex_ecdh_nist.py:96: CryptographyDeprecationWarning: Support for unsafe construction of public numbers from encoded data will be removed in a future version. Please use EllipticCurvePublicKey.from_encoded_point
  self.curve, Q_S_bytes
/usr/lib/python3.6/site-packages/paramiko/kex_ecdh_nist.py:111: CryptographyDeprecationWarning: encode_point has been deprecated on EllipticCurvePublicNumbers and will be removed in a future version. Please use EllipticCurvePublicKey.public_bytes to obtain both compressed and uncompressed point encoding.
  hm.add_string(self.Q_C.public_numbers().encode_point())
[2022-12-27 06:38:38] INFO transport - _log [Thread-1]: Authentication (publickey) successful!
[2022-12-27 06:38:38] INFO virtnbdbackup - main [MainThread]: Remote: NDB Endpoint socket: [host1.infomotor.dk:/var/tmp/virtnbdbackup.2775524]
[2022-12-27 06:38:38] INFO virtnbdbackup - main [MainThread]: Temporary scratch file target directory: [/var/tmp]
[2022-12-27 06:38:38] INFO virtnbdbackup - startBackupJob [MainThread]: Starting backup job.
[2022-12-27 06:38:38] WARNING client - fsFreeze [MainThread]: Guest agent is not responding: QEMU guest agent is not connected
[2022-12-27 06:38:38] ERROR virtnbdbackup - startBackupJob [MainThread]: Failed to start backup: [operation failed: domain moment virtnbdbackup.1 already exists]

[root@host2 cron]# ls -l /data/backup/virtualmachines/almalinux//checkpoints
total 16
-rw-r--r--. 1 root root 6559 Dec 27 06:37 virtnbdbackup.0.xml
-rw-r--r--. 1 root root 6615 Dec 27 06:38 virtnbdbackup.1.xml
abbbi commented 1 year ago

that appears strange. Were the contents of /data/backup/virtualmachines/almalinux/ altered between first and third run? What are the contents of /data/backup/virtualmachines/almalinux/checkpoints and /data/backup/virtualmachines/almalinux/almalinux.cpt after second run?

the problem is here: even tho it detects virtnbdbackup.1 to be already existant, it doesent raise the number. Havent seen this happening before.

[2022-12-27 06:38:37] INFO virtnbdbackup - handleCheckpoints [MainThread]: Parent checkpoint name [virtnbdbackup.1].
[2022-12-27 06:38:37] INFO virtnbdbackup - handleCheckpoints [MainThread]: Using checkpoint name: [virtnbdbackup.1].

also, i cant reproduce this locally in my environment for some reason...

It appears to me the second and third backup iteration are executed mereley seconds between each other, so this might be a race condition, too (checkpoint file not closed and flushed to disk already)

mortenmathiasen commented 1 year ago

I made a pull request which fix'es the problem - does that make sense?

mortenmathiasen commented 1 year ago

It is not a race condition. I run into the same problem later on too.

abbbi commented 1 year ago

the second backup run ends at 06:38:37

[2022-12-27 06:38:37] INFO virtnbdbackup - main [MainThread]: Finished successfully

and the third backup run starts at exactly the same moment:

[2022-12-27 06:38:37] INFO common - printVersion [MainThread]: Version: 1.9.9 Arguments: /bin/virtnbdbackup -U qemu+ssh://root@host1/system --ssh-user root -d almalinux -o /data/backup/virtualmachines/almalinux/ -l auto

so i suspect some race condition between the finishing process and the starting backup process. on libvirt side everything is settled, but it may seem the checkpoint files and the checkpoint .cpt file may not already be flushed to disk, so the third run refers to the same checkpoint.

Are you execetuting these in seperate terminals??

abbbi commented 1 year ago

I made a pull request which fix'es the problem - does that make sense?

not really, because the checkpoint information of the second backup run is used to calculate the next checkpoint name. At this point, it seems the utility is simply not knowing about the past backup task. I try to reproduce on my system but i cant...

mortenmathiasen commented 1 year ago

I ran the 3 backups from scricpt originally. Now I ran them manually by hand, which solves the problem! You were right, it must be a race condition.

abbbi commented 1 year ago

could think about creating a lock file while a backup run is active to prevent such situations, but then its quite uncommon to start backups so fast this race might appear.

mortenmathiasen commented 1 year ago

Sorry, the error persists. I thought it was fixed by running the backups manually, but the fix relied upon using my patch from the pull request. Below wihtout any patch you see the error again on manually execution and the stored checkpoints after each run.

Any suggests on fixing this?

root@host2 /]# virtnbdbackup -U qemu+ssh://root@host1/system --ssh-user root -d almalinux -o /data/backup/virtualmachines/almalinux/ -l auto
[2022-12-27 09:56:05] INFO common - printVersion [MainThread]: Version: 1.9.9 Arguments: /bin/virtnbdbackup -U qemu+ssh://root@host1/system --ssh-user root -d almalinux\
 -o /data/backup/virtualmachines/almalinux/ -l auto
[2022-12-27 09:56:05] INFO virtnbdbackup - main [MainThread]: Backup level: [auto]
[2022-12-27 09:56:05] INFO virtnbdbackup - main [MainThread]: Backup mode auto, target folder is empty: executing full backup.
[2022-12-27 09:56:11] INFO client - _connect [MainThread]: Connected to remote host: [host1.infomotor.dk], local host: [host2]
[2022-12-27 09:56:11] INFO virtnbdbackup - main [MainThread]: Libvirt library version: [8000000]
[2022-12-27 09:56:11] INFO client - getDomainDisks [MainThread]: Skipping attached [cdrom] device: [sda].
[2022-12-27 09:56:11] INFO virtnbdbackup - main [MainThread]: Domain has [1] disks attached which support changed block tracking.
[2022-12-27 09:56:11] INFO virtnbdbackup - main [MainThread]: Concurrent backup processes: [1]
[2022-12-27 09:56:11] INFO client - redefineCheckpoints [MainThread]: Loading checkpoint list from: [/data/backup/virtualmachines/almalinux//checkpoints]
[2022-12-27 09:56:11] INFO virtnbdbackup - handleCheckpoints [MainThread]: Checkpoint handling.
[2022-12-27 09:56:11] INFO virtnbdbackup - handleCheckpoints [MainThread]: Using checkpoint name: [virtnbdbackup.0].
[2022-12-27 09:56:11] INFO client - connect [MainThread]: Connecting remote system via ssh, username: [root]
/usr/lib/python3.6/site-packages/paramiko/ecdsakey.py:164: CryptographyDeprecationWarning: Support for unsafe construction of public numbers from encoded data will be r\
emoved in a future version. Please use EllipticCurvePublicKey.from_encoded_point
  self.ecdsa_curve.curve_class(), pointinfo
[2022-12-27 09:56:11] INFO transport - _log [Thread-1]: Connected (version 2.0, client OpenSSH_8.0)
/usr/lib/python3.6/site-packages/paramiko/kex_ecdh_nist.py:39: CryptographyDeprecationWarning: encode_point has been deprecated on EllipticCurvePublicNumbers and will b\
e removed in a future version. Please use EllipticCurvePublicKey.public_bytes to obtain both compressed and uncompressed point encoding.
  m.add_string(self.Q_C.public_numbers().encode_point())
/usr/lib/python3.6/site-packages/paramiko/kex_ecdh_nist.py:96: CryptographyDeprecationWarning: Support for unsafe construction of public numbers from encoded data will \
be removed in a future version. Please use EllipticCurvePublicKey.from_encoded_point
  self.curve, Q_S_bytes
/usr/lib/python3.6/site-packages/paramiko/kex_ecdh_nist.py:111: CryptographyDeprecationWarning: encode_point has been deprecated on EllipticCurvePublicNumbers and will \
be removed in a future version. Please use EllipticCurvePublicKey.public_bytes to obtain both compressed and uncompressed point encoding.
  hm.add_string(self.Q_C.public_numbers().encode_point())
[2022-12-27 09:56:11] INFO transport - _log [Thread-1]: Authentication (publickey) successful!
[2022-12-27 09:56:11] INFO virtnbdbackup - main [MainThread]: Remote: NDB Endpoint socket: [host1.infomotor.dk:/var/tmp/virtnbdbackup.2777487]
[2022-12-27 09:56:11] INFO virtnbdbackup - main [MainThread]: Temporary scratch file target directory: [/var/tmp]
[2022-12-27 09:56:11] INFO virtnbdbackup - startBackupJob [MainThread]: Starting backup job.
[2022-12-27 09:56:11] WARNING client - fsFreeze [MainThread]: Guest agent is not responding: QEMU guest agent is not connected
[2022-12-27 09:56:11] INFO virtnbdbackup - main [MainThread]: Started backup job with checkpoint, saving information.
[2022-12-27 09:56:11] INFO sftp - _log [MainThread]: [chan 0] Opened sftp connection (server version 3)
[2022-12-27 09:56:11] INFO client - backupCheckpoint [MainThread]: Saving checkpoint config to: [/data/backup/virtualmachines/almalinux//checkpoints/virtnbdbackup.0.xml\
]
[2022-12-27 09:56:11] INFO client - _printVersion [vda]: libnbd version: 1.6.0
[2022-12-27 09:56:11] INFO client - connect [vda]: Waiting until NBD server at [nbd://host1.infomotor.dk:10809/vda] is up.
[2022-12-27 09:56:12] INFO client - _getBlockInfo [vda]: Using Maximum Block size supported by NBD server: [33554432]
[2022-12-27 09:56:12] INFO client - connect [vda]: Connection to NBD backend succeeded.
[2022-12-27 09:56:12] INFO virtnbdbackup - backupDisk [vda]: Got 228 extents to backup.
[2022-12-27 09:56:12] INFO virtnbdbackup - backupDisk [vda]: 10737418240 bytes disk size
[2022-12-27 09:56:12] INFO virtnbdbackup - backupDisk [vda]: 10731061248 bytes of data extents to backup
[2022-12-27 09:56:12] INFO virtnbdbackup - getWriter [vda]: Write data to target file: [/data/backup/virtualmachines/almalinux//vda.full.data.partial].
[2022-12-27 09:56:12] INFO virtnbdbackup - backupDisk [vda]: Creating thin provisioned stream backup image
[2022-12-27 09:57:10] INFO virtnbdbackup - main [MainThread]: Backup jobs finished, stopping backup task.
[2022-12-27 09:57:10] INFO virtnbdbackup - backupConfig [MainThread]: Saving VM config to: [/data/backup/virtualmachines/almalinux//vmconfig.virtnbdbackup.0.xml]
[2022-12-27 09:57:10] INFO virtnbdbackup - backupDiskInfo [MainThread]: Saved QCOW image config to: [/data/backup/virtualmachines/almalinux//vda.virtnbdbackup.0.qcow.js\
on]
[2022-12-27 09:57:10] INFO sftp - _log [MainThread]: [chan 2] Opened sftp connection (server version 3)
[2022-12-27 09:57:10] INFO sftp - _log [MainThread]: [chan 3] Opened sftp connection (server version 3)
[2022-12-27 09:57:10] INFO sftp - _log [MainThread]: [chan 3] sftp session closed.
[2022-12-27 09:57:10] INFO virtnbdbackup - main [MainThread]: Finished successfully
[root@host2 /]# 
[root@host2 /]# ls -l /data/backup/virtualmachines/almalinux//checkpoints
total 8
-rw-r--r--. 1 root root 6559 Dec 27 09:56 virtnbdbackup.0.xml
[root@host2 /]# 
root@host2 /]# virtnbdbackup -U qemu+ssh://root@host1/system --ssh-user root -d almalinux -o /data/backup/virtualmachines/almalinux/ -l auto
[2022-12-27 09:57:34] INFO common - printVersion [MainThread]: Version: 1.9.9 Arguments: /bin/virtnbdbackup -U qemu+ssh://root@host1/system --ssh-user root -d almalinux\
 -o /data/backup/virtualmachines/almalinux/ -l auto
[2022-12-27 09:57:34] INFO virtnbdbackup - main [MainThread]: Backup level: [auto]
[2022-12-27 09:57:34] INFO virtnbdbackup - main [MainThread]: Backup mode auto: executing incremental backup.
[2022-12-27 09:57:35] INFO client - _connect [MainThread]: Connected to remote host: [host1.infomotor.dk], local host: [host2]
[2022-12-27 09:57:35] INFO virtnbdbackup - main [MainThread]: Libvirt library version: [8000000]
[2022-12-27 09:57:35] INFO client - getDomainDisks [MainThread]: Skipping attached [cdrom] device: [sda].
[2022-12-27 09:57:35] INFO virtnbdbackup - main [MainThread]: Domain has [1] disks attached which support changed block tracking.
[2022-12-27 09:57:35] INFO virtnbdbackup - main [MainThread]: Concurrent backup processes: [1]
[2022-12-27 09:57:35] INFO client - redefineCheckpoints [MainThread]: Loading checkpoint list from: [/data/backup/virtualmachines/almalinux//checkpoints]
[2022-12-27 09:57:35] INFO virtnbdbackup - handleCheckpoints [MainThread]: Checkpoint handling.
[2022-12-27 09:57:35] INFO virtnbdbackup - handleCheckpoints [MainThread]: Next checkpoint id: [1].
[2022-12-27 09:57:35] INFO virtnbdbackup - handleCheckpoints [MainThread]: Parent checkpoint name [virtnbdbackup.0].
[2022-12-27 09:57:35] INFO virtnbdbackup - handleCheckpoints [MainThread]: Using checkpoint name: [virtnbdbackup.1].
[2022-12-27 09:57:35] INFO client - connect [MainThread]: Connecting remote system via ssh, username: [root]
/usr/lib/python3.6/site-packages/paramiko/ecdsakey.py:164: CryptographyDeprecationWarning: Support for unsafe construction of public numbers from encoded data will be r\
emoved in a future version. Please use EllipticCurvePublicKey.from_encoded_point
  self.ecdsa_curve.curve_class(), pointinfo
[2022-12-27 09:57:35] INFO transport - _log [Thread-1]: Connected (version 2.0, client OpenSSH_8.0)
/usr/lib/python3.6/site-packages/paramiko/kex_ecdh_nist.py:39: CryptographyDeprecationWarning: encode_point has been deprecated on EllipticCurvePublicNumbers and will b\
e removed in a future version. Please use EllipticCurvePublicKey.public_bytes to obtain both compressed and uncompressed point encoding.
  m.add_string(self.Q_C.public_numbers().encode_point())
/usr/lib/python3.6/site-packages/paramiko/kex_ecdh_nist.py:96: CryptographyDeprecationWarning: Support for unsafe construction of public numbers from encoded data will \
be removed in a future version. Please use EllipticCurvePublicKey.from_encoded_point
  self.curve, Q_S_bytes
/usr/lib/python3.6/site-packages/paramiko/kex_ecdh_nist.py:111: CryptographyDeprecationWarning: encode_point has been deprecated on EllipticCurvePublicNumbers and will \
be removed in a future version. Please use EllipticCurvePublicKey.public_bytes to obtain both compressed and uncompressed point encoding.
  hm.add_string(self.Q_C.public_numbers().encode_point())
[2022-12-27 09:57:35] INFO transport - _log [Thread-1]: Authentication (publickey) successful!
[2022-12-27 09:57:35] INFO virtnbdbackup - main [MainThread]: Remote: NDB Endpoint socket: [host1.infomotor.dk:/var/tmp/virtnbdbackup.2777514]
[2022-12-27 09:57:35] INFO virtnbdbackup - main [MainThread]: Temporary scratch file target directory: [/var/tmp]
[2022-12-27 09:57:35] INFO virtnbdbackup - startBackupJob [MainThread]: Starting backup job.
[2022-12-27 09:57:35] WARNING client - fsFreeze [MainThread]: Guest agent is not responding: QEMU guest agent is not connected
[2022-12-27 09:57:35] INFO virtnbdbackup - main [MainThread]: Started backup job with checkpoint, saving information.
[2022-12-27 09:57:35] INFO sftp - _log [MainThread]: [chan 0] Opened sftp connection (server version 3)
[2022-12-27 09:57:35] INFO client - backupCheckpoint [MainThread]: Saving checkpoint config to: [/data/backup/virtualmachines/almalinux//checkpoints/virtnbdbackup.1.xml\
]
[2022-12-27 09:57:35] INFO virtnbdbackup - setMetaContext [vda]: INC/DIFF backup: set context to [qemu:dirty-bitmap:backup-vda]
[2022-12-27 09:57:35] INFO client - _printVersion [vda]: libnbd version: 1.6.0
[2022-12-27 09:57:35] INFO client - connect [vda]: Waiting until NBD server at [nbd://host1.infomotor.dk:10809/vda] is up.
[2022-12-27 09:57:36] INFO client - _getBlockInfo [vda]: Using Maximum Block size supported by NBD server: [33554432]
[2022-12-27 09:57:36] INFO client - connect [vda]: Connection to NBD backend succeeded.
[2022-12-27 09:57:36] INFO virtnbdbackup - backupDisk [vda]: Got 1 extents to backup.
[2022-12-27 09:57:36] INFO virtnbdbackup - backupDisk [vda]: 10737418240 bytes disk size
[2022-12-27 09:57:36] INFO virtnbdbackup - backupDisk [vda]: 0 bytes of data extents to backup
[2022-12-27 09:57:36] INFO virtnbdbackup - backupDisk [vda]: No dirty blocks found
[2022-12-27 09:57:36] INFO virtnbdbackup - getWriter [vda]: Write data to target file: [/data/backup/virtualmachines/almalinux//vda.inc.virtnbdbackup.1.data.partial].
[2022-12-27 09:57:36] INFO virtnbdbackup - backupDisk [vda]: Creating thin provisioned stream backup image
[2022-12-27 09:57:36] INFO virtnbdbackup - backupDisk [vda]: Backup of disk [vda] finished, file: [/data/backup/virtualmachines/almalinux//vda.inc.virtnbdbackup.1.data]
[2022-12-27 09:57:36] INFO virtnbdbackup - main [MainThread]: Backup jobs finished, stopping backup task.
[2022-12-27 09:57:36] INFO virtnbdbackup - backupConfig [MainThread]: Saving VM config to: [/data/backup/virtualmachines/almalinux//vmconfig.virtnbdbackup.1.xml]
[2022-12-27 09:57:36] INFO virtnbdbackup - backupDiskInfo [MainThread]: Saved QCOW image config to: [/data/backup/virtualmachines/almalinux//vda.virtnbdbackup.1.qcow.js\
on]
[2022-12-27 09:57:37] INFO sftp - _log [MainThread]: [chan 2] Opened sftp connection (server version 3)
[2022-12-27 09:57:37] INFO sftp - _log [MainThread]: [chan 3] Opened sftp connection (server version 3)
[2022-12-27 09:57:37] INFO sftp - _log [MainThread]: [chan 3] sftp session closed.
[2022-12-27 09:57:37] INFO virtnbdbackup - main [MainThread]: Finished successfully
[root@host2 /]# 
[root@host2 /]# !ls
ls -l /data/backup/virtualmachines/almalinux//checkpoints
total 16
-rw-r--r--. 1 root root 6559 Dec 27 09:56 virtnbdbackup.0.xml
-rw-r--r--. 1 root root 6615 Dec 27 09:57 virtnbdbackup.1.xml
root@host2 /]# !virt
virtnbdbackup -U qemu+ssh://root@host1/system --ssh-user root -d almalinux -o /data/backup/virtualmachines/almalinux/ -l auto
[2022-12-27 09:57:52] INFO common - printVersion [MainThread]: Version: 1.9.9 Arguments: /bin/virtnbdbackup -U qemu+ssh://root@host1/system --ssh-user root -d almalinux\
 -o /data/backup/virtualmachines/almalinux/ -l auto
[2022-12-27 09:57:52] INFO virtnbdbackup - main [MainThread]: Backup level: [auto]
[2022-12-27 09:57:52] INFO virtnbdbackup - main [MainThread]: Backup mode auto: executing incremental backup.
[2022-12-27 09:57:52] INFO client - _connect [MainThread]: Connected to remote host: [host1.infomotor.dk], local host: [host2]
[2022-12-27 09:57:52] INFO virtnbdbackup - main [MainThread]: Libvirt library version: [8000000]
[2022-12-27 09:57:52] INFO client - getDomainDisks [MainThread]: Skipping attached [cdrom] device: [sda].
[2022-12-27 09:57:52] INFO virtnbdbackup - main [MainThread]: Domain has [1] disks attached which support changed block tracking.
[2022-12-27 09:57:52] INFO virtnbdbackup - main [MainThread]: Concurrent backup processes: [1]
[2022-12-27 09:57:52] INFO client - redefineCheckpoints [MainThread]: Loading checkpoint list from: [/data/backup/virtualmachines/almalinux//checkpoints]
[2022-12-27 09:57:52] INFO virtnbdbackup - handleCheckpoints [MainThread]: Checkpoint handling.
[2022-12-27 09:57:52] INFO virtnbdbackup - handleCheckpoints [MainThread]: Next checkpoint id: [1].
[2022-12-27 09:57:52] INFO virtnbdbackup - handleCheckpoints [MainThread]: Parent checkpoint name [virtnbdbackup.1].
[2022-12-27 09:57:52] INFO virtnbdbackup - handleCheckpoints [MainThread]: Using checkpoint name: [virtnbdbackup.1].
[2022-12-27 09:57:52] INFO client - connect [MainThread]: Connecting remote system via ssh, username: [root]
/usr/lib/python3.6/site-packages/paramiko/ecdsakey.py:164: CryptographyDeprecationWarning: Support for unsafe construction of public numbers from encoded data will be r\
emoved in a future version. Please use EllipticCurvePublicKey.from_encoded_point
  self.ecdsa_curve.curve_class(), pointinfo
[2022-12-27 09:57:52] INFO transport - _log [Thread-1]: Connected (version 2.0, client OpenSSH_8.0)
/usr/lib/python3.6/site-packages/paramiko/kex_ecdh_nist.py:39: CryptographyDeprecationWarning: encode_point has been deprecated on EllipticCurvePublicNumbers and will b\
e removed in a future version. Please use EllipticCurvePublicKey.public_bytes to obtain both compressed and uncompressed point encoding.
  m.add_string(self.Q_C.public_numbers().encode_point())
/usr/lib/python3.6/site-packages/paramiko/kex_ecdh_nist.py:96: CryptographyDeprecationWarning: Support for unsafe construction of public numbers from encoded data will \
be removed in a future version. Please use EllipticCurvePublicKey.from_encoded_point
  self.curve, Q_S_bytes
/usr/lib/python3.6/site-packages/paramiko/kex_ecdh_nist.py:111: CryptographyDeprecationWarning: encode_point has been deprecated on EllipticCurvePublicNumbers and will \
be removed in a future version. Please use EllipticCurvePublicKey.public_bytes to obtain both compressed and uncompressed point encoding.
  hm.add_string(self.Q_C.public_numbers().encode_point())
[2022-12-27 09:57:52] INFO transport - _log [Thread-1]: Authentication (publickey) successful!
[2022-12-27 09:57:52] INFO virtnbdbackup - main [MainThread]: Remote: NDB Endpoint socket: [host1.infomotor.dk:/var/tmp/virtnbdbackup.2777521]
[2022-12-27 09:57:52] INFO virtnbdbackup - main [MainThread]: Temporary scratch file target directory: [/var/tmp]
[2022-12-27 09:57:52] INFO virtnbdbackup - startBackupJob [MainThread]: Starting backup job.
[2022-12-27 09:57:52] WARNING client - fsFreeze [MainThread]: Guest agent is not responding: QEMU guest agent is not connected
[2022-12-27 09:57:53] ERROR virtnbdbackup - startBackupJob [MainThread]: Failed to start backup: [operation failed: domain moment virtnbdbackup.1 already exists]
[root@host2 /]# 
[root@host2 /]# !ls
[root@host2 /]# !ls
ls -l /data/backup/virtualmachines/almalinux//checkpoints
total 16
-rw-r--r--. 1 root root 6559 Dec 27 09:56 virtnbdbackup.0.xml
-rw-r--r--. 1 root root 6615 Dec 27 09:57 virtnbdbackup.1.xml
[root@host2 /]# 
mortenmathiasen commented 1 year ago

A little debugging shows that the list checkpoints only contains 'virtnbdbackup.1' while the list is missing 'virtnbdbackup.0'. This might be the reason why the patch works and the original code doesn't.

abbbi commented 1 year ago

hi,

i still fail to reproduce this, 100 iterations are working just fine on my system and i have never seen such a behavior! Is /data/backup a local filesystem???

for I in $(seq 1 100); do ./virtnbdbackup -d vm1 -o /tmp/NEWBACKUP -l auto 2> /dev/null || echo "fail"; done

have never seen this happening, try to add some debug output in handleCheckpoints It really appears to me the filesystem in /data/ might not flush fast enough between runs.

Really, this is a strange case. The Process is as Follows:

1) virtnbdbackup checks the backup target directory 2) if backup level is auto, it is checked if already a full backup is within this directory 3) if full backup exists, it reads targetdir/virtualmachinename.cpt and prepares a list of checkpoints 4) if already an checkpoint exists, the checkpoint name is raised by 1 5) backup is executed and after it finishes, checkpoint file is re-written with appended new checkpoint.

The main problem here is that im unable to reproduce this situation, and i wont simply pull in some fixes before knowing the exact reasoning for what is happening here.

mortenmathiasen commented 1 year ago

Flushing is ok as all checkpoints appear in file listings. Maybe the issue is related to be that I do remote backup or that I am using OracleLinux?

abbbi commented 1 year ago

You are doing an remote backup, might be the reason i cant reproduce, this looks suspicous:

    checkpoints: List[str] = []                                                     
    if not lib.exists(args, cFile):                                                                                                                                                                                                                                              
        return checkpoints 

if sshClient connection is active, might be the case that it checks for the checkpoint file on the remote system, not locally and then responds with an empty checkpoint list

abbbi commented 1 year ago

can you check if the commit in mentioned branch fixes this issue for you?

mortenmathiasen commented 1 year ago

Just a moment ...

abbbi commented 1 year ago

just another, push, messed up arguments ;) branch issue89 should hopefully fix this issue

mortenmathiasen commented 1 year ago

Traceback (most recent call last): File "/bin/virtnbdbackup", line 1088, in main() File "/bin/virtnbdbackup", line 594, in main handleCheckpoints(args, virtClient, domObj) File "/bin/virtnbdbackup", line 158, in handleCheckpoints checkpoints: List[str] = readCheckpointFile(args, cptFile) File "/bin/virtnbdbackup", line 131, in readCheckpointFile if not lib.os.path.exists(args, cFile): TypeError: exists() takes 1 positional argument but 2 were given

abbbi commented 1 year ago

pull again, as noted in comment above

mortenmathiasen commented 1 year ago

Great! Branch fix solved the issue :-)