abbbi / virtnbdbackup

Backup utility for Libvirt / qemu / kvm supporting incremental and differential backups + instant recovery (agentless).
http://libvirtbackup.grinser.de/
GNU General Public License v3.0

Bug: Using IPv6 addresses instead of FQDNs fail on the NBD connection #150

Closed draggeta closed 10 months ago

draggeta commented 10 months ago

Version used

virtnbdbackup -V

1.9.49

Describe the bug

When using IPv6 addresses for the --nbd-ip parameter, the NBD part of the connection fails:

# virtnbdbackup -U qemu+ssh://root@[fdcc::1]/system -d server -l auto -o /mnt/NFS/Backup/infra/server/2023-12 --nbd-ip fdcc::1

[2023-12-08 18:43:23] INFO nbd client - printVersion [vda]:  libnbd version: 1.14.2
[2023-12-08 18:43:23] INFO nbd client - connect [vda]:  Waiting until NBD server at [nbd://fdcc::1/vda] is up.
[2023-12-08 18:43:24] ERROR root virtnbdbackup - main [MainThread]:  Disk backup failed: [NBD endpoint: [TCP(exportName='vda', metaContext='qemu:dirty-bitmap:backup-vda', hostname='fdcc::1', tls=False, port=10809, backupSocket='')]: connection failed: [Unable to connect nbd server: nbd_connect_uri: unable to parse URI: nbd://fdcc::1:10809/vda: Invalid argument (EINVAL)]]
[2023-12-08 18:43:24] INFO root virtnbdbackup - main [MainThread]:  Backup jobs finished, stopping backup task.

I assume that the nbd://fdcc::1:10809/vda URI is the problem. It most likely should be nbd://[fdcc::1]:10809/vda to work correctly
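The ambiguity can be illustrated with Python's standard library. RFC 3986 requires IPv6 literals in a URI authority to be enclosed in brackets; without them, a parser cannot tell which colon separates the host from the port. The sketch below uses `urllib.parse` purely as an illustration (libnbd has its own URI parser, so the exact error differs), and the helper name `split_host_port` is hypothetical.

```python
from urllib.parse import urlsplit


def split_host_port(uri):
    """Return (hostname, port) for a URI, or raise ValueError if the
    authority part cannot be parsed into a host and a numeric port."""
    parts = urlsplit(uri)
    return parts.hostname, parts.port


# Bracketed IPv6 literal: host and port are unambiguous.
bracketed = split_host_port("nbd://[fdcc::1]:10809/vda")

# Unbracketed IPv6 literal: everything after the first colon is taken
# as the "port", which is not a number, so parsing fails.
try:
    split_host_port("nbd://fdcc::1:10809/vda")
    unbracketed_ok = True
except ValueError:
    unbracketed_ok = False
```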

Expected behavior

I would expect the connection to succeed in a similar fashion to when an IPv4 address is used.

Workaround

None for this specifically. Using an FQDN that points to an IPv6 AAAA record doesn't fail.

abbbi commented 10 months ago

You could try specifying the IP address in bracketed IPv6 notation: --nbd-ip "[fdcc::1]". The parameter is passed as-is, without any check whether it is a valid IPv4/IPv6 address. Using an FQDN with correct forward/reverse DNS set up is pretty much mandatory anyway:

https://github.com/abbbi/virtnbdbackup?tab=readme-ov-file#remote-backup

[..]
Before attempting a remote backup, please validate your environment meets the following criteria:

DNS resolution (forward and reverse) must work on all involved systems.

draggeta commented 10 months ago

I've done some more tests. The FQDN has only an AAAA record, to make sure I'm not secretly falling back to IPv4:

  1. Pure IPv6 with brackets (failure):
    ~:# virtnbdbackup -U qemu+ssh://root@[fdcc::1]/system -d vm01 -l auto -o /mnt/NFS/Backup/hv01/vm01/2023-12 --nbd-ip [fdcc::1]
    [2023-12-08 22:06:42] INFO lib common - printVersion [MainThread]:  Version: 1.9.49 Arguments: /usr/local/bin/virtnbdbackup -U qemu+ssh://root@[fdcc::1]/system -d vm01 -l auto -o /mnt/NFS/Backup/hv01/vm01/2023-12 --nbd-ip [fdcc::1]
    [2023-12-08 22:06:42] INFO root virtnbdbackup - main [MainThread]:  Backup level: [auto]
    [2023-12-08 22:06:42] INFO root virtnbdbackup - main [MainThread]:  Backup mode auto: executing incremental backup.
    [2023-12-08 22:06:42] INFO virt client - _connect [MainThread]:  Connected to remote host: [hv01], local host: [backup01]
    [2023-12-08 22:06:42] INFO root virtnbdbackup - main [MainThread]:  Libvirt library version: [8000000]
    [2023-12-08 22:06:42] INFO root disktype - Optical [MainThread]:  Skipping attached [cdrom] device: [sda].
    [2023-12-08 22:06:42] INFO root virtnbdbackup - main [MainThread]:  Backup will save [1] attached disks.
    [2023-12-08 22:06:42] INFO root virtnbdbackup - main [MainThread]:  Concurrent backup processes: [1]
    [2023-12-08 22:06:42] INFO root checkpoint - create [MainThread]:  Loading checkpoints from: [/mnt/NFS/Backup/hv01/vm01/2023-12/vm01.cpt]
    [2023-12-08 22:06:42] INFO root checkpoint - redefine [MainThread]:  Loading checkpoint list from: [/mnt/NFS/Backup/hv01/vm01/2023-12/checkpoints]
    [2023-12-08 22:06:43] INFO root checkpoint - create [MainThread]:  Checkpoint handling.
    [2023-12-08 22:06:43] INFO root checkpoint - create [MainThread]:  Next checkpoint id: [5].
    [2023-12-08 22:06:43] INFO root checkpoint - create [MainThread]:  Parent checkpoint name [virtnbdbackup.4].
    [2023-12-08 22:06:43] INFO root checkpoint - create [MainThread]:  Using checkpoint name: [virtnbdbackup.5].
    [2023-12-08 22:06:43] INFO ssh client - connect [MainThread]:  Connecting remote system [hv01] via ssh, username: [root]
    [2023-12-08 22:06:43] INFO paramiko.transport transport - _log [Thread-1]:  Authentication (publickey) successful!
    [2023-12-08 22:06:43] INFO root virtnbdbackup - main [MainThread]:  Remote NBD Endpoint host: [hv01]
    [2023-12-08 22:06:43] INFO root virtnbdbackup - main [MainThread]:  Temporary scratch file target directory: [/var/tmp]
    [2023-12-08 22:06:43] INFO root virtnbdbackup - startBackupJob [MainThread]:  Starting backup job.
    [2023-12-08 22:06:43] WARNING fs fs - freeze [MainThread]:  Guest agent is not responding: QEMU guest agent is not connected
    [2023-12-08 22:06:43] ERROR root virtnbdbackup - startBackupJob [MainThread]:  Failed to start backup: [XML document failed to validate against schema: Unable to validate doc against /usr/share/libvirt/schemas/domainbackup.rng
    Extra element server in interleave
    Element domainbackup failed to validate content
    ]
  2. FQDN with IPv6 NBD (failure):
    ~:# virtnbdbackup -U qemu+ssh://root@hv01.domain.internal/system -d vm01 -l auto -o /mnt/NFS/Backup/hv01/vm01/2023-12 --nbd-ip [fdcc::1]
    [2023-12-08 22:07:44] INFO lib common - printVersion [MainThread]:  Version: 1.9.49 Arguments: /usr/local/bin/virtnbdbackup -U qemu+ssh://root@hv01.domain.internal/system -d vm01 -l auto -o /mnt/NFS/Backup/hv01/vm01/2023-12 --nbd-ip [fdcc::1]
    [2023-12-08 22:07:44] INFO root virtnbdbackup - main [MainThread]:  Backup level: [auto]
    [2023-12-08 22:07:44] INFO root virtnbdbackup - main [MainThread]:  Backup mode auto: executing incremental backup.
    [2023-12-08 22:07:45] INFO virt client - _connect [MainThread]:  Connected to remote host: [hv01], local host: [backup01]
    [2023-12-08 22:07:45] INFO root virtnbdbackup - main [MainThread]:  Libvirt library version: [8000000]
    [2023-12-08 22:07:45] INFO root disktype - Optical [MainThread]:  Skipping attached [cdrom] device: [sda].
    [2023-12-08 22:07:45] INFO root virtnbdbackup - main [MainThread]:  Backup will save [1] attached disks.
    [2023-12-08 22:07:45] INFO root virtnbdbackup - main [MainThread]:  Concurrent backup processes: [1]
    [2023-12-08 22:07:45] INFO root checkpoint - create [MainThread]:  Loading checkpoints from: [/mnt/NFS/Backup/hv01/vm01/2023-12/vm01.cpt]
    [2023-12-08 22:07:45] INFO root checkpoint - redefine [MainThread]:  Loading checkpoint list from: [/mnt/NFS/Backup/hv01/vm01/2023-12/checkpoints]
    [2023-12-08 22:07:45] INFO root checkpoint - create [MainThread]:  Checkpoint handling.
    [2023-12-08 22:07:45] INFO root checkpoint - create [MainThread]:  Next checkpoint id: [5].
    [2023-12-08 22:07:45] INFO root checkpoint - create [MainThread]:  Parent checkpoint name [virtnbdbackup.4].
    [2023-12-08 22:07:45] INFO root checkpoint - create [MainThread]:  Using checkpoint name: [virtnbdbackup.5].
    [2023-12-08 22:07:45] INFO ssh client - connect [MainThread]:  Connecting remote system [hv01] via ssh, username: [root]
    [2023-12-08 22:07:45] INFO paramiko.transport transport - _log [Thread-1]:  Authentication (publickey) successful!
    [2023-12-08 22:07:45] INFO root virtnbdbackup - main [MainThread]:  Remote NBD Endpoint host: [hv01]
    [2023-12-08 22:07:45] INFO root virtnbdbackup - main [MainThread]:  Temporary scratch file target directory: [/var/tmp]
    [2023-12-08 22:07:45] INFO root virtnbdbackup - startBackupJob [MainThread]:  Starting backup job.
    [2023-12-08 22:07:45] WARNING fs fs - freeze [MainThread]:  Guest agent is not responding: QEMU guest agent is not connected
    [2023-12-08 22:07:45] ERROR root virtnbdbackup - startBackupJob [MainThread]:  Failed to start backup: [XML document failed to validate against schema: Unable to validate doc against /usr/share/libvirt/schemas/domainbackup.rng
    Extra element server in interleave
    Element domainbackup failed to validate content
  3. FQDN and NBD FQDN (success):
    ~:# virtnbdbackup -U qemu+ssh://root@hv01.domain.internal/system -d vm01 -l auto -o /mnt/NFS/Backup/hv01/vm01/2023-12 --nbd-ip hv01.domain.internal
    [2023-12-08 22:08:08] INFO lib common - printVersion [MainThread]:  Version: 1.9.49 Arguments: /usr/local/bin/virtnbdbackup -U qemu+ssh://root@hv01.domain.internal/system -d vm01 -l auto -o /mnt/NFS/Backup/hv01/vm01/2023-12 --nbd-ip hv01.domain.internal
    [2023-12-08 22:08:08] INFO root virtnbdbackup - main [MainThread]:  Backup level: [auto]
    [2023-12-08 22:08:08] INFO root virtnbdbackup - main [MainThread]:  Backup mode auto: executing incremental backup.
    [2023-12-08 22:08:09] INFO virt client - _connect [MainThread]:  Connected to remote host: [hv01], local host: [backup01]
    [2023-12-08 22:08:09] INFO root virtnbdbackup - main [MainThread]:  Libvirt library version: [8000000]
    [2023-12-08 22:08:10] INFO root disktype - Optical [MainThread]:  Skipping attached [cdrom] device: [sda].
    [2023-12-08 22:08:10] INFO root virtnbdbackup - main [MainThread]:  Backup will save [1] attached disks.
    [2023-12-08 22:08:10] INFO root virtnbdbackup - main [MainThread]:  Concurrent backup processes: [1]
    [2023-12-08 22:08:10] INFO root checkpoint - create [MainThread]:  Loading checkpoints from: [/mnt/NFS/Backup/hv01/vm01/2023-12/vm01.cpt]
    [2023-12-08 22:08:10] INFO root checkpoint - redefine [MainThread]:  Loading checkpoint list from: [/mnt/NFS/Backup/hv01/vm01/2023-12/checkpoints]
    [2023-12-08 22:08:10] INFO root checkpoint - create [MainThread]:  Checkpoint handling.
    [2023-12-08 22:08:10] INFO root checkpoint - create [MainThread]:  Next checkpoint id: [5].
    [2023-12-08 22:08:10] INFO root checkpoint - create [MainThread]:  Parent checkpoint name [virtnbdbackup.4].
    [2023-12-08 22:08:10] INFO root checkpoint - create [MainThread]:  Using checkpoint name: [virtnbdbackup.5].
    [2023-12-08 22:08:10] INFO ssh client - connect [MainThread]:  Connecting remote system [hv01] via ssh, username: [root]
    [2023-12-08 22:08:10] INFO paramiko.transport transport - _log [Thread-1]:  Authentication (publickey) successful!
    [2023-12-08 22:08:10] INFO root virtnbdbackup - main [MainThread]:  Remote NBD Endpoint host: [hv01]
    [2023-12-08 22:08:10] INFO root virtnbdbackup - main [MainThread]:  Temporary scratch file target directory: [/var/tmp]
    [2023-12-08 22:08:10] INFO root virtnbdbackup - startBackupJob [MainThread]:  Starting backup job.
    [2023-12-08 22:08:10] WARNING fs fs - freeze [MainThread]:  Guest agent is not responding: QEMU guest agent is not connected
    [2023-12-08 22:08:10] INFO root virtnbdbackup - main [MainThread]:  Started backup job with checkpoint, saving information.
    [2023-12-08 22:08:10] INFO root checkpoint - backup [MainThread]:  Saving checkpoint config to: [/mnt/NFS/Backup/hv01/vm01/2023-12/checkpoints/virtnbdbackup.5.xml]
    [2023-12-08 22:08:11] INFO root context - get [vda]:  Using NBD meta context [qemu:dirty-bitmap:backup-vda]
    [2023-12-08 22:08:11] INFO nbd client - printVersion [vda]:  libnbd version: 1.14.2
    [2023-12-08 22:08:11] INFO nbd client - connect [vda]:  Waiting until NBD server at [nbd://hv01.domain.internal:10809/vda] is up.
    [2023-12-08 22:08:12] INFO nbd client - _getBlockInfo [vda]:  Using Maximum Block size supported by NBD server: [33554432]
    [2023-12-08 22:08:12] INFO nbd client - connect [vda]:  Connection to NBD backend succeeded.
    [2023-12-08 22:08:12] INFO root virtnbdbackup - backupDisk [vda]:  Got 107 extents to backup.
    [2023-12-08 22:08:12] INFO root virtnbdbackup - backupDisk [vda]:  26843545600 bytes disk size
    [2023-12-08 22:08:12] INFO root virtnbdbackup - backupDisk [vda]:  59965440 bytes of data extents to backup
    [2023-12-08 22:08:12] INFO root virtnbdbackup - openTargetFile [vda]:  Write data to target file: [/mnt/NFS/Backup/hv01/vm01/2023-12/vda.inc.virtnbdbackup.5.data.partial].
    [2023-12-08 22:08:13] INFO root virtnbdbackup - backupDisk [vda]:  Creating thin provisioned stream backup image
    [2023-12-08 22:08:15] INFO root virtnbdbackup - backupDisk [vda]:  Saving checksum to: [/mnt/NFS/Backup/hv01/vm01/2023-12/vda.inc.virtnbdbackup.5.data.chksum]
    [2023-12-08 22:08:15] INFO root virtnbdbackup - main [MainThread]:  Backup jobs finished, stopping backup task.
    [2023-12-08 22:08:15] INFO root metadata - backupConfig [MainThread]:  Saving VM config to: [/mnt/NFS/Backup/hv01/vm01/2023-12/vmconfig.virtnbdbackup.5.xml]
    [2023-12-08 22:08:15] INFO root metadata - backupDiskInfo [MainThread]:  Saved qcow image config to: [/mnt/NFS/Backup/hv01/vm01/2023-12/vda.virtnbdbackup.5.qcow.json]
    [2023-12-08 22:08:15] INFO root metadata - backupAutoStart [MainThread]:  Autostart setting configured for virtual machine.
    [2023-12-08 22:08:15] INFO paramiko.transport.sftp sftp - _log [MainThread]:  [chan 1] Opened sftp connection (server version 3)
    [2023-12-08 22:08:15] INFO paramiko.transport.sftp sftp - _log [MainThread]:  [chan 2] Opened sftp connection (server version 3)
    [2023-12-08 22:08:15] INFO paramiko.transport.sftp sftp - _log [MainThread]:  [chan 2] sftp session closed.
    [2023-12-08 22:08:15] INFO root virtnbdbackup - main [MainThread]:  Finished successfully
  4. FQDN no NBD (failure), probably a searchdomain issue on my side:
    ~:# virtnbdbackup -U qemu+ssh://root@hv01.domain.internal/system -d vm01 -l auto -o /mnt/NFS/Backup/hv01/vm01/2023-12
    [2023-12-08 22:08:45] INFO lib common - printVersion [MainThread]:  Version: 1.9.49 Arguments: /usr/local/bin/virtnbdbackup -U qemu+ssh://root@hv01.domain.internal/system -d vm01 -l auto -o /mnt/NFS/Backup/hv01/vm01/2023-12
    [2023-12-08 22:08:45] INFO root virtnbdbackup - main [MainThread]:  Backup level: [auto]
    [2023-12-08 22:08:45] INFO root virtnbdbackup - main [MainThread]:  Backup mode auto: executing incremental backup.
    [2023-12-08 22:08:47] INFO virt client - _connect [MainThread]:  Connected to remote host: [hv01], local host: [backup01]
    [2023-12-08 22:08:47] INFO root virtnbdbackup - main [MainThread]:  Libvirt library version: [8000000]
    [2023-12-08 22:08:47] INFO root disktype - Optical [MainThread]:  Skipping attached [cdrom] device: [sda].
    [2023-12-08 22:08:47] INFO root virtnbdbackup - main [MainThread]:  Backup will save [1] attached disks.
    [2023-12-08 22:08:47] INFO root virtnbdbackup - main [MainThread]:  Concurrent backup processes: [1]
    [2023-12-08 22:08:47] INFO root checkpoint - create [MainThread]:  Loading checkpoints from: [/mnt/NFS/Backup/hv01/vm01/2023-12/vm01.cpt]
    [2023-12-08 22:08:47] INFO root checkpoint - redefine [MainThread]:  Loading checkpoint list from: [/mnt/NFS/Backup/hv01/vm01/2023-12/checkpoints]
    [2023-12-08 22:08:47] INFO root checkpoint - create [MainThread]:  Checkpoint handling.
    [2023-12-08 22:08:47] INFO root checkpoint - create [MainThread]:  Next checkpoint id: [6].
    [2023-12-08 22:08:47] INFO root checkpoint - create [MainThread]:  Parent checkpoint name [virtnbdbackup.5].
    [2023-12-08 22:08:47] INFO root checkpoint - create [MainThread]:  Using checkpoint name: [virtnbdbackup.6].
    [2023-12-08 22:08:47] INFO ssh client - connect [MainThread]:  Connecting remote system [hv01] via ssh, username: [root]
    [2023-12-08 22:08:47] INFO paramiko.transport transport - _log [Thread-1]:  Authentication (publickey) successful!
    [2023-12-08 22:08:47] INFO root virtnbdbackup - main [MainThread]:  Remote NBD Endpoint host: [hv01]
    [2023-12-08 22:08:47] INFO root virtnbdbackup - main [MainThread]:  Temporary scratch file target directory: [/var/tmp]
    [2023-12-08 22:08:47] INFO root virtnbdbackup - startBackupJob [MainThread]:  Starting backup job.
    [2023-12-08 22:08:47] WARNING fs fs - freeze [MainThread]:  Guest agent is not responding: QEMU guest agent is not connected
    [2023-12-08 22:08:47] INFO root virtnbdbackup - main [MainThread]:  Started backup job with checkpoint, saving information.
    [2023-12-08 22:08:48] INFO root checkpoint - backup [MainThread]:  Saving checkpoint config to: [/mnt/NFS/Backup/hv01/vm01/2023-12/checkpoints/virtnbdbackup.6.xml]
    [2023-12-08 22:08:48] INFO root context - get [vda]:  Using NBD meta context [qemu:dirty-bitmap:backup-vda]
    [2023-12-08 22:08:48] INFO nbd client - printVersion [vda]:  libnbd version: 1.14.2
    [2023-12-08 22:08:48] INFO nbd client - connect [vda]:  Waiting until NBD server at [nbd://hv01:10809/vda] is up.
    [2023-12-08 22:08:49] ERROR root virtnbdbackup - main [MainThread]:  Disk backup failed: [NBD endpoint: [TCP(exportName='vda', metaContext='qemu:dirty-bitmap:backup-vda', hostname='hv01', tls=False, port=10809, backupSocket='')]: connection failed: [Unable to connect nbd server: nbd_connect_uri: recv: Connection refused (ECONNREFUSED)]]
    [2023-12-08 22:08:49] INFO root virtnbdbackup - main [MainThread]:  Backup jobs finished, stopping backup task.
    [2023-12-08 22:08:49] INFO root metadata - backupConfig [MainThread]:  Saving VM config to: [/mnt/NFS/Backup/hv01/vm01/2023-12/vmconfig.virtnbdbackup.6.xml]
    [2023-12-08 22:08:49] INFO root metadata - backupDiskInfo [MainThread]:  Saved qcow image config to: [/mnt/NFS/Backup/hv01/vm01/2023-12/vda.virtnbdbackup.6.qcow.json]
    [2023-12-08 22:08:49] INFO root metadata - backupAutoStart [MainThread]:  Autostart setting configured for virtual machine.
    [2023-12-08 22:08:49] ERROR root virtnbdbackup - main [MainThread]:  Error during backup

I would expect at least option 2 to work, similar to how it works with IPv4. Number 4 fails, but I cannot explain why. I just tested it, and the backup server can resolve the hostname perfectly to its IPv6 address and even connect using conn = libvirt.open("qemu+ssh://root@hv01/system").

abbbi commented 10 months ago
[2023-12-08 22:06:43] ERROR root virtnbdbackup - startBackupJob [MainThread]:  Failed to start backup: [XML document failed to validate against schema: Unable to validate doc against /usr/share/libvirt/schemas/domainbackup.rng
Extra element server in interleave

OK, using the IPv6 notation with [] in the argument doesn't work because the resulting backup XML fails validation against libvirt's schema definition.

So logic needs to be added here:

https://github.com/abbbi/virtnbdbackup/blob/master/libvirtnbdbackup/nbdcli/client.py#L47

that checks whether an IPv6 address has been passed as the argument and, if so, encloses it in []. I'd go with a proper FQDN instead of pushing logic here that is required only in corner cases.
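A minimal sketch of such a check, assuming the standard `ipaddress` module is used to detect IPv6 literals. This is not the code from the linked branch; the function names `format_nbd_host` and `build_nbd_uri` are hypothetical and only illustrate the idea of bracketing the address when the URI is built, so the unbracketed value can still be used elsewhere (for example in the libvirt backup XML).

```python
import ipaddress


def format_nbd_host(host):
    """Return the host as it should appear in a URI authority.

    IPv6 literals are wrapped in [], while IPv4 literals and hostnames
    (anything ipaddress cannot parse) are returned unchanged.
    """
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal: treat it as a hostname or FQDN.
        return host
    if ip.version == 6:
        return f"[{host}]"
    return host


def build_nbd_uri(host, port, export):
    """Assemble an nbd:// URI from host, port and export name."""
    return f"nbd://{format_nbd_host(host)}:{port}/{export}"
```

With this approach `--nbd-ip fdcc::1` would yield `nbd://[fdcc::1]:10809/vda`, while an FQDN or IPv4 address passes through untouched.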

abbbi commented 10 months ago

can you check if this change works for you?

https://github.com/abbbi/virtnbdbackup/compare/master...issue150

draggeta commented 10 months ago

Hi @abbbi, it sorta works. The change on line 63 should be if ip.version == 6: instead of if ip.version == "6":. After changing that, it works in both scenarios 1 and 2. Scenario 4 is still broken and I don't know why. Does NBD not use the search domain?
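The reason the string comparison never matches can be seen directly: the `version` attribute of an address object from Python's `ipaddress` module is an integer, and in Python an integer never compares equal to a string. A short demonstration (the variable names are only for illustration):

```python
import ipaddress

ip = ipaddress.ip_address("fdcc::1")

# version is an int (4 or 6), not a string.
version_is_int = isinstance(ip.version, int)

# Comparing against the integer matches ...
matches_int = ip.version == 6

# ... while comparing against the string silently evaluates to False,
# so the bracket-adding branch would never run.
matches_str = ip.version == "6"
```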

abbbi commented 10 months ago

Hi @abbbi, it sorta works. The change on line 63 should be if ip.version == 6: instead of if ip.version == "6":. After changing it into that it works in both scenario 1 and 2.

Thanks, adjusted and merged into master.

4 is still broken and I don't know why. Does NBD not use the searchdomain?

To be honest, I don't know. Is there a firewall in place that blocks the connection? You could analyse this with the debug option and the -s parameter (which only initializes the NBD backend), and then check on the remote system which IP address qemu starts the NBD service on. Use the -k option to kill the remote NBD service that was started for the backup operation.

draggeta commented 10 months ago

@abbbi thank you for your work. I really appreciate it.

I still don't know why it fails as the nbd processes do start on IPv6 on the hypervisor, but if I don't provide the --nbd-ip option it fails. For me however, this is fine as I don't mind adding that parameter. I consider this issue fixed on our side.

abbbi commented 10 months ago

Maybe it is a timing issue? Starting the NBD service on the remote system might take some time, and maybe the first connection is attempted too early.

Until now, waiting for the NBD backend was only implemented for socket-based local backups; I hadn't seen remote NBD backends take long to come up. I have now fixed this, and the implementation also retries the connection multiple times for remote backups:

See:

https://github.com/abbbi/virtnbdbackup/commit/96dd19005038562a32f698056c15b1ed09c39f42
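The general retry idea can be sketched as follows. This is not the code from the commit above: the function name `connect_with_retry`, its parameters and the use of `OSError` as the failure type are assumptions made for illustration (a real libnbd client raises its own exception type).

```python
import time


def connect_with_retry(connect, attempts=5, delay=1.0, sleep=time.sleep):
    """Call connect() until it succeeds or attempts are exhausted.

    connect is any zero-argument callable that returns a connection or
    raises OSError (for example ConnectionRefusedError) while the
    server is not yet listening.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except OSError as err:
            last_error = err
            # Wait before the next try, but not after the final one.
            if attempt < attempts:
                sleep(delay)
    raise ConnectionError(f"gave up after {attempts} attempts") from last_error
```

Note that retrying only helps if the server eventually listens on an address the client can reach; it cannot fix a service that is bound to the wrong address.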

draggeta commented 10 months ago

Well, I've tested it and it definitely isn't a timing issue. It seems to be a difference between the single name used when omitting --nbd-ip vs using the parameter with an FQDN/IPv6 address.

I don't know what the cause is, but this is what I've seen:

draggeta commented 10 months ago

Ah, I've finally got it. If you don't specify the name, it opens the process up only to localhost, not anything else. This is probably because the hv01 name points to localhost on the hypervisor itself via the /etc/hosts file.

qemu-syst 3189271    libvirt-qemu  118u  IPv6 1597636526      0t0  TCP [::1]:10809 (LISTEN)
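A quick way to check for this situation on the hypervisor is to see what its own short hostname resolves to locally. The sketch below is an assumption-laden illustration, not part of virtnbdbackup: the helper names are hypothetical, and the port 10809 is simply the one shown in the logs above.

```python
import ipaddress
import socket


def resolved_addresses(hostname, port=10809):
    """Return the sorted set of addresses the local resolver
    (including /etc/hosts) gives for hostname."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def only_loopback(addresses):
    """True if there is at least one address and all are loopback.

    A zone suffix such as %eth0 on link-local addresses is stripped
    before parsing.
    """
    return bool(addresses) and all(
        ipaddress.ip_address(addr.split("%", 1)[0]).is_loopback
        for addr in addresses
    )
```

Running `only_loopback(resolved_addresses("hv01"))` on the hypervisor would return True in the setup described here, which matches the listener being bound to [::1] only.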