gerba3 opened this issue 1 year ago
Does it work with `lxc copy --mode=push`?
In my mind the control connection should be opened using the subnet that both server1 and server2 share (192.168.3.0/24), especially because the API requests are already using that interface (lxc ls server2: works).
It looks like it has tried that:
- https://192.168.3.2:8443: Error transferring instance data: Failed waiting for migration control connection on target: websocket: bad handshake
But it got a websocket error.
Can you run `lxc monitor --pretty` on both source and destination hosts and then try again and supply the output on both systems please.
> Does it work with `lxc copy --mode=push`?

Yes, that one works just fine.
> Can you run `lxc monitor --pretty` on both source and destination hosts and then try again and supply the output on both systems please.
Copy command used: lxc copy mycontainer/snap0 server2:mycontainer-snap0
# server1_pretty.log
time="2023-07-19T11:40:33Z" level=debug msg="Event listener server handler started" id=8ad4ead7-3d87-47dd-813a-3bb84dba1afb local=/var/snap/lxd/common/lxd/unix.socket remote=@
time="2023-07-19T11:40:45Z" level=debug msg="Handling API request" ip=@ method=GET protocol=unix url=/1.0 username=sysadmin
time="2023-07-19T11:40:45Z" level=debug msg="GetInstanceUsage started" driver=zfs instance=mycontainer/snap0 pool=tank1 project=default
time="2023-07-19T11:40:45Z" level=debug msg="Handling API request" ip=@ method=GET protocol=unix url=/1.0/instances/mycontainer/snapshots/snap0 username=sysadmin
time="2023-07-19T11:40:45Z" level=debug msg="GetInstanceUsage finished" driver=zfs instance=mycontainer/snap0 pool=tank1 project=default
time="2023-07-19T11:40:45Z" level=debug msg="Handling API request" ip=@ method=GET protocol=unix url=/1.0/instances/mycontainer username=sysadmin
time="2023-07-19T11:40:45Z" level=debug msg="Handling API request" ip=@ method=GET protocol=unix url=/1.0/instances/mycontainer username=sysadmin
time="2023-07-19T11:40:45Z" level=debug msg="Handling API request" ip=@ method=GET protocol=unix url=/1.0/events username=sysadmin
time="2023-07-19T11:40:45Z" level=debug msg="Handling API request" ip=@ method=POST protocol=unix url=/1.0/instances/mycontainer/snapshots/snap0 username=sysadmin
time="2023-07-19T11:40:45Z" level=debug msg="Event listener server handler started" id=6692e14f-d8fc-4558-a088-4af1d4583bb1 local=/var/snap/lxd/common/lxd/unix.socket remote=@
time="2023-07-19T11:40:45Z" level=debug msg="New operation" class=websocket description="Transferring snapshot" operation=4649969f-295f-42ac-8fa6-a137bf5b0ea4 project=default
time="2023-07-19T11:40:45Z" level=info msg="ID: 4649969f-295f-42ac-8fa6-a137bf5b0ea4, Class: websocket, Description: Transferring snapshot" CreatedAt="2023-07-19 11:40:45.891098247 +0000 UTC" Err= Location=none MayCancel=false Metadata="map[control:91d4cc9cb21e5157be6bb62e3e1cf85080a7b84fd7f2b4e4fa221caea4209d72 fs:8830124a9ceac2def6aaa8eb1944256a0d62544d01f58346116bc392bacc1122]" Resources="map[containers:[/1.0/instances/mycontainer] instances:[/1.0/instances/mycontainer] instances_snapshots:[/1.0/instances/mycontainer/snapshots/snap0]]" Status=Pending StatusCode=Pending UpdatedAt="2023-07-19 11:40:45.891098247 +0000 UTC"
time="2023-07-19T11:40:45Z" level=debug msg="Started operation" class=websocket description="Transferring snapshot" operation=4649969f-295f-42ac-8fa6-a137bf5b0ea4 project=default
time="2023-07-19T11:40:45Z" level=info msg="ID: 4649969f-295f-42ac-8fa6-a137bf5b0ea4, Class: websocket, Description: Transferring snapshot" CreatedAt="2023-07-19 11:40:45.891098247 +0000 UTC" Err= Location=none MayCancel=false Metadata="map[control:91d4cc9cb21e5157be6bb62e3e1cf85080a7b84fd7f2b4e4fa221caea4209d72 fs:8830124a9ceac2def6aaa8eb1944256a0d62544d01f58346116bc392bacc1122]" Resources="map[containers:[/1.0/instances/mycontainer] instances:[/1.0/instances/mycontainer] instances_snapshots:[/1.0/instances/mycontainer/snapshots/snap0]]" Status=Running StatusCode=Running UpdatedAt="2023-07-19 11:40:45.891098247 +0000 UTC"
time="2023-07-19T11:40:45Z" level=info msg="Waiting for migration control connection on source" clusterMoveSourceName= instance=mycontainer/snap0 live=false project=default push=false
time="2023-07-19T11:40:55Z" level=debug msg="Failure for operation" class=websocket description="Transferring snapshot" err="Failed waiting for migration control connection on source: context deadline exceeded" operation=4649969f-295f-42ac-8fa6-a137bf5b0ea4 project=default
time="2023-07-19T11:40:55Z" level=info msg="ID: 4649969f-295f-42ac-8fa6-a137bf5b0ea4, Class: websocket, Description: Transferring snapshot" CreatedAt="2023-07-19 11:40:45.891098247 +0000 UTC" Err="Failed waiting for migration control connection on source: context deadline exceeded" Location=none MayCancel=false Metadata="map[control:91d4cc9cb21e5157be6bb62e3e1cf85080a7b84fd7f2b4e4fa221caea4209d72 fs:8830124a9ceac2def6aaa8eb1944256a0d62544d01f58346116bc392bacc1122]" Resources="map[containers:[/1.0/instances/mycontainer] instances:[/1.0/instances/mycontainer] instances_snapshots:[/1.0/instances/mycontainer/snapshots/snap0]]" Status=Failure StatusCode=Failure UpdatedAt="2023-07-19 11:40:45.891098247 +0000 UTC"
time="2023-07-19T11:40:55Z" level=debug msg="Allowing untrusted GET" ip="192.168.3.3:59306" url="/1.0/operations/4649969f-295f-42ac-8fa6-a137bf5b0ea4/websocket?secret=91d4cc9cb21e5157be6bb62e3e1cf85080a7b84fd7f2b4e4fa221caea4209d72"
time="2023-07-19T11:40:55Z" level=debug msg="Handling API request" ip=@ method=DELETE protocol=unix url=/1.0/operations/4649969f-295f-42ac-8fa6-a137bf5b0ea4 username=sysadmin
time="2023-07-19T11:40:55Z" level=debug msg="Event listener server handler stopped" listener=6692e14f-d8fc-4558-a088-4af1d4583bb1 local=/var/snap/lxd/common/lxd/unix.socket remote=@
# server2_pretty.log
time="2023-07-19T11:40:35Z" level=debug msg="Event listener server handler started" id=467ab80f-3d9c-42d2-a9b0-ba41d7d0fd24 local=/var/snap/lxd/common/lxd/unix.socket remote=@
time="2023-07-19T11:40:45Z" level=debug msg="Matched trusted cert" fingerprint=dd6e84c9594891e2989fccbc96250f1e4ccb11339d218bc8cb7a1b662220d160 subject="CN=sysadmin@server1,O=linuxcontainers.org"
time="2023-07-19T11:40:45Z" level=debug msg="Handling API request" ip="192.168.3.2:40566" method=GET protocol=tls url=/1.0 username=dd6e84c9594891e2989fccbc96250f1e4ccb11339d218bc8cb7a1b662220d160
time="2023-07-19T11:40:45Z" level=debug msg="Matched trusted cert" fingerprint=dd6e84c9594891e2989fccbc96250f1e4ccb11339d218bc8cb7a1b662220d160 subject="CN=sysadmin@server1,O=linuxcontainers.org"
time="2023-07-19T11:40:45Z" level=debug msg="Handling API request" ip="192.168.3.2:40576" method=GET protocol=tls url=/1.0/events username=dd6e84c9594891e2989fccbc96250f1e4ccb11339d218bc8cb7a1b662220d160
time="2023-07-19T11:40:45Z" level=debug msg="Matched trusted cert" fingerprint=dd6e84c9594891e2989fccbc96250f1e4ccb11339d218bc8cb7a1b662220d160 subject="CN=sysadmin@server1,O=linuxcontainers.org"
time="2023-07-19T11:40:45Z" level=debug msg="Event listener server handler started" id=540123f7-76d3-436e-ae39-c649066c563f local="192.168.3.3:8443" remote="192.168.3.2:40576"
time="2023-07-19T11:40:45Z" level=debug msg="Matched trusted cert" fingerprint=dd6e84c9594891e2989fccbc96250f1e4ccb11339d218bc8cb7a1b662220d160 subject="CN=sysadmin@server1,O=linuxcontainers.org"
time="2023-07-19T11:40:45Z" level=debug msg="Handling API request" ip="192.168.3.2:40590" method=POST protocol=tls url=/1.0/instances username=dd6e84c9594891e2989fccbc96250f1e4ccb11339d218bc8cb7a1b662220d160
time="2023-07-19T11:40:45Z" level=debug msg="Responding to instance create"
time="2023-07-19T11:40:45Z" level=debug msg="Instance operation lock created" action=create instance=mycontainer-snap0 project=default reusable=false
time="2023-07-19T11:40:45Z" level=info msg="Creating instance" ephemeral=false instance=mycontainer-snap0 instanceType=container project=default
time="2023-07-19T11:40:45Z" level=debug msg="Adding device" device=eth0 instance=mycontainer-snap0 instanceType=container project=default type=nic
time="2023-07-19T11:40:45Z" level=debug msg="Adding device" device=root instance=mycontainer-snap0 instanceType=container project=default type=disk
time="2023-07-19T11:40:45Z" level=info msg="Created instance" ephemeral=false instance=mycontainer-snap0 instanceType=container project=default
time="2023-07-19T11:40:45Z" level=info msg="Action: instance-created, Source: /1.0/instances/mycontainer-snap0" location=none storage-pool=tank1 type=container
time="2023-07-19T11:40:45Z" level=debug msg="New operation" class=task description="Creating instance" operation=ca5f62a3-a216-4365-8b0f-67d51273deb7 project=default
time="2023-07-19T11:40:45Z" level=info msg="ID: ca5f62a3-a216-4365-8b0f-67d51273deb7, Class: task, Description: Creating instance" CreatedAt="2023-07-19 11:40:45.917102117 +0000 UTC" Err= Location=none MayCancel=false Metadata="map[]" Resources="map[containers:[/1.0/instances/mycontainer-snap0] instances:[/1.0/instances/mycontainer-snap0]]" Status=Pending StatusCode=Pending UpdatedAt="2023-07-19 11:40:45.917102117 +0000 UTC"
time="2023-07-19T11:40:45Z" level=debug msg="Started operation" class=task description="Creating instance" operation=ca5f62a3-a216-4365-8b0f-67d51273deb7 project=default
time="2023-07-19T11:40:45Z" level=info msg="ID: ca5f62a3-a216-4365-8b0f-67d51273deb7, Class: task, Description: Creating instance" CreatedAt="2023-07-19 11:40:45.917102117 +0000 UTC" Err= Location=none MayCancel=false Metadata="map[]" Resources="map[containers:[/1.0/instances/mycontainer-snap0] instances:[/1.0/instances/mycontainer-snap0]]" Status=Running StatusCode=Running UpdatedAt="2023-07-19 11:40:45.917102117 +0000 UTC"
time="2023-07-19T11:40:45Z" level=info msg="Waiting for migration control connection on target" clusterMoveSourceName= instance=mycontainer-snap0 live=false project=default push=false
time="2023-07-19T11:40:45Z" level=debug msg="Handling API request" ip="192.168.3.2:40602" method=GET protocol=tls url=/1.0/operations/ca5f62a3-a216-4365-8b0f-67d51273deb7 username=dd6e84c9594891e2989fccbc96250f1e4ccb11339d218bc8cb7a1b662220d160
time="2023-07-19T11:40:45Z" level=debug msg="Matched trusted cert" fingerprint=dd6e84c9594891e2989fccbc96250f1e4ccb11339d218bc8cb7a1b662220d160 subject="CN=sysadmin@server1,O=linuxcontainers.org"
time="2023-07-19T11:40:55Z" level=debug msg="Instance operation lock finished" action=create err="Error transferring instance data: Failed waiting for migration control connection on target: Unable to connect to: 192.168.2.2:8443 ([dial tcp 192.168.2.2:8443: i/o timeout])" instance=mycontainer-snap0 project=default reusable=false
time="2023-07-19T11:40:55Z" level=debug msg="Removing device" device=root instance=mycontainer-snap0 instanceType=container project=default type=disk
time="2023-07-19T11:40:55Z" level=debug msg="Removing device" device=eth0 instance=mycontainer-snap0 instanceType=container project=default type=nic
time="2023-07-19T11:40:55Z" level=debug msg="Failure for operation" class=task description="Creating instance" err="Error transferring instance data: Failed waiting for migration control connection on target: Unable to connect to: 192.168.2.2:8443 ([dial tcp 192.168.2.2:8443: i/o timeout])" operation=ca5f62a3-a216-4365-8b0f-67d51273deb7 project=default
time="2023-07-19T11:40:55Z" level=info msg="ID: ca5f62a3-a216-4365-8b0f-67d51273deb7, Class: task, Description: Creating instance" CreatedAt="2023-07-19 11:40:45.917102117 +0000 UTC" Err="Error transferring instance data: Failed waiting for migration control connection on target: Unable to connect to: 192.168.2.2:8443 ([dial tcp 192.168.2.2:8443: i/o timeout])" Location=none MayCancel=false Metadata="map[]" Resources="map[containers:[/1.0/instances/mycontainer-snap0] instances:[/1.0/instances/mycontainer-snap0]]" Status=Failure StatusCode=Failure UpdatedAt="2023-07-19 11:40:45.917102117 +0000 UTC"
time="2023-07-19T11:40:55Z" level=debug msg="Matched trusted cert" fingerprint=dd6e84c9594891e2989fccbc96250f1e4ccb11339d218bc8cb7a1b662220d160 subject="CN=sysadmin@server1,O=linuxcontainers.org"
time="2023-07-19T11:40:55Z" level=debug msg="Handling API request" ip="192.168.3.2:52072" method=POST protocol=tls url=/1.0/instances username=dd6e84c9594891e2989fccbc96250f1e4ccb11339d218bc8cb7a1b662220d160
time="2023-07-19T11:40:55Z" level=debug msg="Responding to instance create"
time="2023-07-19T11:40:55Z" level=debug msg="Instance operation lock created" action=create instance=mycontainer-snap0 project=default reusable=false
time="2023-07-19T11:40:55Z" level=info msg="Creating instance" ephemeral=false instance=mycontainer-snap0 instanceType=container project=default
time="2023-07-19T11:40:55Z" level=debug msg="Adding device" device=eth0 instance=mycontainer-snap0 instanceType=container project=default type=nic
time="2023-07-19T11:40:55Z" level=debug msg="Adding device" device=root instance=mycontainer-snap0 instanceType=container project=default type=disk
time="2023-07-19T11:40:55Z" level=info msg="Created instance" ephemeral=false instance=mycontainer-snap0 instanceType=container project=default
time="2023-07-19T11:40:55Z" level=info msg="Action: instance-created, Source: /1.0/instances/mycontainer-snap0" location=none storage-pool=tank1 type=container
time="2023-07-19T11:40:55Z" level=debug msg="New operation" class=task description="Creating instance" operation=a2ba0adf-d5fe-4278-b89d-d815f1832f60 project=default
time="2023-07-19T11:40:55Z" level=info msg="ID: a2ba0adf-d5fe-4278-b89d-d815f1832f60, Class: task, Description: Creating instance" CreatedAt="2023-07-19 11:40:55.97518561 +0000 UTC" Err= Location=none MayCancel=false Metadata="map[]" Resources="map[containers:[/1.0/instances/mycontainer-snap0] instances:[/1.0/instances/mycontainer-snap0]]" Status=Pending StatusCode=Pending UpdatedAt="2023-07-19 11:40:55.97518561 +0000 UTC"
time="2023-07-19T11:40:55Z" level=debug msg="Started operation" class=task description="Creating instance" operation=a2ba0adf-d5fe-4278-b89d-d815f1832f60 project=default
time="2023-07-19T11:40:55Z" level=info msg="ID: a2ba0adf-d5fe-4278-b89d-d815f1832f60, Class: task, Description: Creating instance" CreatedAt="2023-07-19 11:40:55.97518561 +0000 UTC" Err= Location=none MayCancel=false Metadata="map[]" Resources="map[containers:[/1.0/instances/mycontainer-snap0] instances:[/1.0/instances/mycontainer-snap0]]" Status=Running StatusCode=Running UpdatedAt="2023-07-19 11:40:55.97518561 +0000 UTC"
time="2023-07-19T11:40:55Z" level=info msg="Waiting for migration control connection on target" clusterMoveSourceName= instance=mycontainer-snap0 live=false project=default push=false
time="2023-07-19T11:40:55Z" level=debug msg="Instance operation lock finished" action=create err="Error transferring instance data: Failed waiting for migration control connection on target: websocket: bad handshake" instance=mycontainer-snap0 project=default reusable=false
time="2023-07-19T11:40:55Z" level=debug msg="Removing device" device=root instance=mycontainer-snap0 instanceType=container project=default type=disk
time="2023-07-19T11:40:55Z" level=debug msg="Removing device" device=eth0 instance=mycontainer-snap0 instanceType=container project=default type=nic
time="2023-07-19T11:40:55Z" level=debug msg="Matched trusted cert" fingerprint=dd6e84c9594891e2989fccbc96250f1e4ccb11339d218bc8cb7a1b662220d160 subject="CN=sysadmin@server1,O=linuxcontainers.org"
time="2023-07-19T11:40:55Z" level=debug msg="Handling API request" ip="192.168.3.2:52082" method=GET protocol=tls url=/1.0/operations/a2ba0adf-d5fe-4278-b89d-d815f1832f60 username=dd6e84c9594891e2989fccbc96250f1e4ccb11339d218bc8cb7a1b662220d160
time="2023-07-19T11:40:55Z" level=debug msg="Failure for operation" class=task description="Creating instance" err="Error transferring instance data: Failed waiting for migration control connection on target: websocket: bad handshake" operation=a2ba0adf-d5fe-4278-b89d-d815f1832f60 project=default
time="2023-07-19T11:40:55Z" level=info msg="ID: a2ba0adf-d5fe-4278-b89d-d815f1832f60, Class: task, Description: Creating instance" CreatedAt="2023-07-19 11:40:55.97518561 +0000 UTC" Err="Error transferring instance data: Failed waiting for migration control connection on target: websocket: bad handshake" Location=none MayCancel=false Metadata="map[]" Resources="map[containers:[/1.0/instances/mycontainer-snap0] instances:[/1.0/instances/mycontainer-snap0]]" Status=Failure StatusCode=Failure UpdatedAt="2023-07-19 11:40:55.97518561 +0000 UTC"
time="2023-07-19T11:40:55Z" level=debug msg="Event listener server handler stopped" listener=540123f7-76d3-436e-ae39-c649066c563f local="192.168.3.3:8443" remote="192.168.3.2:40576"
@gabrielmougard want to have a go with this one?
@gerba3 @tomponline I'm trying to reproduce this scenario:
Here are my two servers (`server1` and `server2`, each configured with LXD):
+---------+---------+-------------------------+-------------------------------------------------+-----------------+-----------+
| NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS |
+---------+---------+-------------------------+-------------------------------------------------+-----------------+-----------+
| server1 | RUNNING | 10.156.3.2 (enp7s0) | fd42:dd83:4885:8d24:216:3eff:fe0b:e98d (enp5s0) | VIRTUAL-MACHINE | 0 |
| | | 10.156.2.2 (enp6s0) | | | |
| | | 10.156.162.200 (lxdbr0) | | | |
| | | 10.156.162.100 (enp5s0) | | | |
+---------+---------+-------------------------+-------------------------------------------------+-----------------+-----------+
| server2 | RUNNING | 10.156.3.3 (enp7s0) | fd42:dd83:4885:8d24:216:3eff:fe2c:2a1d (enp5s0) | VIRTUAL-MACHINE | 0 |
| | | 10.156.1.2 (enp6s0) | | | |
| | | 10.156.162.200 (lxdbr0) | | | |
| | | 10.156.162.101 (enp5s0) | | | |
+---------+---------+-------------------------+-------------------------------------------------+-----------------+-----------+
and here are my remotes on `server1` and `server2` respectively:
output of `lxc remote list` on `server1`:
+----------------------+---------------------------------------------------+---------------+-------------+--------+--------+--------+
| NAME | URL | PROTOCOL | AUTH TYPE | PUBLIC | STATIC | GLOBAL |
+----------------------+---------------------------------------------------+---------------+-------------+--------+--------+--------+
| 10.156.3.3 | https://10.156.3.3:8443 | lxd | tls | NO | NO | NO |
+----------------------+---------------------------------------------------+---------------+-------------+--------+--------+--------+
| images | https://images.linuxcontainers.org | simplestreams | none | YES | NO | NO |
+----------------------+---------------------------------------------------+---------------+-------------+--------+--------+--------+
| local (current) | unix:// | lxd | file access | NO | YES | NO |
+----------------------+---------------------------------------------------+---------------+-------------+--------+--------+--------+
| server2 | https://server2:8443 | lxd | tls | NO | NO | NO |
+----------------------+---------------------------------------------------+---------------+-------------+--------+--------+--------+
| ubuntu | https://cloud-images.ubuntu.com/releases | simplestreams | none | YES | YES | NO |
+----------------------+---------------------------------------------------+---------------+-------------+--------+--------+--------+
| ubuntu-daily | https://cloud-images.ubuntu.com/daily | simplestreams | none | YES | YES | NO |
+----------------------+---------------------------------------------------+---------------+-------------+--------+--------+--------+
| ubuntu-minimal | https://cloud-images.ubuntu.com/minimal/releases/ | simplestreams | none | YES | YES | NO |
+----------------------+---------------------------------------------------+---------------+-------------+--------+--------+--------+
| ubuntu-minimal-daily | https://cloud-images.ubuntu.com/minimal/daily/ | simplestreams | none | YES | YES | NO |
+----------------------+---------------------------------------------------+---------------+-------------+--------+--------+--------+
output of `ip r` on `server1`:
default via 10.156.162.1 dev enp5s0 proto static
10.156.162.0/24 dev lxdbr0 proto kernel scope link src 10.156.162.200
10.156.162.0/24 dev enp5s0 proto kernel scope link src 10.156.162.100
10.156.2.0/24 dev enp6s0 proto kernel scope link src 10.156.2.2
10.156.3.0/24 dev enp7s0 proto kernel scope link src 10.156.3.2
and
output of `lxc remote list` on `server2`:
+----------------------+---------------------------------------------------+---------------+-------------+--------+--------+--------+
| NAME | URL | PROTOCOL | AUTH TYPE | PUBLIC | STATIC | GLOBAL |
+----------------------+---------------------------------------------------+---------------+-------------+--------+--------+--------+
| 10.156.3.2 | https://10.156.3.2:8443 | lxd | tls | NO | NO | NO |
+----------------------+---------------------------------------------------+---------------+-------------+--------+--------+--------+
| images | https://images.linuxcontainers.org | simplestreams | none | YES | NO | NO |
+----------------------+---------------------------------------------------+---------------+-------------+--------+--------+--------+
| local (current) | unix:// | lxd | file access | NO | YES | NO |
+----------------------+---------------------------------------------------+---------------+-------------+--------+--------+--------+
| server1 | https://server1:8443 | lxd | tls | NO | NO | NO |
+----------------------+---------------------------------------------------+---------------+-------------+--------+--------+--------+
| ubuntu | https://cloud-images.ubuntu.com/releases | simplestreams | none | YES | YES | NO |
+----------------------+---------------------------------------------------+---------------+-------------+--------+--------+--------+
| ubuntu-daily | https://cloud-images.ubuntu.com/daily | simplestreams | none | YES | YES | NO |
+----------------------+---------------------------------------------------+---------------+-------------+--------+--------+--------+
| ubuntu-minimal | https://cloud-images.ubuntu.com/minimal/releases/ | simplestreams | none | YES | YES | NO |
+----------------------+---------------------------------------------------+---------------+-------------+--------+--------+--------+
| ubuntu-minimal-daily | https://cloud-images.ubuntu.com/minimal/daily/ | simplestreams | none | YES | YES | NO |
+----------------------+---------------------------------------------------+---------------+-------------+--------+--------+--------+
output of `ip r` on `server2`:
default via 10.156.162.1 dev enp5s0 proto static
10.156.162.0/24 dev lxdbr0 proto kernel scope link src 10.156.162.200
10.156.162.0/24 dev enp5s0 proto kernel scope link src 10.156.162.101
10.156.1.0/24 dev enp6s0 proto kernel scope link src 10.156.1.2
10.156.3.0/24 dev enp7s0 proto kernel scope link src 10.156.3.3
Now, from the look of it, I think this should be similar to what you have (please tell me if I made a mistake in reproducing this environment).
Then, when attempting to create a container on server1 and send its snapshot to server2 like:
lxc launch images:ubuntu/jammy mycontainer
lxc copy mycontainer/snap0 server2:mycontainer-snap0
it works fine on my side (so far I didn't try the `lxc storage volume copy server1:tank1/myvol tank1/myvol` command), but for now my container uses the `default` pool. So I'll try with a custom one and see what I get.
Here is my LXD info:
config:
core.https_address: :8443
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- macaroon_authentication
- network_sriov
- console
- restrict_devlxd
- migration_pre_copy
- infiniband
- maas_network
- devlxd_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- container_mount_propagation
- container_backup
- devlxd_images
- container_local_cross_pool_handling
- proxy_unix
- proxy_udp
- clustering_join
- proxy_tcp_udp_multi_port_handling
- network_state
- proxy_unix_dac_properties
- container_protection_delete
- unix_priv_drop
- pprof_http
- proxy_haproxy_protocol
- network_hwaddr
- proxy_nat
- network_nat_order
- container_full
- candid_authentication
- backup_compression
- candid_config
- nvidia_runtime_config
- storage_api_volume_snapshots
- storage_unmapped
- projects
- candid_config_key
- network_vxlan_ttl
- container_incremental_copy
- usb_optional_vendorid
- snapshot_scheduling
- snapshot_schedule_aliases
- container_copy_project
- clustering_server_address
- clustering_image_replication
- container_protection_shift
- snapshot_expiry
- container_backup_override_pool
- snapshot_expiry_creation
- network_leases_location
- resources_cpu_socket
- resources_gpu
- resources_numa
- kernel_features
- id_map_current
- event_location
- storage_api_remote_volume_snapshots
- network_nat_address
- container_nic_routes
- rbac
- cluster_internal_copy
- seccomp_notify
- lxc_features
- container_nic_ipvlan
- network_vlan_sriov
- storage_cephfs
- container_nic_ipfilter
- resources_v2
- container_exec_user_group_cwd
- container_syscall_intercept
- container_disk_shift
- storage_shifted
- resources_infiniband
- daemon_storage
- instances
- image_types
- resources_disk_sata
- clustering_roles
- images_expiry
- resources_network_firmware
- backup_compression_algorithm
- ceph_data_pool_name
- container_syscall_intercept_mount
- compression_squashfs
- container_raw_mount
- container_nic_routed
- container_syscall_intercept_mount_fuse
- container_disk_ceph
- virtual-machines
- image_profiles
- clustering_architecture
- resources_disk_id
- storage_lvm_stripes
- vm_boot_priority
- unix_hotplug_devices
- api_filtering
- instance_nic_network
- clustering_sizing
- firewall_driver
- projects_limits
- container_syscall_intercept_hugetlbfs
- limits_hugepages
- container_nic_routed_gateway
- projects_restrictions
- custom_volume_snapshot_expiry
- volume_snapshot_scheduling
- trust_ca_certificates
- snapshot_disk_usage
- clustering_edit_roles
- container_nic_routed_host_address
- container_nic_ipvlan_gateway
- resources_usb_pci
- resources_cpu_threads_numa
- resources_cpu_core_die
- api_os
- container_nic_routed_host_table
- container_nic_ipvlan_host_table
- container_nic_ipvlan_mode
- resources_system
- images_push_relay
- network_dns_search
- container_nic_routed_limits
- instance_nic_bridged_vlan
- network_state_bond_bridge
- usedby_consistency
- custom_block_volumes
- clustering_failure_domains
- resources_gpu_mdev
- console_vga_type
- projects_limits_disk
- network_type_macvlan
- network_type_sriov
- container_syscall_intercept_bpf_devices
- network_type_ovn
- projects_networks
- projects_networks_restricted_uplinks
- custom_volume_backup
- backup_override_name
- storage_rsync_compression
- network_type_physical
- network_ovn_external_subnets
- network_ovn_nat
- network_ovn_external_routes_remove
- tpm_device_type
- storage_zfs_clone_copy_rebase
- gpu_mdev
- resources_pci_iommu
- resources_network_usb
- resources_disk_address
- network_physical_ovn_ingress_mode
- network_ovn_dhcp
- network_physical_routes_anycast
- projects_limits_instances
- network_state_vlan
- instance_nic_bridged_port_isolation
- instance_bulk_state_change
- network_gvrp
- instance_pool_move
- gpu_sriov
- pci_device_type
- storage_volume_state
- network_acl
- migration_stateful
- disk_state_quota
- storage_ceph_features
- projects_compression
- projects_images_remote_cache_expiry
- certificate_project
- network_ovn_acl
- projects_images_auto_update
- projects_restricted_cluster_target
- images_default_architecture
- network_ovn_acl_defaults
- gpu_mig
- project_usage
- network_bridge_acl
- warnings
- projects_restricted_backups_and_snapshots
- clustering_join_token
- clustering_description
- server_trusted_proxy
- clustering_update_cert
- storage_api_project
- server_instance_driver_operational
- server_supported_storage_drivers
- event_lifecycle_requestor_address
- resources_gpu_usb
- clustering_evacuation
- network_ovn_nat_address
- network_bgp
- network_forward
- custom_volume_refresh
- network_counters_errors_dropped
- metrics
- image_source_project
- clustering_config
- network_peer
- linux_sysctl
- network_dns
- ovn_nic_acceleration
- certificate_self_renewal
- instance_project_move
- storage_volume_project_move
- cloud_init
- network_dns_nat
- database_leader
- instance_all_projects
- clustering_groups
- ceph_rbd_du
- instance_get_full
- qemu_metrics
- gpu_mig_uuid
- event_project
- clustering_evacuation_live
- instance_allow_inconsistent_copy
- network_state_ovn
- storage_volume_api_filtering
- image_restrictions
- storage_zfs_export
- network_dns_records
- storage_zfs_reserve_space
- network_acl_log
- storage_zfs_blocksize
- metrics_cpu_seconds
- instance_snapshot_never
- certificate_token
- instance_nic_routed_neighbor_probe
- event_hub
- agent_nic_config
- projects_restricted_intercept
- metrics_authentication
- images_target_project
- cluster_migration_inconsistent_copy
- cluster_ovn_chassis
- container_syscall_intercept_sched_setscheduler
- storage_lvm_thinpool_metadata_size
- storage_volume_state_total
- instance_file_head
- instances_nic_host_name
- image_copy_profile
- container_syscall_intercept_sysinfo
- clustering_evacuation_mode
- resources_pci_vpd
- qemu_raw_conf
- storage_cephfs_fscache
- network_load_balancer
- vsock_api
- instance_ready_state
- network_bgp_holdtime
- storage_volumes_all_projects
- metrics_memory_oom_total
- storage_buckets
- storage_buckets_create_credentials
- metrics_cpu_effective_total
- projects_networks_restricted_access
- storage_buckets_local
- loki
- acme
- internal_metrics
- cluster_join_token_expiry
- remote_token_expiry
- init_preseed
- storage_volumes_created_at
- cpu_hotplug
- projects_networks_zones
- network_txqueuelen
- cluster_member_state
- instances_placement_scriptlet
- storage_pool_source_wipe
- zfs_block_mode
- instance_generation_id
- disk_io_cache
- amd_sev
- storage_pool_loop_resize
- migration_vm_live
- ovn_nic_nesting
- oidc
- network_ovn_l3only
- ovn_nic_acceleration_vdpa
- cluster_healing
- instances_state_total
- auth_user
- security_csm
- instances_rebuild
- numa_cpu_placement
- custom_volume_iso
- network_allocations
- storage_api_remote_volume_snapshot_copy
- zfs_delegate
- operations_get_query_all_projects
- metadata_configuration
- syslog_socket
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
auth_user_name: root
auth_user_method: unix
environment:
addresses:
- 10.156.162.100:8443
- '[fd42:dd83:4885:8d24:216:3eff:fe0b:e98d]:8443'
- 10.156.2.2:8443
- 10.156.3.2:8443
- 10.156.162.200:8443
architectures:
- x86_64
- i686
certificate: |
-----BEGIN CERTIFICATE-----
MIIB5zCCAWygAwIBAgIQRFut3WfbsqG3azSY0EFTFTAKBggqhkjOPQQDAzAlMQww
CgYDVQQKEwNMWEQxFTATBgNVBAMMDHJvb3RAc2VydmVyMTAeFw0yMzEwMTgxNDMw
MzZaFw0zMzEwMTUxNDMwMzZaMCUxDDAKBgNVBAoTA0xYRDEVMBMGA1UEAwwMcm9v
dEBzZXJ2ZXIxMHYwEAYHKoZIzj0CAQYFK4EEACIDYgAEdTqbfSQ9v0QCKCQjKHtM
fMC4V8Th+kblPbLexKVt3g0meRzUJHU2E8mDeFUF7VeUs1DNZUWtKFfsoSU867vX
fLSr2P6hWetPEe7r12t/71E6XFnDBr/roW7xHv3h0t/Po2EwXzAOBgNVHQ8BAf8E
BAMCBaAwEwYDVR0lBAwwCgYIKwYBBQUHAwEwDAYDVR0TAQH/BAIwADAqBgNVHREE
IzAhggdzZXJ2ZXIxhwR/AAABhxAAAAAAAAAAAAAAAAAAAAABMAoGCCqGSM49BAMD
A2kAMGYCMQDiXIWMVYOQKkcQG/Eh4n9byQKl8Yp7CRB8OgHxpJOgxxWnCY9H8P8S
c7bTv+yipzYCMQCC0RumBgoACJHlNcVtD/K2kDrumIGy499dEdyCFLKo4RDYQqd+
Wz3rh4/+sxHfeGc=
-----END CERTIFICATE-----
certificate_fingerprint: f9f13a357fd976d72f99849a3b6bc52f2a6d09697c70830ca6fb8df805ef69d0
driver: lxc | qemu
driver_version: 5.0.3 | 8.0.4
firewall: nftables
kernel: Linux
kernel_architecture: x86_64
kernel_features:
idmapped_mounts: "true"
netnsid_getifaddrs: "true"
seccomp_listener: "true"
seccomp_listener_continue: "true"
shiftfs: "false"
uevent_injection: "true"
unpriv_fscaps: "true"
kernel_version: 5.15.0-86-generic
lxc_features:
cgroup2: "true"
core_scheduling: "true"
devpts_fd: "true"
idmapped_mounts_v2: "true"
mount_injection_file: "true"
network_gateway_device_route: "true"
network_ipvlan: "true"
network_l2proxy: "true"
network_phys_macvlan_mtu: "true"
network_veth_router: "true"
pidfd: "true"
seccomp_allow_deny_syntax: "true"
seccomp_notify: "true"
seccomp_proxy_send_notify_fd: "true"
os_name: Ubuntu
os_version: "22.04"
project: default
server: lxd
server_clustered: false
server_event_mode: full-mesh
server_name: server1
server_pid: 2656
server_version: "5.18"
storage: dir
storage_version: "1"
storage_supported_drivers:
- name: btrfs
version: 5.16.2
remote: false
- name: ceph
version: 17.2.6
remote: true
- name: cephfs
version: 17.2.6
remote: true
- name: cephobject
version: 17.2.6
remote: true
- name: dir
version: "1"
remote: false
- name: lvm
version: 2.03.11(2) (2021-01-08) / 1.02.175 (2021-01-08) / 4.45.0
remote: false
- name: zfs
version: 2.1.5-1ubuntu6~22.04.1
remote: false
Hi, thanks for the response and the attempt to reproduce.
I tested it again myself using the setup from back then, then upgraded to the latest release using `snap refresh`. The issue still occurs.
One difference is that I am using containers and not VMs; I don't believe this to be significant here. I think your launch-and-copy procedure is missing the `lxc snapshot` step, but once again, not significant. Further differences I found are the storage driver used (zfs vs. dir) and that you set things up as root, whereas I did it as the primary user (UID 1000). This should not matter.
I do not use any IPv6 configuration on my system. Does server2 resolve to the IPv6 address in your setup? I think it is possible that this avoids the issue entirely.
I thought that maybe pointing the remote in the copy command at the IPv4 address might help, but I figured that the underlying issue is in how the copy command handles the process. When I run the copy using the following command I get valuable debug logs:
lxc copy -v --debug mycontainer/snap0 server2:mycontainer-snap0
Output: debug_log_copy_command.txt
My hypothesis is that (as seen in the debug log above) the command fetches the information of both instances (in particular the addresses part) and then attempts to use the first network address listed on server1 for the transfer, i.e. it sends that address with the request to server2 as the address it should use for the transfer. This would explain why the transfer works in your setup: it bypasses the two local links entirely.
Snippet from the Output above:
DEBUG [2023-10-19T07:57:23Z] Sending request to LXD etag= method=POST url="https://server2:8443/1.0/instances"
DEBUG [2023-10-19T07:57:23Z]
{
"architecture": "x86_64",
"config": {
"image.architecture": "amd64",
"image.description": "ubuntu 22.04 LTS amd64 (release) (20230719)",
"image.label": "release",
"image.os": "ubuntu",
"image.release": "jammy",
"image.serial": "20230719",
"image.type": "squashfs",
"image.version": "22.04",
"volatile.base_image": "a0a9b9976255e7235afe495e920e6c0f40f55ae22852a5d5c31139aa9408f2e5"
},
"devices": {},
"ephemeral": false,
"profiles": [
"default"
],
"stateful": false,
"description": "",
"name": "mycontainer-snap0",
"source": {
"type": "migration",
"certificate": "-----BEGIN CERTIFICATE-----\nMIICBjCCAY2gAwIBAgIRAKeMoAWfAt+1hBN4RjqxIN0wCgYIKoZIzj0EAwMwNTEc\nMBoGA1UEChMTbGludXhjb250YWluZXJzLm9yZzEVMBMGA1UEAwwMcm9vdEBzZXJ2\nZXIxMB4XDTIzMDcxOTA5MTcwMVoXDTMzMDcxNjA5MTcwMVowNTEcMBoGA1UEChMT\nbGludXhjb250YWluZXJzLm9yZzEVMBMGA1UEAwwMcm9vdEBzZXJ2ZXIxMHYwEAYH\nKoZIzj0CAQYFK4EEACIDYgAEUSi2P7EzLy6dRRm1DfTOy948C/fR83FfzYJ5PeCd\ne4bgkSrc9/agW7au8x6IBI/vCGKvYtYULjpDMBntgtu8v8m5/ShiPiiHPBy0NOLi\nN+7mjKV3XV9f/4k/r1cnDxlho2EwXzAOBgNVHQ8BAf8EBAMCBaAwEwYDVR0lBAww\nCgYIKwYBBQUHAwEwDAYDVR0TAQH/BAIwADAqBgNVHREEIzAhggdzZXJ2ZXIxhwR/\nAAABhxAAAAAAAAAAAAAAAAAAAAABMAoGCCqGSM49BAMDA2cAMGQCMEwbQozhiX95\n5WRsyHsjHczwj88zrpGfgQSYn8EnPE7xuJUSFed7jHXfcHU4qaOgRQIwOCxXX/ir\nbhKCigbhhuHUJRE9dXH4DjcX9xiFz410CFWRL0suJu6Mwh4Kd95E6cdy\n-----END CERTIFICATE-----\n",
"base-image": "a0a9b9976255e7235afe495e920e6c0f40f55ae22852a5d5c31139aa9408f2e5",
"mode": "pull",
"operation": "https://192.168.2.2:8443/1.0/operations/4de3d79f-b8f2-4f26-95e1-c6d7fad014c0",
"secrets": {
"control": "b488251fab9d62e6d8af2a1b7544d7d685edc393af879a322a2bef20cf1cd4d9",
"fs": "917b6e940e6ea824ec9150c231efa4c8a3ef17a0b8777131fc7c604fb792e0fe"
},
"allow_inconsistent": false
},
"instance_type": "",
"type": "container"
}
I am not sure from the logs if this is the payload of the POST request or the response.
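As a rough way to sanity-check this hypothesis, something like the following standalone snippet (a sketch only, not LXD code; the two addresses are the ones appearing in this report) could be run on server2 to see which of the addresses advertised by server1 actually accept a TLS connection on port 8443:

package main

import (
	"crypto/tls"
	"fmt"
	"net"
	"time"
)

func main() {
	// Addresses advertised by server1 in this report; adjust to your own
	// `lxc info` (environment.addresses) output.
	addresses := []string{"192.168.2.2:8443", "192.168.3.2:8443"}

	dialer := &net.Dialer{Timeout: 5 * time.Second}

	for _, addr := range addresses {
		// LXD uses a self-signed certificate, so skip verification for this probe.
		conn, err := tls.DialWithDialer(dialer, "tcp", addr, &tls.Config{InsecureSkipVerify: true})
		if err != nil {
			fmt.Printf("%s: unreachable (%v)\n", addr, err)
			continue
		}

		conn.Close()
		fmt.Printf("%s: reachable\n", addr)
	}
}

In this setup the first address would time out while the second one would connect, which matches the "i/o timeout" seen in the server2 monitor log.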
I switched my network configuration to the following and the transfer works now. The address list in the `lxc info` output uses the same order in which the interfaces are listed by `ip addr`, i.e. based on the device number.
# server1
$ cat /etc/netplan/00-installer-config.yaml
network:
  ethernets:
    enp1s0:
      addresses:
      - 192.168.3.2/24
      nameservers:
        addresses: []
        search: []
    enp7s0:
      addresses:
      - 192.168.2.2/24
      nameservers:
        addresses: []
        search: []
    enp8s0:
      dhcp4: false
      addresses:
      - 192.168.122.100/24
      gateway4: 192.168.122.1
      nameservers:
        addresses: [1.1.1.1]
        search: []
  version: 2
# server2
$ cat /etc/netplan/00-installer-config.yaml
# This is the network config written by 'subiquity'
network:
  ethernets:
    enp1s0:
      addresses:
      - 192.168.3.3/24
      nameservers:
        addresses: []
        search: []
    enp7s0:
      addresses:
      - 192.168.1.2/24
      nameservers:
        addresses: []
        search: []
    enp8s0:
      dhcp4: false
      addresses:
      - 192.168.122.101/24
      gateway4: 192.168.122.1
      nameservers:
        addresses: [1.1.1.1]
        search: []
  version: 2
The output of the `lxc info` command on `server1` for the addresses:
environment:
  addresses:
  - 192.168.3.2:8443
  - 192.168.2.2:8443
  - 192.168.122.100:8443
  - 192.168.122.200:8443
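Assuming the copy traffic only ever needs to go over the shared subnet, another option besides reordering the interfaces (untested in this setup) might be to pin the LXD listener to that address rather than listening on all interfaces, e.g. on server1:
lxc config set core.https_address 192.168.3.2:8443
(and the equivalent on server2), so that only a reachable address is advertised to the other side.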
@gerba3 thanks for the quick response. Regarding IPv6, I removed my allocated IPv6 address and it does not change the result. This ordering issue is interesting. I'll look into that.
@gerba3 @tomponline I think I might have an idea of what's going on. When attempting to copy (in pull mode), the function `func (*ProtocolLXD) tryCreateInstance(req api.InstancesPost, urls []string, op Operation) (RemoteOperation, error)` is called. It then iterates over the addresses registered for the source LXD server: in your case that will be `[192.168.2.2:8443, 192.168.3.2:8443]`. The order matters here, and I'll try to explain why. We first try to create the instance using the source LXD server at `192.168.2.2:8443`. That address is not reachable from the target, but `func (*ProtocolLXD).CreateInstance(instance api.InstancesPost) (Operation, error)` will not fail yet; its operation will. Why is that?
Well, in the `func createFromMigration(s *state.State, r *http.Request, projectName string, profiles []api.Profile, req *api.InstancesPost) response.Response` function, when the `migrationSink` is created, the underlying websocket dialer is not called. If it were, it would return an error before the operation starts, and the `tryCreateInstance` function would call `CreateInstance` on the next address and succeed. Instead, the operation fails because it cannot reach the unreachable source address, as expected. And because a failing operation breaks the `tryCreateInstance` loop and does not continue (there must be a good reason for this; @tomponline, do you know why?), I suggest we add the following in the `createFromMigration` function:
sink, err := newMigrationSink(&migrationArgs)
if err != nil {
	return response.InternalError(err)
}

// NEW: Check that the source server is reachable before starting the operation, so that the client is able to try the next address.
_, _, err = dialer.DialContext(r.Context(), req.Source.Operation, http.Header{})
if err != nil {
	return response.InternalError(err)
}
in order to return early and avoid this issue. What do you think?
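To make the behaviour easier to follow, here is a simplified, self-contained sketch of the iteration logic described above (an illustration only, not the actual tryCreateInstance implementation): a failure of the create request moves on to the next address, but a failure of the resulting operation ends the loop.

package main

import (
	"errors"
	"fmt"
)

// tryEachAddress is a simplified sketch of the behaviour described above.
// The loop only advances to the next address when the create request itself
// fails. If the request is accepted but the resulting operation later fails
// (e.g. the target cannot dial back to the source on that address), the
// remaining addresses are never tried.
func tryEachAddress(addresses []string, create func(addr string) error, waitOp func(addr string) error) error {
	var lastErr error

	for _, addr := range addresses {
		if err := create(addr); err != nil {
			// Request-level failure: fall through to the next advertised address.
			lastErr = err
			continue
		}

		// Request accepted: wait for the operation and return its result,
		// successful or not. A failing operation therefore ends the loop.
		return waitOp(addr)
	}

	return lastErr
}

func main() {
	addresses := []string{"192.168.2.2:8443", "192.168.3.2:8443"}

	err := tryEachAddress(addresses,
		func(addr string) error { return nil }, // the create request succeeds for every address
		func(addr string) error {
			if addr == "192.168.2.2:8443" {
				// The target cannot reach this address, so the operation fails...
				return errors.New("failed waiting for migration control connection")
			}
			return nil
		})

	// ...and 192.168.3.2:8443 is never attempted.
	fmt.Println(err)
}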
@gabrielmougard as discussed, please let me know if you find that the issue is the source timing out while waiting for the target to connect back to it, as the target iterates over the various IPs the source offered to it via the client. Thanks
Required information
Distribution: Ubuntu
Distribution version: 22.04
The output of `lxc info` on server1:
The output of `lxc info` on server2:
Output of `lxc remote ls` on server1:
Output of `lxc remote ls` on server2:

Issue description
The setup I am using consists of two LXC hosts that are running independently (not in a cluster!). `server1` is an active node that is running a single container `mycontainer`, and `server2` has a copy of `mycontainer` that is generally stopped. At regular intervals `mycontainer` and its volumes are copied over to `server2` as a kind of "cold standby", using a snapshot and renaming it after the lxc copy has finished. Backups are handled differently and are not relevant here.

Notes:
- `lxc list server1:` works for the remote server
- `192.168.2.0/24` on `server1` and `192.168.1.0/24` on `server2` are directly attached links to a backup server (in the setup described below just a dummy interface)

Given a setup with multiple interfaces that are all set to listen on port 8443 for remote operations, the following issue arises:
When trying to copy a container snapshot (or a storage volume) of a container to a remote LXD instance using `lxc copy` / `lxc storage volume copy`, the transfer fails.

Example: Copying a container (or a storage volume) to the remote using one of these commands:
lxc copy mycontainer/snap0 server2:mycontainer-snap0
lxc storage volume copy tank1/myvol server2:tank1/myvol
fails with:
Error: Failed instance creation:
I expect this copy operation to work flawlessly, but it seems that the operation tries to open a control connection from `server2` to `server1` using the target address `192.168.2.2:8443`, which is an address not reachable from `server2`. In my mind the control connection should be opened using the subnet that both `server1` and `server2` share (`192.168.3.0/24`), especially because the API requests are already using that interface (`lxc ls server2:` works).

Executing the copy command from server2 (as a pull operation) however works flawlessly:
lxc copy server1:mycontainer/snap0 mycontainer-snap0
lxc storage volume copy server1:tank1/myvol tank1/myvol
A dirty fix I found for this problem is simply taking down the interface to the backup server (which uses the subnet 192.168.2.0/24) on `server1` and bringing it up again after the copy operation has finished. I am not happy with this "fix".

Steps to reproduce
1. Set up `server1` and `server2` with network configurations like this (the enp1s0 interfaces are the DAC links to the backup server):
   server2: `$ cat /etc/netplan/00-installer-config.yaml` network: ethernets: enp1s0: addresses:
2. Add `192.168.3.3 server2` to `/etc/hosts`
3. Add `192.168.3.2 server1` to `/etc/hosts`
4. `lxc config trust add` on `server1`
5. `lxc config trust add` on `server2`
6. `lxc remote add 192.168.3.3` on `server1`, use token from step 4
7. `lxc remote add 192.168.3.2` on `server2`, use token from step 5
8. Create `mycontainer` on server1 (optionally also with a storage volume, but the same error occurs there)
9. `lxc snapshot mycontainer`
10. `lxc copy mycontainer/snap0 server2:mycontainer-snap0` fails with:
Error: Failed instance creation:
Information to attach
- `lxc info mycontainer --show-log`
- `lxc config show mycontainer --expanded`
- Output of `lxc monitor` while reproducing the issue (on `server1` and `server2`)

Attachments: lxc_config_mycontainer.txt lxc_info.txt monitor_log_server1.txt monitor_log_server2.txt