canonical / lxd

Powerful system container and virtual machine manager
https://canonical.com/lxd
GNU Affero General Public License v3.0

`lxc copy` fails when LXD is listening on multiple interfaces #12042

Open gerba3 opened 1 year ago

gerba3 commented 1 year ago

Required information

Output of lxc remote ls on server1:

+-----------------+------------------------------------------+---------------+-------------+--------+--------+--------+
|      NAME       |                   URL                    |   PROTOCOL    |  AUTH TYPE  | PUBLIC | STATIC | GLOBAL |
+-----------------+------------------------------------------+---------------+-------------+--------+--------+--------+
| 192.168.3.3     | https://192.168.3.3:8443                 | lxd           | tls         | NO     | NO     | NO     |
+-----------------+------------------------------------------+---------------+-------------+--------+--------+--------+
| images          | https://images.linuxcontainers.org       | simplestreams | none        | YES    | NO     | NO     |
+-----------------+------------------------------------------+---------------+-------------+--------+--------+--------+
| local (current) | unix://                                  | lxd           | file access | NO     | YES    | NO     |
+-----------------+------------------------------------------+---------------+-------------+--------+--------+--------+
| server2         | https://server2:8443                     | lxd           | tls         | NO     | NO     | NO     |
+-----------------+------------------------------------------+---------------+-------------+--------+--------+--------+
| ubuntu          | https://cloud-images.ubuntu.com/releases | simplestreams | none        | YES    | YES    | NO     |
+-----------------+------------------------------------------+---------------+-------------+--------+--------+--------+
| ubuntu-daily    | https://cloud-images.ubuntu.com/daily    | simplestreams | none        | YES    | YES    | NO     |
+-----------------+------------------------------------------+---------------+-------------+--------+--------+--------+

Output of lxc remote ls on server2:

+-----------------+------------------------------------------+---------------+-------------+--------+--------+--------+
|      NAME       |                   URL                    |   PROTOCOL    |  AUTH TYPE  | PUBLIC | STATIC | GLOBAL |
+-----------------+------------------------------------------+---------------+-------------+--------+--------+--------+
| 192.168.3.2     | https://192.168.3.2:8443                 | lxd           | tls         | NO     | NO     | NO     |
+-----------------+------------------------------------------+---------------+-------------+--------+--------+--------+
| images          | https://images.linuxcontainers.org       | simplestreams | none        | YES    | NO     | NO     |
+-----------------+------------------------------------------+---------------+-------------+--------+--------+--------+
| local (current) | unix://                                  | lxd           | file access | NO     | YES    | NO     |
+-----------------+------------------------------------------+---------------+-------------+--------+--------+--------+
| server1         | https://server1:8443                     | lxd           | tls         | NO     | NO     | NO     |
+-----------------+------------------------------------------+---------------+-------------+--------+--------+--------+
| ubuntu          | https://cloud-images.ubuntu.com/releases | simplestreams | none        | YES    | YES    | NO     |
+-----------------+------------------------------------------+---------------+-------------+--------+--------+--------+
| ubuntu-daily    | https://cloud-images.ubuntu.com/daily    | simplestreams | none        | YES    | YES    | NO     |
+-----------------+------------------------------------------+---------------+-------------+--------+--------+--------+

Issue description

The setup I am using consists of two LXD hosts that run independently (not in a cluster!). server1 is the active node running a single container, mycontainer, and server2 holds a copy of mycontainer that is generally stopped. At regular intervals, mycontainer and its volumes are copied over to server2 as a kind of "cold standby", using a snapshot that is renamed after the lxc copy has finished. Backups are handled differently and are not relevant here.

Notes:

Given a setup with multiple interfaces that are all set to listen on port 8443 for remote operations, the following issue arises:

When trying to copy a container snapshot (or a storage volume) to a remote LXD instance using lxc copy or lxc storage volume copy, the transfer fails.

Example: Copying a container (or a storage volume) to the remote using one of these commands:

lxc copy mycontainer/snap0 server2:mycontainer-snap0
lxc storage volume copy tank1/myvol server2:tank1/myvol

fails with:

Error: Failed instance creation:

I expect this copy operation to work flawlessly, but it seems the operation tries to open a control connection from server2 to server1 using the target address 192.168.2.2:8443, which is not reachable from server2. In my mind the control connection should be opened over the subnet that server1 and server2 share (192.168.3.0/24), especially because the API requests already use that interface (lxc ls server2: works).

Executing the copy command from server2 (as a pull operation), however, works flawlessly:

lxc copy server1:mycontainer/snap0 mycontainer-snap0
lxc storage volume copy server1:tank1/myvol tank1/myvol

A dirty fix I found for this problem is to simply bring down the interface to the backup server (the one on subnet 192.168.2.0/24) on server1 and bring it back up after the copy operation has finished. I am not happy with this "fix".

Steps to reproduce

  1. Set up two systems, server1 and server2, with network configurations like the following (the enp1s0 interfaces are DAC links to the backup server):
    
    # server1
    $ cat /etc/netplan/00-installer-config.yaml
    network:
      ethernets:
        enp1s0:
          addresses:
          - 192.168.2.2/24
          nameservers:
            addresses: []
            search: []
        enp7s0:
          addresses:
          - 192.168.3.2/24
          nameservers:
            addresses: []
            search: []
        enp8s0:
          dhcp4: false
          addresses:
          - 192.168.122.100/24
          gateway4: 192.168.122.1
          nameservers:
            addresses: [1.1.1.1]
            search: []
      version: 2

    # server2
    $ cat /etc/netplan/00-installer-config.yaml
    network:
      ethernets:
        enp1s0:
          addresses:

Error: Failed instance creation:

Information to attach

lxc_config_mycontainer.txt lxc_info.txt monitor_log_server1.txt monitor_log_server2.txt

tomponline commented 1 year ago

Does it work with lxc copy --mode=push?

tomponline commented 1 year ago

In my mind the control connection should be opened using the subnet that both server1 and server2 share (192.168.3.0/24), especially because the API requests are already using that interface (lxc ls server2: works).

It looks like it has tried that:

  • https://192.168.3.2:8443: Error transferring instance data: Failed waiting for migration control connection on target: websocket: bad handshake

But got a websocket error.

Can you run lxc monitor --pretty on both the source and destination hosts, then try again and supply the output from both systems please.

gerba3 commented 1 year ago

Does it work with lxc copy --mode=push?

Yes, that one works just fine.

Can you run lxc monitor --pretty on both source and destination hosts and then try again and supply the output on both systems please.

Copy command used: lxc copy mycontainer/snap0 server2:mycontainer-snap0

# server1_pretty.log 
time="2023-07-19T11:40:33Z" level=debug msg="Event listener server handler started" id=8ad4ead7-3d87-47dd-813a-3bb84dba1afb local=/var/snap/lxd/common/lxd/unix.socket remote=@
time="2023-07-19T11:40:45Z" level=debug msg="Handling API request" ip=@ method=GET protocol=unix url=/1.0 username=sysadmin
time="2023-07-19T11:40:45Z" level=debug msg="GetInstanceUsage started" driver=zfs instance=mycontainer/snap0 pool=tank1 project=default
time="2023-07-19T11:40:45Z" level=debug msg="Handling API request" ip=@ method=GET protocol=unix url=/1.0/instances/mycontainer/snapshots/snap0 username=sysadmin
time="2023-07-19T11:40:45Z" level=debug msg="GetInstanceUsage finished" driver=zfs instance=mycontainer/snap0 pool=tank1 project=default
time="2023-07-19T11:40:45Z" level=debug msg="Handling API request" ip=@ method=GET protocol=unix url=/1.0/instances/mycontainer username=sysadmin
time="2023-07-19T11:40:45Z" level=debug msg="Handling API request" ip=@ method=GET protocol=unix url=/1.0/instances/mycontainer username=sysadmin
time="2023-07-19T11:40:45Z" level=debug msg="Handling API request" ip=@ method=GET protocol=unix url=/1.0/events username=sysadmin
time="2023-07-19T11:40:45Z" level=debug msg="Handling API request" ip=@ method=POST protocol=unix url=/1.0/instances/mycontainer/snapshots/snap0 username=sysadmin
time="2023-07-19T11:40:45Z" level=debug msg="Event listener server handler started" id=6692e14f-d8fc-4558-a088-4af1d4583bb1 local=/var/snap/lxd/common/lxd/unix.socket remote=@
time="2023-07-19T11:40:45Z" level=debug msg="New operation" class=websocket description="Transferring snapshot" operation=4649969f-295f-42ac-8fa6-a137bf5b0ea4 project=default
time="2023-07-19T11:40:45Z" level=info msg="ID: 4649969f-295f-42ac-8fa6-a137bf5b0ea4, Class: websocket, Description: Transferring snapshot" CreatedAt="2023-07-19 11:40:45.891098247 +0000 UTC" Err= Location=none MayCancel=false Metadata="map[control:91d4cc9cb21e5157be6bb62e3e1cf85080a7b84fd7f2b4e4fa221caea4209d72 fs:8830124a9ceac2def6aaa8eb1944256a0d62544d01f58346116bc392bacc1122]" Resources="map[containers:[/1.0/instances/mycontainer] instances:[/1.0/instances/mycontainer] instances_snapshots:[/1.0/instances/mycontainer/snapshots/snap0]]" Status=Pending StatusCode=Pending UpdatedAt="2023-07-19 11:40:45.891098247 +0000 UTC"
time="2023-07-19T11:40:45Z" level=debug msg="Started operation" class=websocket description="Transferring snapshot" operation=4649969f-295f-42ac-8fa6-a137bf5b0ea4 project=default
time="2023-07-19T11:40:45Z" level=info msg="ID: 4649969f-295f-42ac-8fa6-a137bf5b0ea4, Class: websocket, Description: Transferring snapshot" CreatedAt="2023-07-19 11:40:45.891098247 +0000 UTC" Err= Location=none MayCancel=false Metadata="map[control:91d4cc9cb21e5157be6bb62e3e1cf85080a7b84fd7f2b4e4fa221caea4209d72 fs:8830124a9ceac2def6aaa8eb1944256a0d62544d01f58346116bc392bacc1122]" Resources="map[containers:[/1.0/instances/mycontainer] instances:[/1.0/instances/mycontainer] instances_snapshots:[/1.0/instances/mycontainer/snapshots/snap0]]" Status=Running StatusCode=Running UpdatedAt="2023-07-19 11:40:45.891098247 +0000 UTC"
time="2023-07-19T11:40:45Z" level=info msg="Waiting for migration control connection on source" clusterMoveSourceName= instance=mycontainer/snap0 live=false project=default push=false
time="2023-07-19T11:40:55Z" level=debug msg="Failure for operation" class=websocket description="Transferring snapshot" err="Failed waiting for migration control connection on source: context deadline exceeded" operation=4649969f-295f-42ac-8fa6-a137bf5b0ea4 project=default
time="2023-07-19T11:40:55Z" level=info msg="ID: 4649969f-295f-42ac-8fa6-a137bf5b0ea4, Class: websocket, Description: Transferring snapshot" CreatedAt="2023-07-19 11:40:45.891098247 +0000 UTC" Err="Failed waiting for migration control connection on source: context deadline exceeded" Location=none MayCancel=false Metadata="map[control:91d4cc9cb21e5157be6bb62e3e1cf85080a7b84fd7f2b4e4fa221caea4209d72 fs:8830124a9ceac2def6aaa8eb1944256a0d62544d01f58346116bc392bacc1122]" Resources="map[containers:[/1.0/instances/mycontainer] instances:[/1.0/instances/mycontainer] instances_snapshots:[/1.0/instances/mycontainer/snapshots/snap0]]" Status=Failure StatusCode=Failure UpdatedAt="2023-07-19 11:40:45.891098247 +0000 UTC"
time="2023-07-19T11:40:55Z" level=debug msg="Allowing untrusted GET" ip="192.168.3.3:59306" url="/1.0/operations/4649969f-295f-42ac-8fa6-a137bf5b0ea4/websocket?secret=91d4cc9cb21e5157be6bb62e3e1cf85080a7b84fd7f2b4e4fa221caea4209d72"
time="2023-07-19T11:40:55Z" level=debug msg="Handling API request" ip=@ method=DELETE protocol=unix url=/1.0/operations/4649969f-295f-42ac-8fa6-a137bf5b0ea4 username=sysadmin
time="2023-07-19T11:40:55Z" level=debug msg="Event listener server handler stopped" listener=6692e14f-d8fc-4558-a088-4af1d4583bb1 local=/var/snap/lxd/common/lxd/unix.socket remote=@
# server2_pretty.log 
time="2023-07-19T11:40:35Z" level=debug msg="Event listener server handler started" id=467ab80f-3d9c-42d2-a9b0-ba41d7d0fd24 local=/var/snap/lxd/common/lxd/unix.socket remote=@
time="2023-07-19T11:40:45Z" level=debug msg="Matched trusted cert" fingerprint=dd6e84c9594891e2989fccbc96250f1e4ccb11339d218bc8cb7a1b662220d160 subject="CN=sysadmin@server1,O=linuxcontainers.org"
time="2023-07-19T11:40:45Z" level=debug msg="Handling API request" ip="192.168.3.2:40566" method=GET protocol=tls url=/1.0 username=dd6e84c9594891e2989fccbc96250f1e4ccb11339d218bc8cb7a1b662220d160
time="2023-07-19T11:40:45Z" level=debug msg="Matched trusted cert" fingerprint=dd6e84c9594891e2989fccbc96250f1e4ccb11339d218bc8cb7a1b662220d160 subject="CN=sysadmin@server1,O=linuxcontainers.org"
time="2023-07-19T11:40:45Z" level=debug msg="Handling API request" ip="192.168.3.2:40576" method=GET protocol=tls url=/1.0/events username=dd6e84c9594891e2989fccbc96250f1e4ccb11339d218bc8cb7a1b662220d160
time="2023-07-19T11:40:45Z" level=debug msg="Matched trusted cert" fingerprint=dd6e84c9594891e2989fccbc96250f1e4ccb11339d218bc8cb7a1b662220d160 subject="CN=sysadmin@server1,O=linuxcontainers.org"
time="2023-07-19T11:40:45Z" level=debug msg="Event listener server handler started" id=540123f7-76d3-436e-ae39-c649066c563f local="192.168.3.3:8443" remote="192.168.3.2:40576"
time="2023-07-19T11:40:45Z" level=debug msg="Matched trusted cert" fingerprint=dd6e84c9594891e2989fccbc96250f1e4ccb11339d218bc8cb7a1b662220d160 subject="CN=sysadmin@server1,O=linuxcontainers.org"
time="2023-07-19T11:40:45Z" level=debug msg="Handling API request" ip="192.168.3.2:40590" method=POST protocol=tls url=/1.0/instances username=dd6e84c9594891e2989fccbc96250f1e4ccb11339d218bc8cb7a1b662220d160
time="2023-07-19T11:40:45Z" level=debug msg="Responding to instance create"
time="2023-07-19T11:40:45Z" level=debug msg="Instance operation lock created" action=create instance=mycontainer-snap0 project=default reusable=false
time="2023-07-19T11:40:45Z" level=info msg="Creating instance" ephemeral=false instance=mycontainer-snap0 instanceType=container project=default
time="2023-07-19T11:40:45Z" level=debug msg="Adding device" device=eth0 instance=mycontainer-snap0 instanceType=container project=default type=nic
time="2023-07-19T11:40:45Z" level=debug msg="Adding device" device=root instance=mycontainer-snap0 instanceType=container project=default type=disk
time="2023-07-19T11:40:45Z" level=info msg="Created instance" ephemeral=false instance=mycontainer-snap0 instanceType=container project=default
time="2023-07-19T11:40:45Z" level=info msg="Action: instance-created, Source: /1.0/instances/mycontainer-snap0" location=none storage-pool=tank1 type=container
time="2023-07-19T11:40:45Z" level=debug msg="New operation" class=task description="Creating instance" operation=ca5f62a3-a216-4365-8b0f-67d51273deb7 project=default
time="2023-07-19T11:40:45Z" level=info msg="ID: ca5f62a3-a216-4365-8b0f-67d51273deb7, Class: task, Description: Creating instance" CreatedAt="2023-07-19 11:40:45.917102117 +0000 UTC" Err= Location=none MayCancel=false Metadata="map[]" Resources="map[containers:[/1.0/instances/mycontainer-snap0] instances:[/1.0/instances/mycontainer-snap0]]" Status=Pending StatusCode=Pending UpdatedAt="2023-07-19 11:40:45.917102117 +0000 UTC"
time="2023-07-19T11:40:45Z" level=debug msg="Started operation" class=task description="Creating instance" operation=ca5f62a3-a216-4365-8b0f-67d51273deb7 project=default
time="2023-07-19T11:40:45Z" level=info msg="ID: ca5f62a3-a216-4365-8b0f-67d51273deb7, Class: task, Description: Creating instance" CreatedAt="2023-07-19 11:40:45.917102117 +0000 UTC" Err= Location=none MayCancel=false Metadata="map[]" Resources="map[containers:[/1.0/instances/mycontainer-snap0] instances:[/1.0/instances/mycontainer-snap0]]" Status=Running StatusCode=Running UpdatedAt="2023-07-19 11:40:45.917102117 +0000 UTC"
time="2023-07-19T11:40:45Z" level=info msg="Waiting for migration control connection on target" clusterMoveSourceName= instance=mycontainer-snap0 live=false project=default push=false
time="2023-07-19T11:40:45Z" level=debug msg="Handling API request" ip="192.168.3.2:40602" method=GET protocol=tls url=/1.0/operations/ca5f62a3-a216-4365-8b0f-67d51273deb7 username=dd6e84c9594891e2989fccbc96250f1e4ccb11339d218bc8cb7a1b662220d160
time="2023-07-19T11:40:45Z" level=debug msg="Matched trusted cert" fingerprint=dd6e84c9594891e2989fccbc96250f1e4ccb11339d218bc8cb7a1b662220d160 subject="CN=sysadmin@server1,O=linuxcontainers.org"
time="2023-07-19T11:40:55Z" level=debug msg="Instance operation lock finished" action=create err="Error transferring instance data: Failed waiting for migration control connection on target: Unable to connect to: 192.168.2.2:8443 ([dial tcp 192.168.2.2:8443: i/o timeout])" instance=mycontainer-snap0 project=default reusable=false
time="2023-07-19T11:40:55Z" level=debug msg="Removing device" device=root instance=mycontainer-snap0 instanceType=container project=default type=disk
time="2023-07-19T11:40:55Z" level=debug msg="Removing device" device=eth0 instance=mycontainer-snap0 instanceType=container project=default type=nic
time="2023-07-19T11:40:55Z" level=debug msg="Failure for operation" class=task description="Creating instance" err="Error transferring instance data: Failed waiting for migration control connection on target: Unable to connect to: 192.168.2.2:8443 ([dial tcp 192.168.2.2:8443: i/o timeout])" operation=ca5f62a3-a216-4365-8b0f-67d51273deb7 project=default
time="2023-07-19T11:40:55Z" level=info msg="ID: ca5f62a3-a216-4365-8b0f-67d51273deb7, Class: task, Description: Creating instance" CreatedAt="2023-07-19 11:40:45.917102117 +0000 UTC" Err="Error transferring instance data: Failed waiting for migration control connection on target: Unable to connect to: 192.168.2.2:8443 ([dial tcp 192.168.2.2:8443: i/o timeout])" Location=none MayCancel=false Metadata="map[]" Resources="map[containers:[/1.0/instances/mycontainer-snap0] instances:[/1.0/instances/mycontainer-snap0]]" Status=Failure StatusCode=Failure UpdatedAt="2023-07-19 11:40:45.917102117 +0000 UTC"
time="2023-07-19T11:40:55Z" level=debug msg="Matched trusted cert" fingerprint=dd6e84c9594891e2989fccbc96250f1e4ccb11339d218bc8cb7a1b662220d160 subject="CN=sysadmin@server1,O=linuxcontainers.org"
time="2023-07-19T11:40:55Z" level=debug msg="Handling API request" ip="192.168.3.2:52072" method=POST protocol=tls url=/1.0/instances username=dd6e84c9594891e2989fccbc96250f1e4ccb11339d218bc8cb7a1b662220d160
time="2023-07-19T11:40:55Z" level=debug msg="Responding to instance create"
time="2023-07-19T11:40:55Z" level=debug msg="Instance operation lock created" action=create instance=mycontainer-snap0 project=default reusable=false
time="2023-07-19T11:40:55Z" level=info msg="Creating instance" ephemeral=false instance=mycontainer-snap0 instanceType=container project=default
time="2023-07-19T11:40:55Z" level=debug msg="Adding device" device=eth0 instance=mycontainer-snap0 instanceType=container project=default type=nic
time="2023-07-19T11:40:55Z" level=debug msg="Adding device" device=root instance=mycontainer-snap0 instanceType=container project=default type=disk
time="2023-07-19T11:40:55Z" level=info msg="Created instance" ephemeral=false instance=mycontainer-snap0 instanceType=container project=default
time="2023-07-19T11:40:55Z" level=info msg="Action: instance-created, Source: /1.0/instances/mycontainer-snap0" location=none storage-pool=tank1 type=container
time="2023-07-19T11:40:55Z" level=debug msg="New operation" class=task description="Creating instance" operation=a2ba0adf-d5fe-4278-b89d-d815f1832f60 project=default
time="2023-07-19T11:40:55Z" level=info msg="ID: a2ba0adf-d5fe-4278-b89d-d815f1832f60, Class: task, Description: Creating instance" CreatedAt="2023-07-19 11:40:55.97518561 +0000 UTC" Err= Location=none MayCancel=false Metadata="map[]" Resources="map[containers:[/1.0/instances/mycontainer-snap0] instances:[/1.0/instances/mycontainer-snap0]]" Status=Pending StatusCode=Pending UpdatedAt="2023-07-19 11:40:55.97518561 +0000 UTC"
time="2023-07-19T11:40:55Z" level=debug msg="Started operation" class=task description="Creating instance" operation=a2ba0adf-d5fe-4278-b89d-d815f1832f60 project=default
time="2023-07-19T11:40:55Z" level=info msg="ID: a2ba0adf-d5fe-4278-b89d-d815f1832f60, Class: task, Description: Creating instance" CreatedAt="2023-07-19 11:40:55.97518561 +0000 UTC" Err= Location=none MayCancel=false Metadata="map[]" Resources="map[containers:[/1.0/instances/mycontainer-snap0] instances:[/1.0/instances/mycontainer-snap0]]" Status=Running StatusCode=Running UpdatedAt="2023-07-19 11:40:55.97518561 +0000 UTC"
time="2023-07-19T11:40:55Z" level=info msg="Waiting for migration control connection on target" clusterMoveSourceName= instance=mycontainer-snap0 live=false project=default push=false
time="2023-07-19T11:40:55Z" level=debug msg="Instance operation lock finished" action=create err="Error transferring instance data: Failed waiting for migration control connection on target: websocket: bad handshake" instance=mycontainer-snap0 project=default reusable=false
time="2023-07-19T11:40:55Z" level=debug msg="Removing device" device=root instance=mycontainer-snap0 instanceType=container project=default type=disk
time="2023-07-19T11:40:55Z" level=debug msg="Removing device" device=eth0 instance=mycontainer-snap0 instanceType=container project=default type=nic
time="2023-07-19T11:40:55Z" level=debug msg="Matched trusted cert" fingerprint=dd6e84c9594891e2989fccbc96250f1e4ccb11339d218bc8cb7a1b662220d160 subject="CN=sysadmin@server1,O=linuxcontainers.org"
time="2023-07-19T11:40:55Z" level=debug msg="Handling API request" ip="192.168.3.2:52082" method=GET protocol=tls url=/1.0/operations/a2ba0adf-d5fe-4278-b89d-d815f1832f60 username=dd6e84c9594891e2989fccbc96250f1e4ccb11339d218bc8cb7a1b662220d160
time="2023-07-19T11:40:55Z" level=debug msg="Failure for operation" class=task description="Creating instance" err="Error transferring instance data: Failed waiting for migration control connection on target: websocket: bad handshake" operation=a2ba0adf-d5fe-4278-b89d-d815f1832f60 project=default
time="2023-07-19T11:40:55Z" level=info msg="ID: a2ba0adf-d5fe-4278-b89d-d815f1832f60, Class: task, Description: Creating instance" CreatedAt="2023-07-19 11:40:55.97518561 +0000 UTC" Err="Error transferring instance data: Failed waiting for migration control connection on target: websocket: bad handshake" Location=none MayCancel=false Metadata="map[]" Resources="map[containers:[/1.0/instances/mycontainer-snap0] instances:[/1.0/instances/mycontainer-snap0]]" Status=Failure StatusCode=Failure UpdatedAt="2023-07-19 11:40:55.97518561 +0000 UTC"
time="2023-07-19T11:40:55Z" level=debug msg="Event listener server handler stopped" listener=540123f7-76d3-436e-ae39-c649066c563f local="192.168.3.3:8443" remote="192.168.3.2:40576"
tomponline commented 11 months ago

@gabrielmougard want to have a go with this one?

gabrielmougard commented 11 months ago

@gerba3 @tomponline I'm trying to reproduce this scenario:

Here are my two servers (server1 and server2, each one configured with LXD)

+---------+---------+-------------------------+-------------------------------------------------+-----------------+-----------+
|  NAME   |  STATE  |          IPV4           |                      IPV6                       |      TYPE       | SNAPSHOTS |
+---------+---------+-------------------------+-------------------------------------------------+-----------------+-----------+
| server1 | RUNNING | 10.156.3.2 (enp7s0)     | fd42:dd83:4885:8d24:216:3eff:fe0b:e98d (enp5s0) | VIRTUAL-MACHINE | 0         |
|         |         | 10.156.2.2 (enp6s0)     |                                                 |                 |           |
|         |         | 10.156.162.200 (lxdbr0) |                                                 |                 |           |
|         |         | 10.156.162.100 (enp5s0) |                                                 |                 |           |
+---------+---------+-------------------------+-------------------------------------------------+-----------------+-----------+
| server2 | RUNNING | 10.156.3.3 (enp7s0)     | fd42:dd83:4885:8d24:216:3eff:fe2c:2a1d (enp5s0) | VIRTUAL-MACHINE | 0         |
|         |         | 10.156.1.2 (enp6s0)     |                                                 |                 |           |
|         |         | 10.156.162.200 (lxdbr0) |                                                 |                 |           |
|         |         | 10.156.162.101 (enp5s0) |                                                 |                 |           |
+---------+---------+-------------------------+-------------------------------------------------+-----------------+-----------+

and here are my remotes on server1 and server2 respectively:

output of `lxc remote list` on `server1`:

+----------------------+---------------------------------------------------+---------------+-------------+--------+--------+--------+
|         NAME         |                        URL                        |   PROTOCOL    |  AUTH TYPE  | PUBLIC | STATIC | GLOBAL |
+----------------------+---------------------------------------------------+---------------+-------------+--------+--------+--------+
| 10.156.3.3           | https://10.156.3.3:8443                           | lxd           | tls         | NO     | NO     | NO     |
+----------------------+---------------------------------------------------+---------------+-------------+--------+--------+--------+
| images               | https://images.linuxcontainers.org                | simplestreams | none        | YES    | NO     | NO     |
+----------------------+---------------------------------------------------+---------------+-------------+--------+--------+--------+
| local (current)      | unix://                                           | lxd           | file access | NO     | YES    | NO     |
+----------------------+---------------------------------------------------+---------------+-------------+--------+--------+--------+
| server2              | https://server2:8443                              | lxd           | tls         | NO     | NO     | NO     |
+----------------------+---------------------------------------------------+---------------+-------------+--------+--------+--------+
| ubuntu               | https://cloud-images.ubuntu.com/releases          | simplestreams | none        | YES    | YES    | NO     |
+----------------------+---------------------------------------------------+---------------+-------------+--------+--------+--------+
| ubuntu-daily         | https://cloud-images.ubuntu.com/daily             | simplestreams | none        | YES    | YES    | NO     |
+----------------------+---------------------------------------------------+---------------+-------------+--------+--------+--------+
| ubuntu-minimal       | https://cloud-images.ubuntu.com/minimal/releases/ | simplestreams | none        | YES    | YES    | NO     |
+----------------------+---------------------------------------------------+---------------+-------------+--------+--------+--------+
| ubuntu-minimal-daily | https://cloud-images.ubuntu.com/minimal/daily/    | simplestreams | none        | YES    | YES    | NO     |
+----------------------+---------------------------------------------------+---------------+-------------+--------+--------+--------+

output of `ip r` on `server1`:

default via 10.156.162.1 dev enp5s0 proto static 
10.156.162.0/24 dev lxdbr0 proto kernel scope link src 10.156.162.200 
10.156.162.0/24 dev enp5s0 proto kernel scope link src 10.156.162.100 
10.156.2.0/24 dev enp6s0 proto kernel scope link src 10.156.2.2 
10.156.3.0/24 dev enp7s0 proto kernel scope link src 10.156.3.2

and

output of `lxc remote list` on `server2`:

+----------------------+---------------------------------------------------+---------------+-------------+--------+--------+--------+
|         NAME         |                        URL                        |   PROTOCOL    |  AUTH TYPE  | PUBLIC | STATIC | GLOBAL |
+----------------------+---------------------------------------------------+---------------+-------------+--------+--------+--------+
| 10.156.3.2           | https://10.156.3.2:8443                           | lxd           | tls         | NO     | NO     | NO     |
+----------------------+---------------------------------------------------+---------------+-------------+--------+--------+--------+
| images               | https://images.linuxcontainers.org                | simplestreams | none        | YES    | NO     | NO     |
+----------------------+---------------------------------------------------+---------------+-------------+--------+--------+--------+
| local (current)      | unix://                                           | lxd           | file access | NO     | YES    | NO     |
+----------------------+---------------------------------------------------+---------------+-------------+--------+--------+--------+
| server1              | https://server1:8443                              | lxd           | tls         | NO     | NO     | NO     |
+----------------------+---------------------------------------------------+---------------+-------------+--------+--------+--------+
| ubuntu               | https://cloud-images.ubuntu.com/releases          | simplestreams | none        | YES    | YES    | NO     |
+----------------------+---------------------------------------------------+---------------+-------------+--------+--------+--------+
| ubuntu-daily         | https://cloud-images.ubuntu.com/daily             | simplestreams | none        | YES    | YES    | NO     |
+----------------------+---------------------------------------------------+---------------+-------------+--------+--------+--------+
| ubuntu-minimal       | https://cloud-images.ubuntu.com/minimal/releases/ | simplestreams | none        | YES    | YES    | NO     |
+----------------------+---------------------------------------------------+---------------+-------------+--------+--------+--------+
| ubuntu-minimal-daily | https://cloud-images.ubuntu.com/minimal/daily/    | simplestreams | none        | YES    | YES    | NO     |
+----------------------+---------------------------------------------------+---------------+-------------+--------+--------+--------+

output of `ip r` on `server2`:

default via 10.156.162.1 dev enp5s0 proto static 
10.156.162.0/24 dev lxdbr0 proto kernel scope link src 10.156.162.200
10.156.162.0/24 dev enp5s0 proto kernel scope link src 10.156.162.101 
10.156.1.0/24 dev enp6s0 proto kernel scope link src 10.156.1.2 
10.156.3.0/24 dev enp7s0 proto kernel scope link src 10.156.3.3 

Now, from the look of it, I think this should be similar to what you have (please tell me if I made a mistake in reproducing this environment).

Then, when attempting to create a container on server1 and copy its snapshot to server2 like this:

lxc launch images:ubuntu/jammy mycontainer
lxc copy mycontainer/snap0 server2:mycontainer-snap0

it works fine on my side (so far I haven't tried the lxc storage volume copy server1:tank1/myvol tank1/myvol command), but for now my container uses the default pool. I'll try with a custom one and see what happens.


Here is my LXD info:

config:
  core.https_address: :8443
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- macaroon_authentication
- network_sriov
- console
- restrict_devlxd
- migration_pre_copy
- infiniband
- maas_network
- devlxd_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- container_mount_propagation
- container_backup
- devlxd_images
- container_local_cross_pool_handling
- proxy_unix
- proxy_udp
- clustering_join
- proxy_tcp_udp_multi_port_handling
- network_state
- proxy_unix_dac_properties
- container_protection_delete
- unix_priv_drop
- pprof_http
- proxy_haproxy_protocol
- network_hwaddr
- proxy_nat
- network_nat_order
- container_full
- candid_authentication
- backup_compression
- candid_config
- nvidia_runtime_config
- storage_api_volume_snapshots
- storage_unmapped
- projects
- candid_config_key
- network_vxlan_ttl
- container_incremental_copy
- usb_optional_vendorid
- snapshot_scheduling
- snapshot_schedule_aliases
- container_copy_project
- clustering_server_address
- clustering_image_replication
- container_protection_shift
- snapshot_expiry
- container_backup_override_pool
- snapshot_expiry_creation
- network_leases_location
- resources_cpu_socket
- resources_gpu
- resources_numa
- kernel_features
- id_map_current
- event_location
- storage_api_remote_volume_snapshots
- network_nat_address
- container_nic_routes
- rbac
- cluster_internal_copy
- seccomp_notify
- lxc_features
- container_nic_ipvlan
- network_vlan_sriov
- storage_cephfs
- container_nic_ipfilter
- resources_v2
- container_exec_user_group_cwd
- container_syscall_intercept
- container_disk_shift
- storage_shifted
- resources_infiniband
- daemon_storage
- instances
- image_types
- resources_disk_sata
- clustering_roles
- images_expiry
- resources_network_firmware
- backup_compression_algorithm
- ceph_data_pool_name
- container_syscall_intercept_mount
- compression_squashfs
- container_raw_mount
- container_nic_routed
- container_syscall_intercept_mount_fuse
- container_disk_ceph
- virtual-machines
- image_profiles
- clustering_architecture
- resources_disk_id
- storage_lvm_stripes
- vm_boot_priority
- unix_hotplug_devices
- api_filtering
- instance_nic_network
- clustering_sizing
- firewall_driver
- projects_limits
- container_syscall_intercept_hugetlbfs
- limits_hugepages
- container_nic_routed_gateway
- projects_restrictions
- custom_volume_snapshot_expiry
- volume_snapshot_scheduling
- trust_ca_certificates
- snapshot_disk_usage
- clustering_edit_roles
- container_nic_routed_host_address
- container_nic_ipvlan_gateway
- resources_usb_pci
- resources_cpu_threads_numa
- resources_cpu_core_die
- api_os
- container_nic_routed_host_table
- container_nic_ipvlan_host_table
- container_nic_ipvlan_mode
- resources_system
- images_push_relay
- network_dns_search
- container_nic_routed_limits
- instance_nic_bridged_vlan
- network_state_bond_bridge
- usedby_consistency
- custom_block_volumes
- clustering_failure_domains
- resources_gpu_mdev
- console_vga_type
- projects_limits_disk
- network_type_macvlan
- network_type_sriov
- container_syscall_intercept_bpf_devices
- network_type_ovn
- projects_networks
- projects_networks_restricted_uplinks
- custom_volume_backup
- backup_override_name
- storage_rsync_compression
- network_type_physical
- network_ovn_external_subnets
- network_ovn_nat
- network_ovn_external_routes_remove
- tpm_device_type
- storage_zfs_clone_copy_rebase
- gpu_mdev
- resources_pci_iommu
- resources_network_usb
- resources_disk_address
- network_physical_ovn_ingress_mode
- network_ovn_dhcp
- network_physical_routes_anycast
- projects_limits_instances
- network_state_vlan
- instance_nic_bridged_port_isolation
- instance_bulk_state_change
- network_gvrp
- instance_pool_move
- gpu_sriov
- pci_device_type
- storage_volume_state
- network_acl
- migration_stateful
- disk_state_quota
- storage_ceph_features
- projects_compression
- projects_images_remote_cache_expiry
- certificate_project
- network_ovn_acl
- projects_images_auto_update
- projects_restricted_cluster_target
- images_default_architecture
- network_ovn_acl_defaults
- gpu_mig
- project_usage
- network_bridge_acl
- warnings
- projects_restricted_backups_and_snapshots
- clustering_join_token
- clustering_description
- server_trusted_proxy
- clustering_update_cert
- storage_api_project
- server_instance_driver_operational
- server_supported_storage_drivers
- event_lifecycle_requestor_address
- resources_gpu_usb
- clustering_evacuation
- network_ovn_nat_address
- network_bgp
- network_forward
- custom_volume_refresh
- network_counters_errors_dropped
- metrics
- image_source_project
- clustering_config
- network_peer
- linux_sysctl
- network_dns
- ovn_nic_acceleration
- certificate_self_renewal
- instance_project_move
- storage_volume_project_move
- cloud_init
- network_dns_nat
- database_leader
- instance_all_projects
- clustering_groups
- ceph_rbd_du
- instance_get_full
- qemu_metrics
- gpu_mig_uuid
- event_project
- clustering_evacuation_live
- instance_allow_inconsistent_copy
- network_state_ovn
- storage_volume_api_filtering
- image_restrictions
- storage_zfs_export
- network_dns_records
- storage_zfs_reserve_space
- network_acl_log
- storage_zfs_blocksize
- metrics_cpu_seconds
- instance_snapshot_never
- certificate_token
- instance_nic_routed_neighbor_probe
- event_hub
- agent_nic_config
- projects_restricted_intercept
- metrics_authentication
- images_target_project
- cluster_migration_inconsistent_copy
- cluster_ovn_chassis
- container_syscall_intercept_sched_setscheduler
- storage_lvm_thinpool_metadata_size
- storage_volume_state_total
- instance_file_head
- instances_nic_host_name
- image_copy_profile
- container_syscall_intercept_sysinfo
- clustering_evacuation_mode
- resources_pci_vpd
- qemu_raw_conf
- storage_cephfs_fscache
- network_load_balancer
- vsock_api
- instance_ready_state
- network_bgp_holdtime
- storage_volumes_all_projects
- metrics_memory_oom_total
- storage_buckets
- storage_buckets_create_credentials
- metrics_cpu_effective_total
- projects_networks_restricted_access
- storage_buckets_local
- loki
- acme
- internal_metrics
- cluster_join_token_expiry
- remote_token_expiry
- init_preseed
- storage_volumes_created_at
- cpu_hotplug
- projects_networks_zones
- network_txqueuelen
- cluster_member_state
- instances_placement_scriptlet
- storage_pool_source_wipe
- zfs_block_mode
- instance_generation_id
- disk_io_cache
- amd_sev
- storage_pool_loop_resize
- migration_vm_live
- ovn_nic_nesting
- oidc
- network_ovn_l3only
- ovn_nic_acceleration_vdpa
- cluster_healing
- instances_state_total
- auth_user
- security_csm
- instances_rebuild
- numa_cpu_placement
- custom_volume_iso
- network_allocations
- storage_api_remote_volume_snapshot_copy
- zfs_delegate
- operations_get_query_all_projects
- metadata_configuration
- syslog_socket
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
auth_user_name: root
auth_user_method: unix
environment:
  addresses:
  - 10.156.162.100:8443
  - '[fd42:dd83:4885:8d24:216:3eff:fe0b:e98d]:8443'
  - 10.156.2.2:8443
  - 10.156.3.2:8443
  - 10.156.162.200:8443
  architectures:
  - x86_64
  - i686
  certificate: |
    -----BEGIN CERTIFICATE-----
    MIIB5zCCAWygAwIBAgIQRFut3WfbsqG3azSY0EFTFTAKBggqhkjOPQQDAzAlMQww
    CgYDVQQKEwNMWEQxFTATBgNVBAMMDHJvb3RAc2VydmVyMTAeFw0yMzEwMTgxNDMw
    MzZaFw0zMzEwMTUxNDMwMzZaMCUxDDAKBgNVBAoTA0xYRDEVMBMGA1UEAwwMcm9v
    dEBzZXJ2ZXIxMHYwEAYHKoZIzj0CAQYFK4EEACIDYgAEdTqbfSQ9v0QCKCQjKHtM
    fMC4V8Th+kblPbLexKVt3g0meRzUJHU2E8mDeFUF7VeUs1DNZUWtKFfsoSU867vX
    fLSr2P6hWetPEe7r12t/71E6XFnDBr/roW7xHv3h0t/Po2EwXzAOBgNVHQ8BAf8E
    BAMCBaAwEwYDVR0lBAwwCgYIKwYBBQUHAwEwDAYDVR0TAQH/BAIwADAqBgNVHREE
    IzAhggdzZXJ2ZXIxhwR/AAABhxAAAAAAAAAAAAAAAAAAAAABMAoGCCqGSM49BAMD
    A2kAMGYCMQDiXIWMVYOQKkcQG/Eh4n9byQKl8Yp7CRB8OgHxpJOgxxWnCY9H8P8S
    c7bTv+yipzYCMQCC0RumBgoACJHlNcVtD/K2kDrumIGy499dEdyCFLKo4RDYQqd+
    Wz3rh4/+sxHfeGc=
    -----END CERTIFICATE-----
  certificate_fingerprint: f9f13a357fd976d72f99849a3b6bc52f2a6d09697c70830ca6fb8df805ef69d0
  driver: lxc | qemu
  driver_version: 5.0.3 | 8.0.4
  firewall: nftables
  kernel: Linux
  kernel_architecture: x86_64
  kernel_features:
    idmapped_mounts: "true"
    netnsid_getifaddrs: "true"
    seccomp_listener: "true"
    seccomp_listener_continue: "true"
    shiftfs: "false"
    uevent_injection: "true"
    unpriv_fscaps: "true"
  kernel_version: 5.15.0-86-generic
  lxc_features:
    cgroup2: "true"
    core_scheduling: "true"
    devpts_fd: "true"
    idmapped_mounts_v2: "true"
    mount_injection_file: "true"
    network_gateway_device_route: "true"
    network_ipvlan: "true"
    network_l2proxy: "true"
    network_phys_macvlan_mtu: "true"
    network_veth_router: "true"
    pidfd: "true"
    seccomp_allow_deny_syntax: "true"
    seccomp_notify: "true"
    seccomp_proxy_send_notify_fd: "true"
  os_name: Ubuntu
  os_version: "22.04"
  project: default
  server: lxd
  server_clustered: false
  server_event_mode: full-mesh
  server_name: server1
  server_pid: 2656
  server_version: "5.18"
  storage: dir
  storage_version: "1"
  storage_supported_drivers:
  - name: btrfs
    version: 5.16.2
    remote: false
  - name: ceph
    version: 17.2.6
    remote: true
  - name: cephfs
    version: 17.2.6
    remote: true
  - name: cephobject
    version: 17.2.6
    remote: true
  - name: dir
    version: "1"
    remote: false
  - name: lvm
    version: 2.03.11(2) (2021-01-08) / 1.02.175 (2021-01-08) / 4.45.0
    remote: false
  - name: zfs
    version: 2.1.5-1ubuntu6~22.04.1
    remote: false
gerba3 commented 11 months ago

Hi, thanks for the response and the attempt to reproduce.

I tested it again with the same setup as back then and also upgraded to the latest release using snap refresh. The issue still occurs.

One difference is that I am using containers and not VMs; I don't believe this to be significant here. I also think the launch-and-copy procedure is missing the snapshot command, but once again, that should not be significant. Further differences I found are the storage driver used (zfs vs. dir) and that you set it up as root, whereas I did it as the primary user (UID 1000). This should not matter.

I do not use any IPv6 configuration on my system. Does server2 resolve to the IPv6 address in your setup? I think it is possible that this avoids the issue entirely.

I thought that maybe pointing the remote in the copy command at the IPv4 address would help, but I figured that the underlying issue is in how the copy command handles the process. When I run the copy using the following command I get valuable debug logs:

lxc copy -v --debug mycontainer/snap0 server2:mycontainer-snap0

Output: debug_log_copy_command.txt

My hypothesis is that (as seen in the debug log above) the command gets the information of both instances (especially the addresses part) and then uses the first network address listed on server1 for the transfer, i.e. it sends that address to server2 in the request as the address to use for the transfer. This would explain why the transfer works in your setup: it bypasses the two local links entirely.

Snippet from the Output above:

DEBUG  [2023-10-19T07:57:23Z] Sending request to LXD                        etag= method=POST url="https://server2:8443/1.0/instances"
DEBUG  [2023-10-19T07:57:23Z]
    {
        "architecture": "x86_64",
        "config": {
            "image.architecture": "amd64",
            "image.description": "ubuntu 22.04 LTS amd64 (release) (20230719)",
            "image.label": "release",
            "image.os": "ubuntu",
            "image.release": "jammy",
            "image.serial": "20230719",
            "image.type": "squashfs",
            "image.version": "22.04",
            "volatile.base_image": "a0a9b9976255e7235afe495e920e6c0f40f55ae22852a5d5c31139aa9408f2e5"
        },
        "devices": {},
        "ephemeral": false,
        "profiles": [
            "default"
        ],
        "stateful": false,
        "description": "",
        "name": "mycontainer-snap0",
        "source": {
            "type": "migration",
            "certificate": "-----BEGIN CERTIFICATE-----\nMIICBjCCAY2gAwIBAgIRAKeMoAWfAt+1hBN4RjqxIN0wCgYIKoZIzj0EAwMwNTEc\nMBoGA1UEChMTbGludXhjb250YWluZXJzLm9yZzEVMBMGA1UEAwwMcm9vdEBzZXJ2\nZXIxMB4XDTIzMDcxOTA5MTcwMVoXDTMzMDcxNjA5MTcwMVowNTEcMBoGA1UEChMT\nbGludXhjb250YWluZXJzLm9yZzEVMBMGA1UEAwwMcm9vdEBzZXJ2ZXIxMHYwEAYH\nKoZIzj0CAQYFK4EEACIDYgAEUSi2P7EzLy6dRRm1DfTOy948C/fR83FfzYJ5PeCd\ne4bgkSrc9/agW7au8x6IBI/vCGKvYtYULjpDMBntgtu8v8m5/ShiPiiHPBy0NOLi\nN+7mjKV3XV9f/4k/r1cnDxlho2EwXzAOBgNVHQ8BAf8EBAMCBaAwEwYDVR0lBAww\nCgYIKwYBBQUHAwEwDAYDVR0TAQH/BAIwADAqBgNVHREEIzAhggdzZXJ2ZXIxhwR/\nAAABhxAAAAAAAAAAAAAAAAAAAAABMAoGCCqGSM49BAMDA2cAMGQCMEwbQozhiX95\n5WRsyHsjHczwj88zrpGfgQSYn8EnPE7xuJUSFed7jHXfcHU4qaOgRQIwOCxXX/ir\nbhKCigbhhuHUJRE9dXH4DjcX9xiFz410CFWRL0suJu6Mwh4Kd95E6cdy\n-----END CERTIFICATE-----\n",
            "base-image": "a0a9b9976255e7235afe495e920e6c0f40f55ae22852a5d5c31139aa9408f2e5",
            "mode": "pull",
            "operation": "https://192.168.2.2:8443/1.0/operations/4de3d79f-b8f2-4f26-95e1-c6d7fad014c0",
            "secrets": {
                "control": "b488251fab9d62e6d8af2a1b7544d7d685edc393af879a322a2bef20cf1cd4d9",
                "fs": "917b6e940e6ea824ec9150c231efa4c8a3ef17a0b8777131fc7c604fb792e0fe"
            },
            "allow_inconsistent": false
        },
        "instance_type": "",
        "type": "container"
    }

I am not sure from the logs if this is the payload of the POST request or the response.
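
To illustrate this hypothesis with a standalone sketch (plain Go, not LXD code), the snippet below dials the addresses server1 advertises, in the order seen in the failing payload, each with a short timeout. Run from server2, the first address (192.168.2.2:8443) would time out while the shared-subnet address connects, which is the same asymmetry the migration target appears to run into:

package main

import (
    "fmt"
    "net"
    "time"
)

func main() {
    // Addresses as advertised by server1 in the original (failing) order.
    addresses := []string{
        "192.168.2.2:8443", // DAC link to the backup server, not reachable from server2
        "192.168.3.2:8443", // shared subnet, reachable from server2
    }

    for _, addr := range addresses {
        conn, err := net.DialTimeout("tcp", addr, 3*time.Second)
        if err != nil {
            fmt.Printf("%s: unreachable: %v\n", addr, err)
            continue
        }
        conn.Close()
        fmt.Printf("%s: reachable\n", addr)
    }
}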

gerba3 commented 11 months ago

I switched my network configuration to the following and the transfer works now. The addresses list in the lxc info output follows the same order in which the interfaces are listed by ip addr, which is based on the device number.

# server1
$ cat /etc/netplan/00-installer-config.yaml
network:
  ethernets:
    enp1s0:
      addresses:
      - 192.168.3.2/24
      nameservers:
        addresses: []
        search: []
    enp7s0:
      addresses:
      - 192.168.2.2/24
      nameservers:
        addresses: []
        search: []
    enp8s0:
      dhcp4: false
      addresses:
      - 192.168.122.100/24
      gateway4: 192.168.122.1
      nameservers:
        addresses: [1.1.1.1]
        search: []
  version: 2

# server2
$ cat /etc/netplan/00-installer-config.yaml
# This is the network config written by 'subiquity'
network:
  ethernets:
    enp1s0:
      addresses:
      - 192.168.3.3/24
      nameservers:
        addresses: []
        search: []
    enp7s0:
      addresses:
      - 192.168.1.2/24
      nameservers:
        addresses: []
        search: []
    enp8s0:
      dhcp4: false
      addresses:
      - 192.168.122.101/24
      gateway4: 192.168.122.1
      nameservers:
        addresses: [1.1.1.1]
        search: []
  version: 2

The addresses section of the lxc info output on server1:

environment:
  addresses:
  - 192.168.3.2:8443
  - 192.168.2.2:8443
  - 192.168.122.100:8443
  - 192.168.122.200:8443
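
As a side note, the advertised order can also be checked programmatically. Here is a minimal sketch using the LXD Go client (assuming the github.com/canonical/lxd/client import path; older installs use github.com/lxc/lxd/client, and on a snap install the unix socket is at /var/snap/lxd/common/lxd/unix.socket):

package main

import (
    "fmt"

    lxd "github.com/canonical/lxd/client"
)

func main() {
    // An empty path lets the client pick the default unix socket; pass the
    // snap socket path explicitly if that default does not apply.
    c, err := lxd.ConnectLXDUnix("", nil)
    if err != nil {
        panic(err)
    }

    server, _, err := c.GetServer()
    if err != nil {
        panic(err)
    }

    // The same list that `lxc info` prints under environment.addresses,
    // in the order discussed above.
    for i, addr := range server.Environment.Addresses {
        fmt.Printf("%d: %s\n", i, addr)
    }
}
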
gabrielmougard commented 11 months ago

@gerba3 thanks for the quick response. Regarding IPv6, I removed my allocated IPv6 address and it does not change the result. This ordering issue is interesting. I'll look into that.

gabrielmougard commented 11 months ago

@gerba3 @tomponline I think I might have an idea of what's going on. When attempting to copy (in pull mode), the function func (*ProtocolLXD) tryCreateInstance(req api.InstancesPost, urls []string, op Operation) (RemoteOperation, error) is called. It then iterates over the different addresses the source LXD server advertises: in your case [192.168.2.2:8443, 192.168.3.2:8443]. The order matters here, and I'll try to explain why. We first try to create the instance using the source LXD address 192.168.2.2:8443. That address is not reachable from the target, but func (*ProtocolLXD).CreateInstance(instance api.InstancesPost) (Operation, error) will not fail yet; its operation will. Why is that?

Well, in the func createFromMigration(s *state.State, r *http.Request, projectName string, profiles []api.Profile, req *api.InstancesPost) response.Response function, when the migrationSink is created, the underlying websocket dialer is not called. If it were, it would return an error before the operation starts, and the tryCreateInstance function would call CreateInstance on the next address and succeed. Instead, the operation fails because it cannot reach that LXD address, as expected. And because a failing operation breaks the tryCreateInstance loop and does not continue (there must be a good reason for this; @tomponline do you know why?), I suggest we add the following in the createFromMigration function:

sink, err := newMigrationSink(&migrationArgs)
if err != nil {
    return response.InternalError(err)
}

// NEW: Check that the source server is reachable before starting the operation, so that the client is able to try the next address.
_, _, err = dialer.DialContext(r.Context(), req.Source.Operation, http.Header{})
if err != nil {
    return response.InternalError(err)
}

in order to return early and avoid this issue. What do you think?
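
To make that loop behaviour concrete, here is a self-contained, simplified sketch of the pattern described above (the stubs are hypothetical stand-ins, not the actual LXD client code): a failure of the POST itself moves on to the next candidate source address, but once the target-side operation has started, its failure aborts the whole loop, so the reachable 192.168.3.2:8443 address is never tried.

package main

import (
    "errors"
    "fmt"
)

type operation struct{ sourceURL string }

// postCreateInstance stands in for asking the target to create the instance,
// pulling from one particular source address.
func postCreateInstance(sourceURL string) (*operation, error) {
    return &operation{sourceURL: sourceURL}, nil
}

// waitOperation stands in for waiting on the target-side operation; here it
// fails for the unreachable DAC-link address, mirroring the reported logs.
func waitOperation(op *operation) error {
    if op.sourceURL == "https://192.168.2.2:8443" {
        return errors.New("Failed waiting for migration control connection on target: i/o timeout")
    }
    return nil
}

func tryCreateInstanceSketch(sourceURLs []string) error {
    var lastErr error

    for _, sourceURL := range sourceURLs {
        op, err := postCreateInstance(sourceURL)
        if err != nil {
            // The POST itself failed: try the next candidate address.
            lastErr = err
            continue
        }

        // Once the operation has started, its failure ends the whole loop
        // instead of falling through to the remaining addresses.
        if err := waitOperation(op); err != nil {
            return err
        }

        return nil
    }

    return lastErr
}

func main() {
    err := tryCreateInstanceSketch([]string{
        "https://192.168.2.2:8443", // tried first, operation fails, loop aborts
        "https://192.168.3.2:8443", // never reached
    })
    fmt.Println(err)
}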

tomponline commented 11 months ago

@gabrielmougard as discussed, please let me know if you find that the issue is the source timing out while waiting for the target to connect back to it, as the target iterates over the various IPs offered to it by the source via the client. Thanks