pcgeek86 commented 1 year ago

Required information

Distribution: Ubuntu
Distribution version: 23.04 Lunar Lobster
The output of "lxc info" or if that fails:
- Kernel version: 6.2.0-24-generic
- LXC version: 5.15
- LXD version: 5.15
- Storage backend in use: zfs

Issue description

When I try to move an LXD virtual machine from one cluster member to another, I receive this error:

Error: Instance move to destination failed: Error transferring instance data: Failed migration on target: Failed getting migration target filesystem connection: websocket: bad handshake

+-----------------------------------------------------------------------------------------------------------------------+
|  +---------------------------+     +-------------------------+     +--------------------------+                       |
|  |lxd01                      |     |lxd03                    |     |lxd04                     |                       |
|  |LXD host                   |     |LXD host                 |     |LXD host                  |                       |
|  |Cluster member             |     |Cluster member           |     |Cluster member            |                       |
|  |                           |     |                         |     |                          |                       |
|  |                           |     |                         |     |                          |                       |
|  | +-----------------------+ |     |                         |     |                          |                       |
|  | |vm01                   | |     |                         |     |                          |                       |
|  | |Alpine Linux           | |     +-------------------------+     |                          |                       |
|  | |Default profile        | |                                     |                          |                       |
|  | +-----------------------+ |                                     |                          |                       |
|  |                   |       |                                     |                          |                       |
|  |                   |       |                                     |                          |                       |
|  |                   +---------------------------------------------|                          |                       |
|  |                           |  Move vm01 from lxd01 to lxd04      |                          |                       |
|  |                           |                                     |                          |                       |
|  |                           |  lxc move vm01 --target lxd04       |                          |                       |
|  |                           |                                     |                          |                       |
|  |                           |                                     |                          |                       |
|  +---------------------------+                                     +--------------------------+                       |
|                                                                                                                       |
|                                                                                                                       |
|                                                                                                                       |
|                                                                                                                       |
|LXD cluster (3 members)                                                                                                |
+-----------------------------------------------------------------------------------------------------------------------+

Steps to reproduce

1. Create a three-node LXD cluster
2. Deploy a bunch of VMs across the cluster
3. Try to move a VM between two different cluster members
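
For anyone trying to reproduce this, steps 2 and 3 might look roughly like the sketch below. It assumes the cluster from step 1 already exists. The function name `reproduce_move`, the `LXC` and `IMAGE` variables, and the image alias `images:alpine/edge` are placeholders that do not come from this report. Only `lxc move vm01 --target lxd04` is taken from the diagram above.

```shell
#!/bin/sh
# Assumption: LXC can be overridden (e.g. LXC=echo) to print the arguments
# instead of contacting a real cluster.
LXC="${LXC:-lxc}"
# Assumption: any VM-capable image works; this alias is only an example.
IMAGE="${IMAGE:-images:alpine/edge}"

# reproduce_move VM SOURCE_MEMBER TARGET_MEMBER
# Launches a VM on one cluster member, then tries to move it to another.
reproduce_move() {
    vm="$1"; src="$2"; dst="$3"
    "$LXC" launch "$IMAGE" "$vm" --vm --target "$src" &&
        "$LXC" move "$vm" --target "$dst"
}
```

Calling `reproduce_move vm01 lxd01 lxd04` on a cluster member should then trigger the move that fails with the error quoted above.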

Information to attach

[ ] Any relevant kernel output (dmesg)
[ ] Container log (lxc info NAME --show-log)
[ ] Container configuration (lxc config show NAME --expanded)
[ ] Main daemon log (at /var/log/lxd/lxd.log or /var/snap/lxd/common/lxd/logs/lxd.log)
[ ] Output of the client with --debug
[ ] Output of the daemon with --debug (alternatively output of lxc monitor while reproducing the issue)

roosterfish commented 1 year ago

Hi @pcgeek86, I was able to reproduce this. The error message returned to the client isn't very clear, but the daemon log clearly says you need to enable live migration for the instance:

root@c1:~# journalctl -u snap.lxd.daemon -f
...
Jul 06 07:02:49 c1 lxd.daemon[738]: time="2023-07-06T07:02:49Z" level=error msg="Failed migration on source" clusterMoveSourceName=vm01 err="Stateful migration requires migration.stateful to be set to true" instance=vm01 live=true project=default push=false
...
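
Since the client only reports the handshake failure, it can help to pull the real cause out of the daemon log. The filter below is a rough sketch, not an official tool: `extract_migration_errors` is a made-up name, and it assumes the log keeps the `msg="Failed migration ..." err="..."` layout shown above, with no escaped quotes inside the `err` value.

```shell
#!/bin/sh
# Reads daemon log lines on stdin and prints the err="..." value of every
# "Failed migration" entry. Assumes the logfmt layout shown in this thread.
extract_migration_errors() {
    grep 'Failed migration' | sed -n 's/.*err="\([^"]*\)".*/\1/p'
}
```

For example: `journalctl -u snap.lxd.daemon --no-pager | extract_migration_errors`. It may be worth running this on both the source and the target member, since each side logs its own failure.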

After setting:

lxc config set vm01 migration.stateful=true limits.memory=512MiB
lxc config device override vm01 root size.state=512MiB

I can run lxc mv vm01 --target c3 and the VM gets migrated:

root@c1:~# lxc ls
+------+---------+----------------------+------+-----------------+-----------+----------+
| NAME |  STATE  |         IPV4         | IPV6 |      TYPE       | SNAPSHOTS | LOCATION |
+------+---------+----------------------+------+-----------------+-----------+----------+
| c01  | STOPPED |                      |      | CONTAINER       | 0         | c2       |
+------+---------+----------------------+------+-----------------+-----------+----------+
| vm01 | RUNNING | 240.109.0.194 (eth0) |      | VIRTUAL-MACHINE | 0         | c1       |
+------+---------+----------------------+------+-----------------+-----------+----------+
root@c1:~# lxc mv vm01 --target c3
root@c1:~# lxc ls
+------+---------+----------------------+------+-----------------+-----------+----------+
| NAME |  STATE  |         IPV4         | IPV6 |      TYPE       | SNAPSHOTS | LOCATION |
+------+---------+----------------------+------+-----------------+-----------+----------+
| c01  | STOPPED |                      |      | CONTAINER       | 0         | c2       |
+------+---------+----------------------+------+-----------------+-----------+----------+
| vm01 | RUNNING | 240.109.0.194 (eth0) |      | VIRTUAL-MACHINE | 0         | c3       |
+------+---------+----------------------+------+-----------------+-----------+----------+
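
Put together, the workaround could be wrapped in a small helper like the sketch below. The function name `enable_live_migration_and_move` and the `LXC` override are illustrative only. The three `lxc` commands are the ones used in this comment, with `size.state` set to the same value as `limits.memory`, as in the example. Whether the VM has to be stopped before these settings apply is not covered in this thread, so check the live migration docs first.

```shell
#!/bin/sh
# Assumption: LXC can be overridden (e.g. LXC=echo) to print the arguments
# instead of changing a real instance.
LXC="${LXC:-lxc}"

# enable_live_migration_and_move VM MEMORY TARGET_MEMBER
# Enables stateful migration, sizes the root disk state volume to match the
# memory limit, then moves the VM to the target cluster member.
enable_live_migration_and_move() {
    vm="$1"; mem="$2"; target="$3"
    "$LXC" config set "$vm" migration.stateful=true limits.memory="$mem" &&
        "$LXC" config device override "$vm" root size.state="$mem" &&
        "$LXC" move "$vm" --target "$target"
}
```

With the names from the output above, that would be `enable_live_migration_and_move vm01 512MiB c3`.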

roosterfish commented 1 year ago

Do you still see the error after enabling live migration? There are some more details here: https://documentation.ubuntu.com/lxd/en/latest/howto/move_instances/#live-migration

tomponline commented 1 year ago

Yes, that is the issue here I think. There is a long-standing general problem with the migration protocol in that there isn't a good way to ensure errors from the target are returned to the source, and vice versa, when something goes wrong on the other side. It's something I spent some time trying to sort out when adding live migration support, but was not entirely successful. It's partly due to the way the client connects to source/dest depending on push/pull/relay, and which one it uses the first error from.

tomponline commented 1 year ago

I've renamed this issue so we can address the error reporting issue, and use the OP's example as an easy way to reproduce the misleading errors.

Qubitium commented 4 months ago

Hit this bug. Comment at https://github.com/canonical/lxd/issues/12150#issuecomment-2167720355

tomponline commented 3 months ago

@boltmark please could you look at this one and see if https://github.com/lxc/incus/pull/412 could help here.

Thanks

canonical / lxd

Return migration errors to client correctly #11948