canonical / lxd

Powerful system container and virtual machine manager
https://canonical.com/lxd
GNU Affero General Public License v3.0

Return migration errors to client correctly #11948

Closed pcgeek86 closed 1 month ago

pcgeek86 commented 1 year ago

Issue description

When I try to move an LXD virtual machine from one cluster member to another, I receive the following error:

Error: Instance move to destination failed: Error transferring instance data: Failed migration on target: Failed getting migration target filesystem connection: websocket: bad handshake

+-----------------------------------------------------------------------------------------------------------------------+
|  +---------------------------+     +-------------------------+     +--------------------------+                       |
|  |lxd01                      |     |lxd03                    |     |lxd04                     |                       |
|  |LXD host                   |     |LXD host                 |     |LXD host                  |                       |
|  |Cluster member             |     |Cluster member           |     |Cluster member            |                       |
|  |                           |     |                         |     |                          |                       |
|  |                           |     |                         |     |                          |                       |
|  | +-----------------------+ |     |                         |     |                          |                       |
|  | |vm01                   | |     |                         |     |                          |                       |
|  | |Alpine Linux           | |     +-------------------------+     |                          |                       |
|  | |Default profile        | |                                     |                          |                       |
|  | +-----------------------+ |                                     |                          |                       |
|  |                   |       |                                     |                          |                       |
|  |                   |       |                                     |                          |                       |
|  |                   +---------------------------------------------|                          |                       |
|  |                           |  Move vm01 from lxd01 to lxd04      |                          |                       |
|  |                           |                                     |                          |                       |
|  |                           |  lxc move vm01 --target lxd04       |                          |                       |
|  |                           |                                     |                          |                       |
|  |                           |                                     |                          |                       |
|  +---------------------------+                                     +--------------------------+                       |
|                                                                                                                       |
|                                                                                                                       |
|                                                                                                                       |
|                                                                                                                       |
|LXD cluster (3 members)                                                                                                |
+-----------------------------------------------------------------------------------------------------------------------+

Steps to reproduce

  1. Create a three-node LXD cluster
  2. Deploy a bunch of VMs across the cluster
  3. Try to move a VM between two different cluster members

roosterfish commented 1 year ago

Hi @pcgeek86, I was able to reproduce this. The error message returned to the client isn't very clear, but the daemon log shows that you need to enable live migration for the instance:

root@c1:~# journalctl -u snap.lxd.daemon -f
...
Jul 06 07:02:49 c1 lxd.daemon[738]: time="2023-07-06T07:02:49Z" level=error msg="Failed migration on source" clusterMoveSourceName=vm01 err="Stateful migration requires migration.stateful to be set to true" instance=vm01 live=true project=default push=false
...

After setting migration.stateful to true on the instance (the option named in the log message above), I can run lxc mv vm01 --target c3 and the VM gets migrated:

root@c1:~# lxc ls
+------+---------+----------------------+------+-----------------+-----------+----------+
| NAME |  STATE  |         IPV4         | IPV6 |      TYPE       | SNAPSHOTS | LOCATION |
+------+---------+----------------------+------+-----------------+-----------+----------+
| c01  | STOPPED |                      |      | CONTAINER       | 0         | c2       |
+------+---------+----------------------+------+-----------------+-----------+----------+
| vm01 | RUNNING | 240.109.0.194 (eth0) |      | VIRTUAL-MACHINE | 0         | c1       |
+------+---------+----------------------+------+-----------------+-----------+----------+
root@c1:~# lxc mv vm01 --target c3
root@c1:~# lxc ls
+------+---------+----------------------+------+-----------------+-----------+----------+
| NAME |  STATE  |         IPV4         | IPV6 |      TYPE       | SNAPSHOTS | LOCATION |
+------+---------+----------------------+------+-----------------+-----------+----------+
| c01  | STOPPED |                      |      | CONTAINER       | 0         | c2       |
+------+---------+----------------------+------+-----------------+-----------+----------+
| vm01 | RUNNING | 240.109.0.194 (eth0) |      | VIRTUAL-MACHINE | 0         | c3       |
+------+---------+----------------------+------+-----------------+-----------+----------+
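
The log message above names migration.stateful as the option that must be enabled. A minimal sketch of enabling it and retrying the move is given here. The helper name enable_stateful_and_move and the LXC override variable are hypothetical, and stopping and starting the VM around the config change is an assumption (the option may not take effect on a running VM):

```shell
# Hypothetical helper: enable stateful migration on a VM, then move it.
# Assumption: the VM has to be restarted for migration.stateful to apply.
# Set LXC="echo lxc" to print the commands instead of running them.
enable_stateful_and_move() {
    instance="$1"
    target="$2"
    lxc_cmd="${LXC:-lxc}"
    $lxc_cmd stop "$instance" &&
        $lxc_cmd config set "$instance" migration.stateful=true &&
        $lxc_cmd start "$instance" &&
        $lxc_cmd move "$instance" --target "$target"
}
```

For the example in this thread that would be enable_stateful_and_move vm01 c3. Running it with LXC="echo lxc" first is a cheap way to check the commands before touching the cluster.
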
roosterfish commented 1 year ago

Do you still see the error after enabling live migration? There are some more details here: https://documentation.ubuntu.com/lxd/en/latest/howto/move_instances/#live-migration

tomponline commented 1 year ago

Yes, I think that is the issue here. There is a long-standing general problem with the migration protocol: there isn't a good way to ensure that errors from the target are returned to the source, and vice versa, when something goes wrong on the other side. It's something I spent some time trying to sort out when adding live migration support, but I was not entirely successful. It's partly due to the way the client connects to the source/destination depending on push/pull/relay mode, and which of them it takes the first error from.
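
Until the error reporting is improved, the underlying cause can usually be recovered from the daemon log on the members involved, as in the journalctl output earlier in this thread. A small sketch of a filter that pulls out just the err="..." field from error-level log lines is given here. The function name migration_errors is hypothetical, and the log format is assumed from the single sample line above:

```shell
# Hypothetical helper: read LXD daemon log lines on stdin and print the
# contents of the err="..." field for lines logged at level=error.
# Assumption: the log format matches the sample line shown in this thread.
migration_errors() {
    sed -n '/level=error/s/.*err="\([^"]*\)".*/\1/p'
}
```

It could be used on the source and on the target member, for example with journalctl -u snap.lxd.daemon --since "10 minutes ago" | migration_errors, so that the error from each side of the migration can be compared with what the client printed.
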

tomponline commented 1 year ago

I've renamed this issue so we can address the error reporting issue, and use the OP's example as an easy way to reproduce the misleading errors.

Qubitium commented 4 months ago

Hit this bug. Comment at https://github.com/canonical/lxd/issues/12150#issuecomment-2167720355

tomponline commented 3 months ago

@boltmark please could you look at this one and see if https://github.com/lxc/incus/pull/412 could help here.

Thanks