canonical / lxd

Powerful system container and virtual machine manager
https://canonical.com/lxd
GNU Affero General Public License v3.0

VM not getting IPV4 using LXD 5.18 - Arm processor Rockchip RK3588 #12354

Closed: ghqter30 closed this issue 3 months ago

ghqter30 commented 9 months ago

Issue description

LXD deployed on an SBC with a Rockchip RK3588: virtual machines don't get an IPv4 address, and running an exec command returns "Error: LXD VM agent isn't currently running". CPU usage is constantly maxed out. The most recent known/tested version where this behavior is not present is 5.13 --revision=24850.

Required information

Steps to reproduce

  1. Install and initialize LXD:

     sudo snap install lxd --channel latest/edge   # git-a733f9b 2023-10-06 (25934)
     sudo lxd init

  2. Launch a VM:

     sudo lxc launch ubuntu:22.04 --vm vm1

  3. List the instances:

     sudo lxc list

    +------+---------+---------------------+-----------------------------------------------+-----------------+-----------+
    | NAME |  STATE  |        IPV4         |                     IPV6                      |      TYPE       | SNAPSHOTS |
    +------+---------+---------------------+-----------------------------------------------+-----------------+-----------+
    | c1   | RUNNING | 10.213.33.30 (eth0) | fd42:ee0b:c420:2042:216:3eff:fef2:f6e8 (eth0) | CONTAINER       | 0         |
    +------+---------+---------------------+-----------------------------------------------+-----------------+-----------+
    | vm1  | RUNNING |                     | fd42:ee0b:c420:2042:216:3eff:fea3:fcfe (eth0) | VIRTUAL-MACHINE | 0         |
    +------+---------+---------------------+-----------------------------------------------+-----------------+-----------+

    sudo lxc exec vm1 bash

    Error: LXD VM agent isn't currently running

sudo lxc stop vm1 --force

sudo lxc start vm1 --console

To detach from the console, press: <ctrl>+a q
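The symptom in the table above (an instance reporting only an IPv6 address) can also be detected programmatically from lxc list --format json. A minimal Python sketch using a hard-coded sample payload; the JSON shape here is an assumption based on typical LXD output, so verify it against your LXD version:

```python
import json

# Sample payload shaped like `lxc list --format json` output for the affected VM.
# The exact structure is an assumption; check it with: sudo lxc list --format json
sample = json.loads("""
[
  {
    "name": "vm1",
    "state": {
      "network": {
        "eth0": {
          "addresses": [
            {"family": "inet6",
             "address": "fd42:ee0b:c420:2042:216:3eff:fea3:fcfe",
             "scope": "global"}
          ]
        }
      }
    }
  }
]
""")

def has_global_ipv4(instance):
    """Return True if any interface carries a global-scope IPv4 (inet) address."""
    network = (instance.get("state") or {}).get("network") or {}
    for iface in network.values():
        for addr in iface.get("addresses", []):
            if addr.get("family") == "inet" and addr.get("scope") == "global":
                return True
    return False

for inst in sample:
    print(inst["name"], "has IPv4" if has_global_ipv4(inst) else "no IPv4")
```

This is handy for scripting a wait-for-agent loop instead of eyeballing the table.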

Information to attach

sudo lxc info vm1 --show-log

Status: RUNNING
Type: virtual-machine
Architecture: aarch64
PID: 6515
Created: 2023/10/08 09:18 PDT
Last Used: 2023/10/08 11:33 PDT

Resources:
  Processes: -1
  Network usage:
    eth0:
      Type: broadcast
      State: UP
      Host interface: tapf2305589
      MAC address: 00:16:3e:a3:fc:fe
      MTU: 1500
      Bytes received: 232B
      Bytes sent: 0B
      Packets received: 2
      Packets sent: 0
      IP addresses:
        inet6: fd42:ee0b:c420:2042:216:3eff:fea3:fcfe/64 (global)

Log:

sudo lxc config show vm1 --expanded

config:
  image.architecture: arm64
  image.description: ubuntu 22.04 LTS arm64 (release) (20230927)
  image.label: release
  image.os: ubuntu
  image.release: jammy
  image.serial: "20230927"
  image.type: disk1.img
  image.version: "22.04"
  volatile.base_image: af58751ca593b6c3e38836254a0565cbb86fd01c2fbb5d1ba2c43ea27cc02e78
  volatile.cloud-init.instance-id: 12ec2bf4-ad25-451f-8772-58768dd630d8
  volatile.eth0.host_name: tapf2305589
  volatile.eth0.hwaddr: 00:16:3e:a3:fc:fe
  volatile.last_state.power: RUNNING
  volatile.last_state.ready: "false"
  volatile.uuid: bb4f8b8d-f6f7-4681-84d3-b0b48f836083
  volatile.uuid.generation: bb4f8b8d-f6f7-4681-84d3-b0b48f836083
  volatile.vsock_id: "1537207436"
devices:
  eth0:
    name: eth0
    network: lxdbr0
    type: nic
  root:
    path: /
    pool: default
    type: disk
ephemeral: false
profiles:
- default
stateful: false
description: ""

sudo cat /var/snap/lxd/common/lxd/logs/lxd.log

time="2023-10-08T09:11:38-07:00" level=warning msg="AppArmor support has been disabled because of lack of kernel support"
time="2023-10-08T09:11:38-07:00" level=warning msg=" - AppArmor support has been disabled, Disabled because of lack of kernel support"
time="2023-10-08T09:11:38-07:00" level=warning msg=" - Couldn't find the CGroup network priority controller, per-instance network priority will be ignored. Please use per-device limits.priority instead"
time="2023-10-08T09:11:41-07:00" level=warning msg="Failed to initialize fanotify, falling back on inotify" err="Failed to watch directory \"/dev\": no such device"
time="2023-10-08T09:18:46-07:00" level=warning msg="Unable to use virtio-fs for config drive, using 9p as a fallback" err="Architecture unsupported" instance=vm1 instanceType=virtual-machine project=default
time="2023-10-08T09:18:46-07:00" level=warning msg="Using writeback cache I/O" devPath=/var/snap/lxd/common/lxd/storage-pools/default/virtual-machines/vm1/root.img device=root fsType=btrfs instance=vm1 instanceType=virtual-machine project=default
time="2023-10-08T09:19:34-07:00" level=warning msg="Unable to use virtio-fs for config drive, using 9p as a fallback" err="Architecture unsupported" instance=vm1 instanceType=virtual-machine project=default
time="2023-10-08T09:19:34-07:00" level=warning msg="Using writeback cache I/O" devPath=/var/snap/lxd/common/lxd/storage-pools/default/virtual-machines/vm1/root.img device=root fsType=btrfs instance=vm1 instanceType=virtual-machine project=default
time="2023-10-08T11:10:11-07:00" level=warning msg="Unable to use virtio-fs for config drive, using 9p as a fallback" err="Architecture unsupported" instance=vm1 instanceType=virtual-machine project=default
time="2023-10-08T11:10:11-07:00" level=warning msg="Using writeback cache I/O" devPath=/var/snap/lxd/common/lxd/storage-pools/default/virtual-machines/vm1/root.img device=root fsType=btrfs instance=vm1 instanceType=virtual-machine project=default
time="2023-10-08T11:33:11-07:00" level=warning msg="Unable to use virtio-fs for config drive, using 9p as a fallback" err="Architecture unsupported" instance=vm1 instanceType=virtual-machine project=default
time="2023-10-08T11:33:11-07:00" level=warning msg="Using writeback cache I/O" devPath=/var/snap/lxd/common/lxd/storage-pools/default/virtual-machines/vm1/root.img device=root fsType=btrfs instance=vm1 instanceType=virtual-machine project=default
time="2023-10-08T11:33:42-07:00" level=warning msg="Unable to use virtio-fs for config drive, using 9p as a fallback" err="Architecture unsupported" instance=vm1 instanceType=virtual-machine project=default
time="2023-10-08T11:33:42-07:00" level=warning msg="Using writeback cache I/O" devPath=/var/snap/lxd/common/lxd/storage-pools/default/virtual-machines/vm1/root.img device=root fsType=btrfs instance=vm1 instanceType=virtual-machine project=default
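The warning lines above are logfmt-style key=value records and can be filtered in a script when hunting for a specific message. A small sketch; the regex is a convenience assumption about the quoting shown here, not an official LXD log parser:

```python
import re

# One of the warning lines from lxd.log above (logfmt-style key=value pairs).
line = ('time="2023-10-08T09:18:46-07:00" level=warning '
        'msg="Unable to use virtio-fs for config drive, using 9p as a fallback" '
        'err="Architecture unsupported" instance=vm1 '
        'instanceType=virtual-machine project=default')

def parse_logfmt(line):
    """Parse key=value and key="quoted value" pairs into a dict."""
    fields = {}
    for key, quoted, bare in re.findall(r'(\w+)=(?:"([^"]*)"|(\S+))', line):
        fields[key] = quoted or bare
    return fields

fields = parse_logfmt(line)
print(fields["level"], "-", fields["msg"])
```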
tomponline commented 9 months ago

Are you able to try the 5.0/edge channel? (You will need a fresh install, as you can't downgrade from latest/stable.)

This channel comes with the same QEMU version but the previous EDK2 firmware version (we've seen a fair few problems with more recent EDK2 firmware versions).

ghqter30 commented 9 months ago

5.0/edge channel works just fine:

sudo snap list lxd

lxd   git-7f8a581  25932  5.0/edge  canonical✓  

sudo lxc list

+------+---------+------------------------+-------------------------------------------------+-----------------+-----------+
| NAME |  STATE  |          IPV4          |                      IPV6                       |      TYPE       | SNAPSHOTS |
+------+---------+------------------------+-------------------------------------------------+-----------------+-----------+
| c2   | RUNNING | 10.33.200.84 (eth0)    | fd42:e225:ffb0:2967:216:3eff:fe13:5045 (eth0)   | CONTAINER       | 0         |
+------+---------+------------------------+-------------------------------------------------+-----------------+-----------+
| vm2  | RUNNING | 10.33.200.116 (enp5s0) | fd42:e225:ffb0:2967:216:3eff:fe94:e2ea (enp5s0) | VIRTUAL-MACHINE | 0         |
+------+---------+------------------------+-------------------------------------------------+-----------------+-----------+

sudo lxc exec vm2 -- uptime

 22:33:07 up 3 min,  0 users,  load average: 0.07, 0.17, 0.08
mihalicyn commented 9 months ago

Hi @ghqter30

Can you try to run the VM on latest/edge with lxc launch ubuntu:22.04 --vm --console=vga vm1 and try to catch at which step it fails? Can you see any messages from GRUB, or maybe from the Linux kernel?

We have seen and worked around two issues with edk2: https://github.com/canonical/lxd-pkg-snap/pull/147 https://github.com/canonical/lxd-pkg-snap/pull/153

Unfortunately, edk2 developers do not prioritize compatibility with old versions of GRUB, shim, and the Linux kernel, and as a consequence we run into problems like this.

Another experiment you can try is lxc launch ubuntu:22.04 -c security.secureboot=false --vm --console=vga vm1 and check the results.

Kind regards, Alex

tomponline commented 9 months ago

thanks @mihalicyn !

ghqter30 commented 9 months ago

I get this:

sudo snap list lxd

lxd   git-7d7624d  25952  latest/edge  canonical✓ 

sudo lxc launch ubuntu:22.04 -c security.secureboot=false --vm --console=vga vm1

Creating vm1
Starting vm1

(remote-viewer:7732): Gtk-WARNING **: 08:42:00.885: cannot open display:

sudo lxc list vm1

+------+---------+------+---------------------------------------------+-----------------+-----------+
| NAME |  STATE  | IPV4 |                    IPV6                     |      TYPE       | SNAPSHOTS |
+------+---------+------+---------------------------------------------+-----------------+-----------+
| vm1  | RUNNING |      | fd42:6c59:447:38fb:216:3eff:fe3b:c49 (eth0) | VIRTUAL-MACHINE | 0         |
+------+---------+------+---------------------------------------------+-----------------+-----------+
mihalicyn commented 9 months ago

Ah, you are likely running this on a server without a desktop environment? Then just use --console without the vga option.

ghqter30 commented 9 months ago

Console output doesn't show much:

sudo lxc launch ubuntu:22.04 -c security.secureboot=false --vm --console vm2

Creating vm2
Starting vm2
To detach from the console, press: <ctrl>+a q
ghqter30 commented 9 months ago

The graphical and text consoles both show the same empty results:

LXD_UI_Graphic_console

LXD_UI_Text_console

ghqter30 commented 8 months ago

Any suggestions?

mihalicyn commented 3 months ago

Hi @ghqter30

Sorry about the huge delay in replying; this thread got lost. (Feel free to ping us next time!)

The first thing I would suggest is trying the latest/edge channel again.

The results you've shared are a bit interesting: nothing in the console output, nothing on the screen, and the VM is just stuck, right? Could you also show the output of:

# replace "vm1" with your VM instance name
cat /proc/$(cat /var/snap/lxd/common/lxd/logs/vm1/qemu.pid)/stack
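A Python sketch of the same check, done defensively so it reports why it failed instead of erroring out. It assumes the snap log path shown above, and note that reading /proc/<pid>/stack generally requires root:

```python
from pathlib import Path

def qemu_kernel_stack(instance, logdir="/var/snap/lxd/common/lxd/logs"):
    """Return the QEMU kernel stack for an instance, or a diagnostic string.

    Assumes the LXD snap layout, where each instance has <logdir>/<name>/qemu.pid.
    """
    pidfile = Path(logdir) / instance / "qemu.pid"
    if not pidfile.is_file():
        return f"pid file not found: {pidfile}"
    pid = pidfile.read_text().strip()
    try:
        # Kernel stack of the QEMU process; shows where it is blocked.
        return Path(f"/proc/{pid}/stack").read_text()
    except PermissionError:
        return f"run as root to read /proc/{pid}/stack"

print(qemu_kernel_stack("vm1"))
```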
tomponline commented 3 months ago

Closing this for now, but please let us know if you are still experiencing the issue and can reproduce on latest/edge channel.

Thanks!