canonical / lxd

Powerful system container and virtual machine manager
https://canonical.com/lxd
GNU Affero General Public License v3.0

lxd cannot launch (useable) oracular container under focal's kernel #13844

Closed: waveform80 closed this issue 4 weeks ago

waveform80 commented 3 months ago

Issue description

(Copied from LP: #2075176)

We've been encountering an issue with building the Ubuntu for Raspberry Pi images for oracular for the past couple of weeks:

https://launchpad.net/~ubuntu-cdimage/+livefs/ubuntu/oracular/ubuntu-preinstalled

It turns out the root issue is that the builders are running focal (kernel version 5.4), the build occurs under lxd (in an oracular container), and (peculiarly to this particular build, which uses ubuntu-image) the build requires snapd to be operational within the oracular container.

Steps to reproduce

As this occurred with our Raspberry Pi images under arm64, I'm replicating on a Pi here too, but I'd be interested to know if this also fails on PC archs. I'm using noble on the hardware itself, and demonstrating the issue within a focal VM:

user@host:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 24.04 LTS
Release:        24.04
Codename:       noble
user@host:~$ lxc launch ubuntu:f focalvm
user@host:~$ lxc exec focalvm bash
root@focalvm:~# lxd init
# Run through the usual lxd configuration
root@focalvm:~# lxc launch ubuntu-daily:o oractest -c security.nesting=true
root@focalvm:~# lxc exec oractest bash
root@oractest:~# snap list
... cannot communicate with server: Get "http://localhost/v2/snaps": dial unix /run/snapd.socket: connect: connection refused

The test (and indeed the image build, ultimately) works happily if performed under jammy's kernel (5.15). Also worth noting I am using the security.nesting workaround, and that I've tried lxc 4.0, 5.21, and 6.1 within the focal VM but all fail.

tomponline commented 3 months ago

Most likely related to https://github.com/canonical/lxd/issues/13810. We are still in the process of applying the necessary fixes to the LXD snap in order to get sufficient apparmor support. You can try the latest/edge snap channel right now, which has it.

However, I'm not sure whether the Focal kernel has the necessary support either, so I will have to try it.

tomponline commented 3 months ago

> Also worth noting I am using the security.nesting workaround, and that I've tried lxc 4.0, 5.21, and 6.1 within the focal VM but all fail.

So that would suggest it won't be possible to get Oracular starting on the Focal kernel with the existing fixes, as the kernel seems to lack sufficient support. This sounds like an unrelated issue.

tomponline commented 3 months ago

But do please try latest/edge if you can and let us know if it works.

waveform80 commented 3 months ago

Tried with latest/edge within the focal VM, but it's the same story I'm afraid (can post logs and info if requested, but the symptoms are all the same, so I imagine they'd be pretty similar to the stuff already posted).

tomponline commented 3 months ago

Please could you check whether there are any DENIED errors in journalctl on the host (using latest/edge), to see if apparmor is blocking something? Then we'll know whether it's the original issue, and whether lack of kernel support means we can't apply the updated apparmor profile.
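A minimal sketch of that check, for anyone following along. It assumes the usual kernel audit format, in which AppArmor denials carry the token apparmor="DENIED"; the function name find_apparmor_denials is hypothetical and not part of LXD or any tooling mentioned in this thread.

```python
import re

# Assumption: AppArmor denials appear in the kernel log as audit lines
# containing the literal token apparmor="DENIED".
DENIED_RE = re.compile(r'apparmor="DENIED"')


def find_apparmor_denials(log_text):
    """Return the lines of log_text that record an AppArmor denial."""
    return [line for line in log_text.splitlines() if DENIED_RE.search(line)]
```

The input can be the saved output of `journalctl -k` or `dmesg` from the host, captured after starting the container. An empty result suggests apparmor is not what is blocking the container.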

waveform80 commented 3 months ago

Sure, I've attached the dmesg output with latest/edge lxd under the 5.4 focal VM (adding a ---- START HERE ---- marker indicating where I started the container). Unfortunately it doesn't look terribly interesting to me -- nothing denied by apparmor, though there is one note about disabling a cgroup2 socket.

lxc-dmesg.log

tomponline commented 3 months ago

@waveform80 ah, so no apparmor denials. I wonder if this is due to Focal using cgroupv1, whereas systemd in Oracular probably requires cgroupv2.
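One way to see which cgroup layout a host or container is actually running is to look for the cgroup.controllers file, which only exists at the root of a cgroup v2 tree. The sketch below is an illustration under that assumption; cgroup_mode is a hypothetical helper, and the "unified" subdirectory is where systemd conventionally mounts the v2 tree in hybrid mode.

```python
import os


def cgroup_mode(root="/sys/fs/cgroup"):
    """Classify the cgroup layout mounted at root.

    'unified' : cgroup v2 only (cgroup.controllers at the top level)
    'hybrid'  : v1 controllers plus a v2 tree under root/unified
    'legacy'  : v1 only
    'unknown' : root is not a directory
    """
    if not os.path.isdir(root):
        return "unknown"
    if os.path.isfile(os.path.join(root, "cgroup.controllers")):
        return "unified"
    if os.path.isfile(os.path.join(root, "unified", "cgroup.controllers")):
        return "hybrid"
    return "legacy"
```

Calling cgroup_mode() with no arguments inspects the running system. A result other than "unified" on the Focal host would be consistent with the cgroupv1 theory.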

tomponline commented 3 months ago

I did some research into this and it works with Focal's HWE kernel and enabling cgroupv2 using:

systemd.unified_cgroup_hierarchy=1

So that confirms it's related to the lack of cgroupv1 support in systemd (as Oracular comes with systemd v256).
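To confirm that the parameter above actually reached the booted kernel, the command line in /proc/cmdline can be inspected. This is only a sketch: unified_hierarchy_setting is a hypothetical helper, and treating the last occurrence as the effective one is an assumption about how repeated options are resolved.

```python
PARAM = "systemd.unified_cgroup_hierarchy"


def unified_hierarchy_setting(cmdline):
    """Return the value given for PARAM in a kernel command line string.

    Returns None if the parameter is absent, and an empty string if it
    is present without a value. If it appears more than once, the last
    occurrence is returned (an assumption, see the note above).
    """
    value = None
    for token in cmdline.split():
        key, _, val = token.partition("=")
        if key == PARAM:
            value = val
    return value
```

On the host this would be called as unified_hierarchy_setting(open("/proc/cmdline").read()).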

These look relevant:

tomponline commented 3 months ago

I tried lxc config set c1 raw.lxc="lxc.init.cmd = /sbin/init SYSTEMD_CGROUP_ENABLE_LEGACY_FORCE=1" and that didn't work either, although I can see systemd was started with that argument:

root@c1:~# ps aux
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root           1  0.0  1.3  21192 13204 ?        Ss   09:32   0:00 /sbin/init SYSTEMD_CGROUP_ENABLE_LEGACY_FORCE=1
root          81  0.0  0.3   9820  3796 pts/1    Ss   09:32   0:00 su -l
root          82  0.0  0.5   9708  5868 pts/1    S    09:32   0:00 -bash
root          94  0.0  0.3   8780  3864 pts/1    R+   09:32   0:00 ps aux

tomponline commented 3 months ago

So from what we know so far we need:

This works on a Focal host.

mihalicyn commented 4 weeks ago

I can confirm that it doesn't work on the Focal kernel at all.

root@o:~# ps aux
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root           1  0.1  1.4  22028 14152 ?        Ss   11:29   0:01 /sbin/init SYSTEMD_CGROUP_ENABLE_LEGACY_FORCE=1 systemd.unified_cgroup_hierarchy=0 systemd.log_level=debug
root          78  0.0  0.3   9944  3916 pts/1    Ss   11:52   0:00 su -l
root          79  0.0  0.6  10064  6392 pts/1    S    11:52   0:00 -bash
root         434  0.0  0.3   8856  3864 pts/1    R+   12:09   0:00 ps aux
root@o:~# uname -a
Linux o 5.4.0-1119-kvm #127-Ubuntu SMP Fri Aug 9 09:40:59 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

I tried to debug this with strace, but there are too many errors here and there, and it's quite hard to distinguish what is important from what is not. I would say there is no prospect of making it work, as it would require a kernel modification or a systemd modification.

Nothing that we can do from the LXD side.

I noticed that systemd actively uses xattrs on cgroupfs, but they are not supported in the Focal kernel. I don't know whether this is a blocker for systemd to start or not.

So if it's something we really want to get fixed, I'm ready to allocate about a week for that and dive into systemd internals to figure out what specifically is missing in the kernel, but it's not something that can be fixed on the LXD side anyway.

I can confirm that apt install linux-image-generic-hwe-20.04 helps to make it work. So, kernel 5.15 is enough to make it work.
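A build script could guard against this up front by comparing the host kernel release with 5.15. The sketch below is illustrative only: kernel_at_least is a hypothetical helper, and 5.15 is simply the version reported as working in this thread, not a proven minimum.

```python
import re


def kernel_at_least(release, minimum=(5, 15)):
    """Return True if a kernel release string is at least `minimum`.

    Only the leading major.minor pair is compared, so a string such as
    "5.4.0-1119-kvm" is read as (5, 4).
    """
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return (int(match.group(1)), int(match.group(2))) >= minimum
```

For the running system, pass platform.release() from the standard library, which returns the same string as `uname -r`.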

tomponline commented 4 weeks ago

Thanks for digging into it.

Unless there is a specific request for it I think we can stick with the requirements described here:

https://github.com/canonical/lxd/issues/13844#issuecomment-2268632337