Closed waveform80 closed 4 weeks ago
Most likely related to https://github.com/canonical/lxd/issues/13810. We are still in the process of applying the necessary fixes to the LXD snap in order to get sufficient apparmor support. You can try the latest/edge snap channel right now, which has it.
However, I'm not sure whether the Focal kernel has the necessary support either, so we will have to try it.
Also worth noting I am using the security.nesting workaround, and that I've tried lxc 4.0, 5.21, and 6.1 within the focal VM but all fail.
So that would suggest it won't be possible to get Oracular starting on the Focal kernel with the existing fixes, as the kernel seems to lack sufficient support. This sounds like an unrelated issue.
But please do try latest/edge if you can and let us know if it works.
Tried with latest/edge within the focal VM, but it's the same story I'm afraid (can post logs and info if requested, but it's all the same symptoms so I imagine they're all pretty similar to the stuff already posted)
Please could you check whether there are any DENIED errors in journalctl on the host (using latest/edge) to see if apparmor is blocking something? Then we'll know whether it's the original issue, where lack of kernel support means we can't apply the updated apparmor profile.
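A minimal sketch of that check, assuming the usual kernel audit format in which AppArmor denials carry the literal string apparmor="DENIED". The function name filter_apparmor_denials is made up for illustration; it just filters log text supplied on stdin.

```shell
#!/bin/sh
# Print only AppArmor denial records from kernel log text read on stdin.
# Assumption: denial records contain the literal apparmor="DENIED".
filter_apparmor_denials() {
    # grep exits non-zero when nothing matches; treat "no denials" as success.
    grep 'apparmor="DENIED"' || true
}

# Typical use on the host (reading the kernel log may need root):
#   journalctl -k --no-pager | filter_apparmor_denials
#   dmesg | filter_apparmor_denials
```

Empty output means nothing was denied in the captured log, which points away from the apparmor profile as the cause.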
Sure, I've attached the dmesg output with latest/edge lxd under the 5.4 focal VM (adding a ---- START HERE ---- marker indicating where I started the container). Unfortunately it doesn't look terribly interesting to me -- nothing denied by apparmor, though there is one note about disabling a cgroup2 socket.
@waveform80 ah so no apparmor denials, I wonder if this is due to cgroupv1 in Focal, whereas systemd in Oracular probably requires cgroupv2.
I did some research into this and it works with Focal's HWE kernel and enabling cgroupv2 using:
systemd.unified_cgroup_hierarchy=1
So that confirms it's related to the lack of cgroupv1 support in systemd (as Oracular comes with systemd v256).
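To see which cgroup layout a host is actually running, one common check is the filesystem type mounted at /sys/fs/cgroup. The sketch below assumes GNU stat is available; classify_cgroup_fs is a hypothetical helper that only maps the reported type to a label.

```shell
#!/bin/sh
# Map the filesystem type of /sys/fs/cgroup to a cgroup layout name.
# cgroup2fs means the unified (v2) hierarchy; tmpfs means a v1 or hybrid setup.
classify_cgroup_fs() {
    case "$1" in
        cgroup2fs) echo "unified (cgroup v2)" ;;
        tmpfs)     echo "legacy or hybrid (cgroup v1)" ;;
        *)         echo "unknown ($1)" ;;
    esac
}

# On a real host:
#   classify_cgroup_fs "$(stat -fc %T /sys/fs/cgroup)"
```

A Focal host booted without systemd.unified_cgroup_hierarchy=1 would be expected to report the legacy or hybrid case.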
These look relevant:
I tried lxc config set c1 raw.lxc="lxc.init.cmd = /sbin/init SYSTEMD_CGROUP_ENABLE_LEGACY_FORCE=1"
and that didn't work either, although I can see systemd was started with that argument:
root@c1:~# ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 1.3 21192 13204 ? Ss 09:32 0:00 /sbin/init SYSTEMD_CGROUP_ENABLE_LEGACY_FORCE=1
root 81 0.0 0.3 9820 3796 pts/1 Ss 09:32 0:00 su -l
root 82 0.0 0.5 9708 5868 pts/1 S 09:32 0:00 -bash
root 94 0.0 0.3 8780 3864 pts/1 R+ 09:32 0:00 ps aux
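The same confirmation can be scripted rather than read off ps. This is a sketch only: cmdline_has_arg is a hypothetical helper that looks for an exact argument in a NUL-separated command line file such as /proc/1/cmdline.

```shell
#!/bin/sh
# Succeed if the NUL-separated command line file $1 contains the exact argument $2.
cmdline_has_arg() {
    # Turn NUL separators into newlines, then match whole lines literally.
    tr '\0' '\n' < "$1" | grep -Fxq -- "$2"
}

# Inside the container:
#   cmdline_has_arg /proc/1/cmdline SYSTEMD_CGROUP_ENABLE_LEGACY_FORCE=1 && echo "argument reached init"
```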
So from what we know so far we need:
systemd.unified_cgroup_hierarchy=1
latest/stable or 5.21/stable channel.
This works on a Focal host.
I can confirm that it doesn't work on the Focal kernel at all.
root@o:~# ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.1 1.4 22028 14152 ? Ss 11:29 0:01 /sbin/init SYSTEMD_CGROUP_ENABLE_LEGACY_FORCE=1 systemd.unified_cgroup_hierarchy=0 systemd.log_level=debug
root 78 0.0 0.3 9944 3916 pts/1 Ss 11:52 0:00 su -l
root 79 0.0 0.6 10064 6392 pts/1 S 11:52 0:00 -bash
root 434 0.0 0.3 8856 3864 pts/1 R+ 12:09 0:00 ps aux
root@o:~# uname -a
Linux o 5.4.0-1119-kvm #127-Ubuntu SMP Fri Aug 9 09:40:59 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
I tried to debug this with strace, but there are too many errors here and there and it's quite hard to distinguish what is important from what is not. I would say there is no prospect of making it work, as it would require a kernel modification or a systemd modification.
Nothing that we can do from the LXD side.
I noticed that systemd actively uses xattrs on cgroupfs, but they are not supported in the Focal kernel. I don't know whether this is a blocker for systemd to start or not.
So if it's something we really want to get fixed, I'm ready to allocate about a week for that and dive into systemd internals to figure out what specifically is missing in the kernel, but it's not something that can be fixed on the LXD side anyway.
I can confirm that apt install linux-image-generic-hwe-20.04 helps to make it work. So, kernel 5.15 is enough to make it work.
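A quick way to check a host against that 5.15 threshold, sketched under the assumption that GNU sort -V is available. kernel_at_least is a hypothetical helper, and the 5.15 floor comes from the finding above rather than from any documented LXD requirement.

```shell
#!/bin/sh
# Succeed if kernel release $1 (e.g. the output of uname -r) is at least version $2.
kernel_at_least() {
    have=${1%%-*}   # drop the "-1119-kvm" style suffix, keeping e.g. 5.4.0
    want=$2
    # With version sort, the smaller of the two versions comes first.
    lowest=$(printf '%s\n%s\n' "$have" "$want" | sort -V | head -n 1)
    [ "$lowest" = "$want" ]
}

# On a real host:
#   kernel_at_least "$(uname -r)" 5.15 && echo "kernel is new enough"
```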
Thanks for digging into it.
Unless there is a specific request for it I think we can stick with the requirements described here:
https://github.com/canonical/lxd/issues/13844#issuecomment-2268632337
Required information
Distribution: Ubuntu
Distribution version: 20.04 (focal)
The output of "snap list --all lxd core20 core22 core24 snapd":
The output of "lxc info" or if that fails:
Issue description
(Copied from LP: #2075176)
We've been encountering an issue with building the Ubuntu for Raspberry Pi images for oracular for the past couple of weeks:
https://launchpad.net/~ubuntu-cdimage/+livefs/ubuntu/oracular/ubuntu-preinstalled
It turns out the root issue is that the builders are running focal (kernel version 5.4), the build occurs under lxd (in an oracular container), and (peculiarly to this particular build, which uses ubuntu-image) the build requires snapd to be operational within the oracular container.
Steps to reproduce
As this occurred with our Raspberry Pi images under arm64, I'm replicating on a Pi here too, but I'd be interested to know if this also fails on PC archs. I'm using noble on the hardware itself, and demonstrating the issue within a focal VM:
The test (and indeed the image build, ultimately) work happily if performed under jammy's kernel (5.15). Also worth noting I am using the security.nesting workaround, and that I've tried lxc 4.0, 5.21, and 6.1 within the focal VM but all fail.
Information to attach
dmesg
lxc info NAME --show-log: lxc-info-orac-test.log
lxc config show NAME --expanded: lxc-config-show-orac-test.log
lxc monitor (while reproducing the issue)