Closed: giuseppe closed this 1 month ago
So is the problem here that podman moves itself out of the systemd unit cgroup, thus the cgroup tracking of systemd fails and it will stop the unit as there are no processes in it?
Yes, exactly. Podman moves itself out of the systemd scope, and then systemd doesn't find any process running in the cgroup it has created.
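For anyone following along, the situation described here can be observed directly: each process records its cgroup membership in /proc/&lt;pid&gt;/cgroup, and systemd expects the processes of a scope or service to stay inside the cgroup it created for that unit. A minimal Go sketch (illustrative only, not podman code) that prints where the current process sits:

```go
package main

import (
	"fmt"
	"os"
)

func main() {
	// /proc/self/cgroup lists one line per hierarchy the process belongs to,
	// in the form "hierarchy-ID:controller-list:cgroup-path".
	// On a pure cgroup v2 host there is a single "0::<path>" line; if podman
	// has moved itself out of the systemd-created scope, <path> no longer
	// points at that unit's cgroup, so systemd sees the unit as empty.
	data, err := os.ReadFile("/proc/self/cgroup")
	if err != nil {
		fmt.Fprintln(os.Stderr, "reading /proc/self/cgroup:", err)
		os.Exit(1)
	}
	fmt.Print(string(data))
}
```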
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: giuseppe, Luap99
The full list of commands accepted by this bot can be found here.
The pull request process is described here
/lgtm
@giuseppe I'm interested in the flow of the fix, since it's a bit hard to understand.
It looks like the `UserOwnsCurrentSystemdCgroup` code scans every cgroup on the system?
And when podman encountered a cgroup v1 on a cgroup v2 system, it checked whether it (podman, or podman's current user) owns that cgroup v1, and then incorrectly returned false from `UserOwnsCurrentSystemdCgroup` because it wasn't the owner of the cgroup v1, which it shouldn't even have been checking at all? So podman basically went "I am not the owner of my own cgroup, fail" whenever it encountered any cgroup v1 mount point on a cgroup v2 system?
And then, when it (incorrectly) determined "I am not the owner of my own cgroup" (`UserOwnsCurrentSystemdCgroup` returned false), podman moved itself out of its own cgroup (the systemd cgroup v2 delegated group owned by podman), which finally caused systemd to shut down podman since it was no longer managed inside systemd's unified cgroup v2?
Did I understand the flow of the bug correctly?
The fix was to totally ignore all cgroup v1 groups on a unified cgroup v2 system?
The thing I really don't understand is the fix's if-statement:
```go
for scanner.Scan() {
	line := scanner.Text()
	parts := strings.SplitN(line, ":", 3)
	if len(parts) < 3 {
		continue
	}
	// If we are on a cgroup v2 system and there are cgroup v1 controllers
	// mounted, ignore them when the current process is at the root cgroup.
	if cgroup2 && parts[1] != "" && parts[2] == "/" {
		continue
	}
```
The `cgroup2` check makes sense because that flag is set outside the loop and means "this is a unified cgroup v2 system". Perfect.
But what's going on with `parts[1] != "" && parts[2] == "/"`? Those two are set from a split of the current cgroup line that the scanner is analyzing. How does that implement the "ignore them when the current process is at the root cgroup" check?
I guess it's something like this:
- `cgroup2` = true if the system uses unified cgroup v2 and we are in the root cgroup
- `parts[1] != "" && parts[2] == "/"` = determines that the current group the scanner is analyzing is a cgroup v1? Is this checking "if it's the root of a cgroup v1 mount point" or something?

Is that it?
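To make the question concrete, here is roughly what /proc/self/cgroup can look like on a cgroup v2 host that also has a v1 controller mounted (a net_cls mount like the PIA VPN one, in this example), and what the three `SplitN` parts become for each line. This is an illustrative sketch with made-up sample data, not the actual fix:

```go
package main

import (
	"fmt"
	"strings"
)

func main() {
	// Sample /proc/self/cgroup content on a unified (cgroup v2) host where a
	// v1 net_cls controller is also mounted and the process sits at its root.
	sample := []string{
		"1:net_cls:/", // v1 controller line: parts[1] == "net_cls", parts[2] == "/"
		"0::/user.slice/user-1000.slice/session-2.scope", // v2 line: parts[1] == "", parts[2] is the real cgroup path
	}
	cgroup2 := true // pretend the unified-hierarchy check outside the loop succeeded

	for _, line := range sample {
		parts := strings.SplitN(line, ":", 3)
		if len(parts) < 3 {
			continue
		}
		// parts[1] != "" means the line belongs to a named v1 controller;
		// parts[2] == "/" means the process is at the root of that v1 hierarchy.
		// On a v2 system such lines say nothing about who owns the process's
		// real (v2) cgroup, so the fix skips them.
		if cgroup2 && parts[1] != "" && parts[2] == "/" {
			fmt.Printf("skip   %-50s (v1 controller at root)\n", line)
			continue
		}
		fmt.Printf("check  %-50s path=%s\n", line, parts[2])
	}
}
```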
By the way, do you still need me to check the things you asked about before this patch? Here:
https://github.com/containers/podman/issues/23990#issuecomment-2357709698
I'm also curious if you were able to try this fix with a cgroup manually created at `/opt/piavpn/etc/cgroup/net_cls` to make sure it's working for that particular case? I guess you did, but checking just in case.
I'd do a test with PIA VPN in a VM to help out if I knew how to build podman from scratch, but it seems like a complicated multi-repo build. :x
@Arcitec https://github.com/containers/podman/pull/24006
To test, go to the packit rpm-build job for Fedora 40 there; once the builds are done, follow the dashboard link and then the steps there on how to install the rpms, if you want to test without building from source. Otherwise just build my PR from source.
Direct link: https://dashboard.packit.dev/jobs/copr/1881809 (the builds are done). It is enough if you only install the "podman" rpm; you do not need all the other extra packages there (debuginfo, etc.), I think.
@Luap99 Brilliant, thank you very much for that.
I followed the instructions at https://dashboard.packit.dev/jobs/copr/1881809 and upgraded Fedora 40 VM from Podman 5.2.2 to 5.3.0-dev.
Then I re-enabled PIA VPN and its `/opt/piavpn/etc/cgroup/net_cls` cgroup v1, and rebooted.
Verified that the cgroup v1 and v2 are co-existing again:
```console
$ mount | grep -i cgroup
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,seclabel,nsdelegate,memory_recursiveprot)
none on /opt/piavpn/etc/cgroup/net_cls type cgroup (rw,relatime,seclabel,net_cls)
```
And then the moment we've all been waiting for, checking if podman is now able to start:
And here's a service restart with detailed system journal:
So I can confirm that podman is working perfectly after this patch! :heart:
It really makes yesterday's hell worth it, knowing that it led to an improvement for podman.
I'll be keeping an eye on Fedora Updates to see when Podman 5.3.0 comes out. In the meantime, I actually neutered PIA VPN on my host OS by converting its cgroup mountpoint into a non-writable file, which prevents it from mounting a cgroup v1 on my system and means I can keep using the "broken" version of podman for now.
```sh
sudo umount /opt/piavpn/etc/cgroup/net_cls
sudo rmdir /opt/piavpn/etc/cgroup/net_cls
sudo touch /opt/piavpn/etc/cgroup/net_cls
sudo chmod 444 /opt/piavpn/etc/cgroup/net_cls
```
What a long, strange trip it has been. ;-)
if the system is running on cgroup v2, ignore the ownership of cgroup v1 controllers when the current process is at the root cgroup.
Closes: https://github.com/containers/podman/issues/23990
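For reference, the behavior described by that commit message could be sketched roughly like this (a simplified, Linux-only illustration, not podman's actual `UserOwnsCurrentSystemdCgroup` implementation; it also assumes the unified hierarchy is mounted at /sys/fs/cgroup): parse /proc/self/cgroup, skip v1 controller entries whose path is "/" when the host is on cgroup v2, and for the remaining entries check that the cgroup directory is owned by the current user.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"path/filepath"
	"strings"
	"syscall"
)

// userOwnsCurrentCgroup is a hypothetical helper mirroring the fixed logic:
// on a cgroup v2 host, v1 controller entries at the root are ignored instead
// of being treated as "not owned by us".
func userOwnsCurrentCgroup(cgroup2 bool) (bool, error) {
	f, err := os.Open("/proc/self/cgroup")
	if err != nil {
		return false, err
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		parts := strings.SplitN(scanner.Text(), ":", 3)
		if len(parts) < 3 {
			continue
		}
		// Ignore cgroup v1 controllers when the process is at their root
		// on a unified (cgroup v2) system.
		if cgroup2 && parts[1] != "" && parts[2] == "/" {
			continue
		}
		// For the entries that matter, check who owns the cgroup directory.
		// (Assumes the unified hierarchy is mounted at /sys/fs/cgroup.)
		dir := filepath.Join("/sys/fs/cgroup", parts[2])
		var st syscall.Stat_t
		if err := syscall.Stat(dir, &st); err != nil {
			return false, err
		}
		if int(st.Uid) != os.Geteuid() {
			return false, nil
		}
	}
	return true, scanner.Err()
}

func main() {
	owned, err := userOwnsCurrentCgroup(true)
	fmt.Println(owned, err)
}
```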