spmfox opened 1 week ago
Adding the strace taken while refreshing the UI: cockpit-ws_strace.txt
Hey @spmfox !
I'm afraid the strace doesn't help; there's nothing interesting in it. It doesn't fork cockpit-session or cockpit-ssh (or anything really), which are the parts responsible for setting up the login session. You'd need to attach strace earlier, and with -fvvs1024 to trace child processes too.
I tried to reproduce this. Our CI already has a fairly standard C9S bootc image. So I booted that and built a container like yours:
# cat <<EOF | podman build -t localhost/cws -
FROM quay.io/centos-bootc/centos-bootc:stream9
RUN dnf -y install cockpit-ws cockpit-bridge
RUN systemctl enable cockpit.socket
EOF
Originally I included "cockpit", but that killed my VM; presumably the extra dependencies of cockpit/storage/device-mapper/etc. are too much. So I left out "cockpit" from the above. That worked. Then:
bootc switch --transport=containers-storage localhost/cws:latest
... again kills my VM. No dmesg entry other than
SELinux: Context unconfined_u:object_r:invalid_bootcinstall_testlabel_t:s0 is not valid (left unmapped).
(which I suppose is not the cause for the freeze), no 100% CPU usage, but ssh and even my VT login session are completely unresponsive. I'm afraid I don't have a separate laptop to spare to try this on hardware.
I gave it another shot with this branch (an old experiment of mine) and built it on GitHub, so that it's on ghcr.io now.
Running this on my c9s VM finally worked:
bootc switch ghcr.io/martinpitt/workstation-bootc:latest
and this also revealed the problem (this was my first suspicion):
# id cockpit-wsinstance
uid=981(cockpit-wsinstance) gid=981(cockpit-wsinstance) groups=981(cockpit-wsinstance)
# ls -l /usr/libexec/cockpit-session
-rwsr-x---. 1 root 995 57120 Jan 1 1970 /usr/libexec/cockpit-session
So installing cockpit-ws in bootc messed up the permissions of that suid root helper: it is owned by the numeric group 995, but the local cockpit-wsinstance group is 981. It needs to have group "cockpit-wsinstance". This is a bootc bug, and it also explains why reinstalling the package helps: that fixes the permissions. If you could report that to bootc, that'd be great!
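To check whether a deployment is affected, one can compare the numeric group that owns the helper with the GID the local group database assigns to cockpit-wsinstance. This is a minimal sketch, assuming GNU stat and getent are available; the function name check_helper_group is hypothetical and not part of cockpit:

```shell
# check_helper_group PATH GROUP (hypothetical helper): compare the numeric
# group that owns PATH with the GID the local group database assigns to GROUP.
check_helper_group() {
    path=$1
    group=$2
    # getent prints "name:x:gid:members"; give up if GROUP is unknown locally
    entry=$(getent group "$group") || { echo "unknown group: $group" >&2; return 2; }
    want=$(printf '%s\n' "$entry" | cut -d: -f3)
    # %g is the numeric GID stored in the file's inode (GNU stat)
    have=$(stat -c %g "$path") || return 2
    if [ "$have" = "$want" ]; then
        echo "ok"
    else
        echo "mismatch: $path has gid $have, but $group is gid $want"
        return 1
    fi
}
```

For example, `check_helper_group /usr/libexec/cockpit-session cockpit-wsinstance` (the path is /usr/lib/cockpit/cockpit-session on Debian). On the system shown above it would report gid 995 against 981.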
Workarounds for you that come to my mind: chgrp it in an ExecStartPre=, or add a systemd tmpfiles.d snippet which fixes the permission at boot.
Note that we don't like this suid root thing either. We've had #16811 lingering for a long time already to move everything to DynamicUser= and avoid static groups completely. That'd be much more robust and nicer, but we aren't there yet :cry:
Forgot to say: I can reproduce this "internal error". This fixes it:
bootc usr-overlay
chgrp cockpit-wsinstance /usr/libexec/cockpit-session
Ah, that's bogus of course: /usr isn't writable, so a boot-time chgrp can't work. You'd have to do the usr-overlay on each boot, which is ugly. So the best option would be to actually fix that in bootc.
Thanks. I had the same problem with fedora-bootc. This will help while waiting for the fix.
I reported this as https://github.com/containers/bootc/issues/870.
I dug deeper into this issue and it doesn't seem to be a bootc bug. It is just the nature of how the workflow works.
Bootc relies on OCI containers to build and transport an image that gets deployed on what we will call the server, for the sake of this example. Each time the container image is rebuilt and deployed, bootc creates a brand new deployment that replaces everything in /usr and everything in /etc that hasn't been changed locally.
Some applications, including cockpit, get a UID and GID allocated from a range when their package is installed. The allocation may stay the same across several image builds until, eventually, the UID/GID allocated in the container for a new image differs from the previous one. As a consequence, /etc/passwd (and /etc/group) in the container reflect the new allocation.
If /etc/passwd was not changed on the server, bootc replaces it when deploying an upgrade, and all works fine. But if this file was modified locally, the local version prevails and is not replaced by the one from the container. As soon as the administrator creates, modifies or removes users on the server, /etc/passwd is modified and will never get updated again.
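One way to spot this drift is to compare the numeric ID recorded for an account in the image's default copy of a file and in the live one. On ostree-based systems such as bootc, the image's pristine /etc is kept under /usr/etc, and `ostree admin config-diff` lists which files in /etc were changed locally; treat the exact paths, and the helper names id_of and compare_id, as assumptions of this sketch:

```shell
# id_of NAME FILE (hypothetical helper): print the third field of NAME's entry
# in a passwd- or group-format file (the UID for passwd, the GID for group).
id_of() {
    awk -F: -v n="$1" '$1 == n { print $3; exit }' "$2"
}

# compare_id NAME FILE_A FILE_B (hypothetical helper): report whether both
# files agree on NAME's numeric ID; returns 1 if they differ.
compare_id() {
    a=$(id_of "$1" "$2")
    b=$(id_of "$1" "$3")
    if [ "$a" = "$b" ]; then
        echo "same: $1 is $a"
    else
        echo "differs: $1 is ${a:-missing} in $2 but ${b:-missing} in $3"
        return 1
    fi
}
```

For example, `compare_id cockpit-wsinstance /usr/etc/group /etc/group` would show whether the group's GID in the new image matches the one in the locally modified file.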
I looked at my deployment and found that it is not common to have files in /usr with a non-root owner or group. In my case, I found 10 files with a non-root group, and with the exception of cockpit, all of them use system users that are not defined in /etc/passwd. Consequently, cockpit was the only application prone to this issue.
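A survey like this can be reproduced with find. This sketch (the function name list_nonroot is hypothetical) prints every path under a directory whose owner or group is not root, without descending into other filesystems:

```shell
# list_nonroot DIR (hypothetical helper): print paths under DIR whose owning
# user or group is not 0 (root); -xdev keeps find on DIR's filesystem.
list_nonroot() {
    find "$1" -xdev \( ! -user 0 -o ! -group 0 \) -print
}
```

On an affected host, `list_nonroot /usr` would include /usr/libexec/cockpit-session, since its group is a non-zero GID.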
I resolved the problem by creating my user within the container build, so /etc/passwd doesn't get modified locally after deployment and is always managed and updated by bootc. But this isn't possible for a multi-user environment or similar use cases.
You mentioned before that cockpit will eventually move to dynamic users (DynamicUser=); I suppose that will resolve this issue.
This was closed as unfixable in bootc, so we'll need to find a workaround.
I quickly chatted about that with @allisonkarlitskaya . As mentioned, the good and desirable solution to this is to finish PR #16808. That's a lot of work, but at least shouldn't have insurmountable problems.
A quick band-aid may be to ship cockpit-session as a normal root:root 0755 binary in the packages/on disk, and prepare the suid root version with group cockpit-wsinstance at runtime. This idea isn't entirely new: we already do that for /etc/cockpit/ws-certs.d/ to make the certificates accessible to the unprivileged cockpit-wsinstance user.
So I ran a couple of experiments on how we could do that on Fedora 41, Debian stable, and RHEL 8.10. They all start with:
CS=/usr/libexec/cockpit-session # or /usr/lib/cockpit/cockpit-session on Debian
chown root:root $CS
firewall-cmd --add-service cockpit
which without further steps just results in "Internal error in login process".
Putting the suid copy directly into /run/ doesn't work, as /run/ is mounted nosuid,noexec by default. Same for PrivateTmp=. So we need to do some legwork:
mkdir -p /run/cockpit/bin
mount -t tmpfs -o exec,suid tmpfs /run/cockpit/bin  # private tmpfs that allows exec and suid, unlike /run itself
CS=/usr/libexec/cockpit-session # or /usr/lib/cockpit/cockpit-session on Debian
cp $CS /run/cockpit/bin/
chcon --reference=$CS /run/cockpit/bin/cockpit-session
chgrp cockpit-wsinstance /run/cockpit/bin/cockpit-session
chmod u+s /run/cockpit/bin/cockpit-session
printf '[Basic]\nCommand=/run/cockpit/bin/cockpit-session\n[Negotiate]\nCommand=/run/cockpit/bin/cockpit-session' > /etc/cockpit/cockpit.conf
systemctl stop cockpit
With SELinux enforcing, this fails silently on F41 and R8 (nothing shows up even with full auditing enabled as described in https://fedoraproject.org/wiki/SELinux/Debugging#Enable_full_auditing), and works with setenforce 0. Unsurprisingly, it works fine on Debian.
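Whether a location can host a suid helper at all depends on the options of the mount it lives on. A minimal sketch of that check, assuming util-linux findmnt is available; the function names are hypothetical. It only looks at mount options and says nothing about SELinux, which still blocked the helper in the experiment above:

```shell
# allows_suid_exec OPTS (hypothetical helper): succeed only if a comma-separated
# mount option string contains neither "nosuid" nor "noexec".
allows_suid_exec() {
    case ",$1," in
        *,nosuid,* | *,noexec,*) return 1 ;;
        *) return 0 ;;
    esac
}

# mount_opts PATH (hypothetical helper): options of the mount containing PATH
mount_opts() {
    findmnt -n -o OPTIONS --target "$1"
}
```

For example, `allows_suid_exec "$(mount_opts /run/cockpit/bin)" && echo "suid helper can run here"`.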
Another approach is idmapped mounts, see https://lwn.net/Articles/837566/ . They require kernel 5.12; Debian stable has 6.1 and RHEL 9 has 5.14 :+1: RHEL 8 has 4.18, but we don't need to support that any more from main.
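A runtime guard for the 5.12 requirement could compare the major and minor numbers of the kernel release string. This is a sketch with a hypothetical function name; it only looks at major.minor, so it cannot detect features a distribution backported into an older kernel:

```shell
# kernel_at_least MAJOR.MINOR [RELEASE] (hypothetical helper): succeed if
# RELEASE (default: the running kernel, from uname -r) is at least MAJOR.MINOR.
kernel_at_least() {
    want_major=${1%%.*}
    want_minor=${1#*.}
    rel=${2:-$(uname -r)}
    major=${rel%%.*}
    rest=${rel#*.}
    minor=${rest%%[!0-9]*}    # drop everything after the minor number, e.g. ".0-503.el9"
    [ "$major" -gt "$want_major" ] ||
        { [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]; }
}
```

With this, Debian stable's 6.1 and RHEL 9's 5.14 pass `kernel_at_least 5.12`.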
On Fedora 41:
mount -o bind,X-mount.idmap=g:0:`id -g cockpit-wsinstance`:1 $CS $CS
# mount: /usr/libexec/cockpit-session: mount failed: Unknown error 5013.
Too bad. This also fails the same way when creating /tmp/x and trying to mount over that.
The standard mount command in Debian stable doesn't support this yet. We can play around with https://github.com/brauner/mount-idmapped (we'd probably implement the actual solution in C, similar to cockpit-certificate-ensure, anyway). But this doesn't work on Fedora 41 either:
curl -L -O https://raw.githubusercontent.com/brauner/mount-idmapped/refs/heads/master/mount-idmapped.c
gcc -o mount-idmapped mount-idmapped.c
./mount-idmapped --map-mount g:0:`id -g cockpit-wsinstance`:1 $CS /tmp/x
# Invalid argument - Failed to change mount attributes
(again, same for a different target mount). strace shows the failing call:
mount_setattr(3, "", AT_EMPTY_PATH|AT_RECURSIVE, {attr_set=MOUNT_ATTR_IDMAP, attr_clr=0, propagation=0 /* MS_??? */, userns_fd=4}, 32) = -1 EINVAL (Invalid argument)
So, this is all a bit hackish :cry:
Explain what happens
I am having a problem using cockpit-ws and centos-bootc.
Starting from centos-bootc, simply installing cockpit-ws and enabling cockpit.socket results in the UI giving this error:
Internal error in login process
Originally I thought this was a bootc problem, and reported it over there; however, they've fixed the SELinux portion and this still does not work (with or without SELinux enforcing): https://github.com/containers/bootc/issues/571
Fixing the issue is super easy:
dnf reinstall cockpit-ws
and I just cannot figure out what exactly is broken that a reinstall fixes. bootc gives you the option to make /usr writable via overlayfs, but it's lost after a reboot. So to fix the problem you simply do this:
I've been trying to figure out what exactly is being changed during the reinstall. I've tried the journal (nothing interesting), strace on the cockpit-ws process (I didn't see anything, but I would be willing to paste it if someone thinks it will help), auditctl to see what changes on the overlay (that didn't work quite right), and removing the self-signed certs (that wasn't it either).
I've read through every single "internal error" issue reported here, and most of them are resolved via a reinstall or upgrade. In this case the OS is immutable, so it's broken on every boot.
Is there a way to get cockpit-ws to give verbose logs or can anyone think of other data I can look at that might give a clue?
To reproduce this, you will need a working bootc/silverblue/atomic VM that you can rebase. Containerfile:
Versions:
Version of Cockpit
327
Where is the problem in Cockpit?
Unknown or not applicable
Server operating system
CentOS
Server operating system version
CentOS Stream release 9 - stream9.20241030.0
What browsers are you using?
Firefox
System log