cockpit-project / cockpit

Cockpit is a web-based graphical interface for servers.
http://www.cockpit-project.org/
GNU Lesser General Public License v2.1

/usr/libexec/cockpit-session has wrong owner in deployment #21201

Open spmfox opened 1 week ago

spmfox commented 1 week ago

Explain what happens

I am having a problem using cockpit-ws and centos-bootc.

Starting from centos-bootc, simply installing cockpit-ws and enabling cockpit.socket results in the UI giving this error: Internal error in login process

Originally I thought this was a bootc problem, and reported it over there - however they've fixed the selinux portion and this still does not work (with or without selinux enforcing) https://github.com/containers/bootc/issues/571

Fixing the issue is super easy: dnf reinstall cockpit-ws. I just cannot figure out what exactly is broken that a reinstall fixes. bootc gives you the option to make /usr writable via overlayfs, but it's lost after a reboot. So to fix the problem you simply do this:

bootc usr-overlay
dnf -y reinstall cockpit-ws

I've been trying to figure out what exactly is being changed during the reinstall. I've tried the journal (nothing interesting), strace on the cockpit-ws process (I didn't see anything but I would be willing to paste if someone thinks it will help), I've tried using auditctl to see what changes on the overlay but that didn't work quite right, I tried removing the self signed certs but that wasn't it either.

I've read through every single "internal error" issue reported on here, and most of them are resolved via reinstall or upgrade. In this case, the OS is immutable, so it's broken on every boot.

Is there a way to get cockpit-ws to give verbose logs or can anyone think of other data I can look at that might give a clue?

To reproduce this, you will need a working bootc/silverblue/atomic VM that you can rebase. Containerfile:

FROM quay.io/centos-bootc/centos-bootc:stream9
RUN dnf -y install cockpit cockpit-ws cockpit-bridge
RUN systemctl enable cockpit.socket

Versions:

cockpit.x86_64               327-1.el9
cockpit-bridge.x86_64        327-1.el9
cockpit-packagekit.noarch    327-1.el9
cockpit-storaged.noarch      327-1.el9
cockpit-system.noarch        327-1.el9
cockpit-ws.x86_64            327-1.el9

/usr/libexec/cockpit-ws --version
Version: 327
Protocol: 1
Authorization: crypt1

Version of Cockpit

327

Where is the problem in Cockpit?

Unknown or not applicable

Server operating system

CentOS

Server operating system version

CentOS Stream release 9 - stream9.20241030.0

What browsers are you using?

Firefox

System log

Oct 31 20:52:26 localhost.localdomain systemd[1]: cockpit-motd.service: Deactivated successfully.
Oct 31 20:55:35 localhost.localdomain systemd[1]: Starting Dynamic user for cockpit-ws...
Oct 31 20:55:35 localhost.localdomain systemd[1]: Finished Dynamic user for cockpit-ws.
Oct 31 20:55:35 localhost.localdomain systemd[1]: Created slice Slice /system/cockpit-wsinstance-https-factory.
Oct 31 20:55:35 localhost.localdomain systemd[1]: Created slice Slice /system/cockpit-wsinstance-https.
Oct 31 20:55:35 localhost.localdomain systemd[1]: Created slice Resource limits for all cockpit-ws-https@.service instances.
Oct 31 20:55:35 localhost.localdomain systemd[1]: cockpit-wsinstance-https-factory@0-1142-988.service: Deactivated successfully.
Oct 31 20:57:05 localhost.localdomain systemd[1]: cockpit-wsinstance-https@e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855.service: Deactivated successfully.
Oct 31 20:57:05 localhost.localdomain systemd[1]: cockpit-wsinstance-https@e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855.socket: Deactivated successfully.

spmfox commented 1 week ago

Adding the strace while refreshing the UI. cockpit-ws_strace.txt

martinpitt commented 1 week ago

Hey @spmfox !

I'm afraid the strace doesn't help, there's nothing interesting in it. It doesn't fork cockpit-session or cockpit-ssh (or anything really), which are the parts responsible for setting up the login session. You'd need to attach strace earlier, and with -fvvs1024 to trace child processes too.

I tried to reproduce this. Our CI already has a fairly standard C9S bootc image. So I booted that and built a container like yours:

# cat <<EOF | podman build -t localhost/cws -
FROM quay.io/centos-bootc/centos-bootc:stream9
RUN dnf -y install cockpit-ws cockpit-bridge
RUN systemctl enable cockpit.socket
EOF

Originally I included "cockpit", but that killed my VM -- supposedly the extra dependencies of cockpit/storage/device-mapper/etc. are too much. So I left out "cockpit" from the above. That worked. Then:

bootc switch --transport=containers-storage localhost/cws:latest

... again kills my VM. No dmesg entry other than

SELinux: Context unconfined_u:object_r:invalid_bootcinstall_testlabel_t:s0 is not valid (left unmapped).

(which I suppose is not the cause for the freeze), no 100% CPU usage, but ssh and even my VT login session are completely unresponsive. I'm afraid I don't have a separate laptop to spare to try this on hardware.

I took another shot with this branch (it's an old experiment of mine) and built it on GitHub, so that it's on ghcr.io now.

Running this on my c9s VM finally worked:

bootc switch ghcr.io/martinpitt/workstation-bootc:latest

and this also revealed the problem (this was my first suspicion):

# id cockpit-wsinstance
uid=981(cockpit-wsinstance) gid=981(cockpit-wsinstance) groups=981(cockpit-wsinstance)

# ls -l /usr/libexec/cockpit-session 
-rwsr-x---. 1 root 995 57120 Jan  1  1970 /usr/libexec/cockpit-session

so, installing cockpit-ws in bootc messed up the permission of that suid root helper. It needs to have group "cockpit-wsinstance". This is a bootc bug, and also explains why reinstalling the package helps -- that fixes the permissions. If you could report that to bootc, that'd be great!
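
The mismatch shown above (the file carries GID 995 while cockpit-wsinstance resolves to 981) can be checked with a small script. This is only a sketch: `gid_of_group` and `check_group_owner` are hypothetical helper names, the group is assumed to be listed in a flat group file (on a real host `getent group` would be more robust), and `stat -c` assumes GNU coreutils.

```shell
#!/bin/sh
# Hypothetical helpers, not part of cockpit or bootc.

gid_of_group() {
    # $1 = group name, $2 = group file (name:passwd:gid:members), default /etc/group
    awk -F: -v g="$1" '$1 == g { print $3; exit }' "${2:-/etc/group}"
}

check_group_owner() {
    # $1 = file, $2 = expected group name, $3 = optional group file
    # Returns 0 on match, 1 on mismatch, 2 if the file or group cannot be resolved.
    file_gid=$(stat -c %g "$1") || return 2
    want_gid=$(gid_of_group "$2" "$3")
    if [ -z "$want_gid" ]; then
        echo "group $2 not found"
        return 2
    fi
    if [ "$file_gid" = "$want_gid" ]; then
        echo "ok: $1 has gid $file_gid ($2)"
    else
        echo "mismatch: $1 has gid $file_gid, but $2 is gid $want_gid"
        return 1
    fi
}
```

On an affected system, `check_group_owner /usr/libexec/cockpit-session cockpit-wsinstance` would be expected to report a mismatch until the group ownership is corrected.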

Workarounds for you that come to my mind:

Note that we don't like this suid root thing either. We've had #16811 lingering for a long time already to move everything to DynamicUser= and avoid static groups completely. That'd be much more robust and nicer, but we aren't there yet :cry:

martinpitt commented 1 week ago

Forgot to say: I can reproduce this "internal error". This fixes it:

bootc usr-overlay
chgrp cockpit-wsinstance /usr/libexec/cockpit-session

> Add a systemd tmpfiles.d which fixes the permission at boot

ah, that's bogus of course -- /usr isn't writable. You'd have to do the usroverlay on each boot, which is ugly. So best would be to actually fix that in bootc.

sigulete commented 1 week ago

Thanks. I had the same problem with fedora-bootc. This will help while waiting for the fix.

martinpitt commented 1 week ago

I reported this to https://github.com/containers/bootc/issues/870 .

sigulete commented 1 week ago

I dug deeper into this issue and it doesn’t seem to be a bootc bug. It is just the nature of how the workflow works.

Bootc relies on OCI containers to build and transport an image to be deployed in what we will call the server for the sake of this example. Each time the container image gets re-built, it will create a brand new deployment that will replace everything in /usr and everything that hasn't been locally changed in /etc.

Some applications, including cockpit, get a UID and GID assigned within a range. Sometimes it is the same across several iterations, until it is not, and the UID/GID allocated in the container for the new image will eventually differ from the previous one. As a consequence, the file /etc/passwd in the container will be adjusted based on this UID/GID allocation.

If /etc/passwd was not changed in the server, then it will be replaced by bootc during the deployment of an upgrade, and all works fine. But if this file was locally modified, then the local version will prevail and it won't be replaced by the one from the container. If the administrator creates, modifies or removes users in the server, then /etc/passwd will be modified and it will never get updated.
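
One way to spot this kind of drift is to compare the group database shipped in the image with the one in use on the host. The sketch below makes several assumptions: numeric group IDs live in the group file (for example /etc/group) rather than /etc/passwd, a pristine copy from the image is available at some path (on ostree-style systems this may be /usr/etc/group, which is not confirmed in this thread), and `group_id_drift` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: list groups whose GID differs between two group files.

group_id_drift() {
    # $1 = group file from the image, $2 = group file on the deployed host
    # Prints "name image_gid host_gid" for every group present in both files
    # with a different numeric GID.
    awk -F: '
        FILENAME == ARGV[1] { image[$1] = $3; next }   # first file: remember image GIDs
        ($1 in image) && image[$1] != $3 { print $1, image[$1], $3 }
    ' "$1" "$2"
}
```

Any group printed by `group_id_drift IMAGE_GROUP_FILE /etc/group` is a candidate for the ownership problem described in this issue, because files in /usr carry the image's numeric GID while name lookups on the host use the local one.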

I looked at my deployment and I found that it is not common to have files in /usr with non-root USERID or GROUPID. In my case, I found 10 files with non-root GROUPID and with the exception of cockpit all the rest are using system users, which are not defined in /etc/passwd. Consequently cockpit was the only application prone to this issue.
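
A survey like the one described above could be done roughly as follows. This is a sketch assuming GNU find (for `-printf`); `files_with_other_group` is a hypothetical name, and the second argument exists mainly so the helper can be tried out on arbitrary directories.

```shell
#!/bin/sh
# Hypothetical helper: list files whose numeric group is not the expected one.

files_with_other_group() {
    # $1 = directory to scan, $2 = GID considered normal (default 0, i.e. root)
    # Prints "GID PATH" per match; -xdev keeps the scan on one filesystem.
    find "$1" -xdev ! -gid "${2:-0}" -printf '%G %p\n'
}
```

Running `files_with_other_group /usr` lists every file under /usr with a non-root group, which is where a GID mismatch between image and host can matter.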

I resolved the problem by creating my user within the container build, so /etc/passwd won't get modified locally after deployment and it will always be managed and updated by bootc. But this is something that won't be possible for a multiuser environment or similar use case.

You mentioned before that cockpit will eventually move to Dynamic Users, I suppose it will resolve this issue.

martinpitt commented 1 week ago

This was closed as unfixable in bootc, so we'll need to find a workaround.

martinpitt commented 1 week ago

I quickly chatted about that with @allisonkarlitskaya . As mentioned, the good and desirable solution to this is to finish PR #16808. That's a lot of work, but at least shouldn't have insurmountable problems.

A quick bandaid for that may be to ship cockpit-session as normal root:root 0755 in the packages/on disk, and prepare the suid root cockpit-wsinstance owned version at runtime. This idea isn't entirely new -- we already do that for /etc/cockpit/ws-certs.d/ to make them accessible to the unprivileged cockpit-wsinstance user.

So I made a couple of experiments on how we could do that on Fedora 41, Debian stable, and RHEL 8.10. It all starts with

CS=/usr/libexec/cockpit-session   # or /usr/lib/cockpit/cockpit-session on Debian
chown root:root $CS
firewall-cmd --add-service cockpit

which without further steps just results in "Internal error in login process".

naïve: copy to /run

/run/ is mounted `nosuid,noexec` by default, so that doesn't work. Same for `PrivateTmp=`. So we need to do some legwork:

mkdir -p /run/cockpit/bin
mount -t tmpfs -o exec,suid tmpfs /run/cockpit/bin
CS=/usr/libexec/cockpit-session   # or /usr/lib/cockpit/cockpit-session on Debian
cp $CS /run/cockpit/bin/
chcon --reference=$CS /run/cockpit/bin/cockpit-session
chgrp cockpit-wsinstance /run/cockpit/bin/cockpit-session
chmod u+s /run/cockpit/bin/cockpit-session
printf '[Basic]\nCommand=/run/cockpit/bin/cockpit-session\n[Negotiate]\nCommand=/run/cockpit/bin/cockpit-session' > /etc/cockpit/cockpit.conf
systemctl stop cockpit

This silently fails with SELinux on F41 and R8 despite https://fedoraproject.org/wiki/SELinux/Debugging#Enable_full_auditing , and works with setenforce 0. Unsurprisingly, it works fine on Debian.
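
Whether a given location is mounted `nosuid` or `noexec`, as /run/ is by default, can be checked before copying a setuid helper there. The sketch below assumes util-linux `findmnt` is installed; `has_mount_option` and `mount_blocks_suid_helper` are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting mount options.

has_mount_option() {
    # $1 = comma-separated option string, $2 = option to look for
    # Wrapping both sides in commas avoids "suid" matching inside "nosuid".
    case ",$1," in
        *",$2,"*) return 0 ;;
        *) return 1 ;;
    esac
}

mount_blocks_suid_helper() {
    # $1 = path; succeeds if the filesystem containing it is nosuid or noexec
    opts=$(findmnt -n -o OPTIONS --target "$1") || return 2
    has_mount_option "$opts" nosuid || has_mount_option "$opts" noexec
}
```

For example, `mount_blocks_suid_helper /run && echo "not usable for a suid helper"` should print the message on a default setup, and should stay silent for a tmpfs mounted with `exec,suid` as in the commands above.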

ID-mapped mounts

See https://lwn.net/Articles/837566/ . It requires kernel 5.12. Debian stable has 6.1 and RHEL 9 has 5.14 :+1: RHEL 8 has 4.18, but we don't need to support that any more from main.
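
A runtime guard for the kernel requirement could look like the sketch below. `kernel_at_least` is a hypothetical helper, and it assumes release strings of the usual MAJOR.MINOR.PATCH-suffix shape as printed by `uname -r`.

```shell
#!/bin/sh
# Hypothetical helper: compare a kernel release against a MAJOR.MINOR minimum.

kernel_at_least() {
    # $1 = required "MAJOR.MINOR", $2 = release string such as "5.14.0-70.el9"
    req_major=${1%%.*}
    req_minor=${1#*.}
    req_minor=${req_minor%%.*}
    rel=${2%%-*}                 # drop the distribution suffix after the first "-"
    have_major=${rel%%.*}
    rest=${rel#*.}
    have_minor=${rest%%.*}
    [ "$have_major" -gt "$req_major" ] ||
        { [ "$have_major" -eq "$req_major" ] && [ "$have_minor" -ge "$req_minor" ]; }
}
```

Usage would be along the lines of `kernel_at_least 5.12 "$(uname -r)" || echo "ID-mapped mounts unavailable"`. A new enough kernel is necessary but not sufficient, since the filesystem and the mount tooling also have to support ID-mapped mounts.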

On Fedora 41:

mount -o bind,X-mount.idmap=g:0:`id -g cockpit-wsinstance`:1 $CS $CS
# mount: /usr/libexec/cockpit-session: mount failed: Unknown error 5013.

Too bad. This fails the same way when creating /tmp/x and mounting over that instead.

The standard mount command doesn't support this yet in Debian stable. We can play around with https://github.com/brauner/mount-idmapped (we'd probably implement the actual solution in C similar to cockpit-certificate-ensure anyway). But this also doesn't work in Fedora 41:

curl -L -O https://raw.githubusercontent.com/brauner/mount-idmapped/refs/heads/master/mount-idmapped.c
gcc -o mount-idmapped mount-idmapped.c

./mount-idmapped --map-mount g:0:`id -g cockpit-wsinstance`:1 $CS /tmp/x
# Invalid argument - Failed to change mount attributes

(again, same for a different target mount)

mount_setattr(3, "", AT_EMPTY_PATH|AT_RECURSIVE, {attr_set=MOUNT_ATTR_IDMAP, attr_clr=0, propagation=0 /* MS_??? */, userns_fd=4}, 32) = -1 EINVAL (Invalid argument)

So, this is all a bit hackish :cry: