flux-framework / flux-core

core services for the Flux resource management framework
GNU Lesser General Public License v3.0

`loginctl enable-linger flux` fails on TOSS 4 systems #4107

Open grondo opened 2 years ago

grondo commented 2 years ago

We discovered that `loginctl enable-linger flux` fails on our current release of RHEL8/TOSS4, and thus the flux service fails to start with the included systemd unit file. We are now carrying a patch in the specfile that removes this line (since the flux systemd user instance isn't being used now), but we should figure out how to support "linger" on these systems (or get their version of systemd/loginctl fixed) before we start using sdexec in production.
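As an illustration of the trade-off, the step could be treated as non-fatal instead of being patched out. The following is only a sketch under assumptions: `try_enable_linger` is a hypothetical helper, not part of flux-core, and the `run` parameter exists only so the command runner can be swapped out. The only real command used is `loginctl enable-linger USER`.

```python
import subprocess
import sys


def try_enable_linger(user, run=subprocess.run):
    """Attempt `loginctl enable-linger USER`; return True on success.

    Hypothetical helper: failure is reported as a warning rather than
    raised, so a caller can continue starting the service without linger.
    """
    try:
        result = run(
            ["loginctl", "enable-linger", user],
            capture_output=True,
            text=True,
        )
    except FileNotFoundError:
        # loginctl is not installed on this host.
        return False
    if result.returncode != 0:
        print(
            f"warning: enable-linger failed for {user}: {result.stderr.strip()}",
            file=sys.stderr,
        )
        return False
    return True
```

Calling `try_enable_linger("flux")` on an affected host would be expected to return `False` and log the loginctl error, leaving the decision about whether to proceed to the caller.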

chu11 commented 2 years ago

Assuming the error is this (I ran it by hand):

$ /usr/bin/loginctl enable-linger flux
Error registering authentication agent: GDBus.Error:org.freedesktop.PolicyKit1.Error.Failed: Cannot determine user of subject (polkit-error-quark, 0)
Could not enable linger: No such process

That error message gets you some dubious answers online,

but it looks like something basic probably isn't set up correctly:

[root@fluke108:~]# loginctl list-users
No users.

No users are listed, even though I am obviously logged in. Not sure why at the moment.
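Since `loginctl` itself is misbehaving here, it may help to check linger state without going through logind. A minimal sketch, assuming logind's default state directory `/var/lib/systemd/linger`, where a marker file named after the user indicates that linger is enabled. The function name `linger_enabled` is hypothetical.

```python
from pathlib import Path

# Assumption: logind records lingering users as marker files in this
# directory, one file per user name.
DEFAULT_LINGER_DIR = Path("/var/lib/systemd/linger")


def linger_enabled(user, linger_dir=DEFAULT_LINGER_DIR):
    """Return True if a linger marker file exists for `user`."""
    return (Path(linger_dir) / user).is_file()
```

`linger_enabled("flux")` only reports whether the marker file is present. It says nothing about whether logind has actually started a user manager for that account.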

grondo commented 2 years ago

We did look into this a little: our fluxorama container (where `loginctl enable-linger` works) is at systemd-239-51.el8_5.3.x86_64, whereas fluke is at systemd-239-45.el8_4.3.x86_64. Presumably, updating TOSS to this version of systemd would resolve the issue. In fact, there were a lot of changes related to logind in systemd-239-50:

* Fri Aug 27 2021 systemd maintenance team <systemd-maint@redhat.com> - 239-50
- Added option --check-inhibitors for non-tty usage (#1269726)
- logind: Introduce RebootWithFlags and others (#1269726)
- logind: add …WithFlags methods to policy (#1269726)
- logind: simplify flags handling a bit (#1269726)
- Update link to RHEL documentation (#1982584)
- Set default core ulimit to 0, but keep the hard limit ulimited (#1905582)
- shared/seccomp-util: address family filtering is broken on ppc (#1982650)
- logind: rework Seat/Session/User object allocation and freeing a bit (#1642460)
- logind: fix serialization/deserialization of user's "display session" (#1642460)
- logind: turn of stdio locking when writing session files too (#1642460)
- units: set StopWhenUnneeded= for the user slice units too (#1642460)
- units: improve Description= string a bit (#1642460)
- logind: improve logging in manager_connect_console() (#1642460)
- logind: save/restore User object's "stopping" field during restarts (#1642460)
- logind: correct bad clean-up path (#1642460)
- logind: fix bad error propagation (#1642460)
- logind: never elect a session that is stopping as display (#1642460)
- logind: introduce little helper that checks whether a session is ready (#1642460)
- logind: propagate session stop errors (#1642460)
- logind: rework how we manage the slice and user-runtime-dir@.service unit for each user (#1642460)
- logind: optionally, keep the user@.service instance for eached logged in user around for a while (#1642460)
- logind: add a RequiresMountsFor= dependency from the session scope unit to the home directory of the user (#1642460)
- logind: improve error propagation of user_check_linger_file() (#1642460)
- logind: automatically GC lingering users for who now user@.service (nor slice, not runtime dir service) is running anymore (#1642460)
- pam_systemd: simplify code which with we set environment variables (#1642460)
- logind: validate /run/user/1000 before we set it (#1642460)
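To tell at a glance which hosts predate the logind changes listed above, the installed package string can be compared against 239-50. This is a rough sketch, not a full RPM version comparison: it looks only at the version and the leading integer of the release, and the names `systemd_build` and `has_logind_rework` are hypothetical. Whether 239-50 actually fixes `enable-linger` is still only presumed.

```python
import re
from typing import Tuple

# Assumption taken from the changelog above: the logind rework landed in 239-50.
FIRST_BUILD_WITH_LOGIND_REWORK = (239, 50)


def systemd_build(package: str) -> Tuple[int, int]:
    """Extract (version, release number) from a string such as
    'systemd-239-45.el8_4.3.x86_64'."""
    match = re.match(r"systemd-(\d+)-(\d+)", package)
    if match is None:
        raise ValueError(f"unrecognized systemd package string: {package!r}")
    return int(match.group(1)), int(match.group(2))


def has_logind_rework(package: str) -> bool:
    """Return True if the package is at or past the 239-50 build."""
    return systemd_build(package) >= FIRST_BUILD_WITH_LOGIND_REWORK
```

With the two versions from this thread, `has_logind_rework("systemd-239-45.el8_4.3.x86_64")` is `False` (fluke) and `has_logind_rework("systemd-239-51.el8_5.3.x86_64")` is `True` (fluxorama).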