Closed dvdhrm closed 3 years ago
ust for clarity: The new semantics is now that we track an activation request from it is triggered until either the service fails or the name is successfully claimed, whichever happens first. You dropped a comment about this potentially not being well defined, but with the logic you implemented, it is, right?
Yeah exactly. We always knew about this race, but I didn't think it fully through. It apparently is possible to trigger, and this patch should fully close it.
(I force pushed a new revision which updates our man-pages as well.)
@dvdhrm @teg thanks for the quick fix! If you need some manual double-check and can point me to some scratch/rawhide package, I can find some time in the next days to give it a run.
@lucab v27 is now in rawhide. I will push to F32/33 tomorrow if there are no complaints.
I tried the v27 package on a plain F33 machine and I'm still seeing activation failures (client-side only; the service unit under systemd starts fine):
# rpm-ostree status > /dev/null; systemctl kill -s KILL rpm-ostreed; systemctl status rpm-ostreed
* rpm-ostreed.service - rpm-ostree System Management Daemon
Loaded: loaded (/usr/lib/systemd/system/rpm-ostreed.service; static)
Active: failed (Result: signal) since Mon 2021-02-15 11:10:58 UTC; 13ms ago
Docs: man:rpm-ostree(1)
Process: 691 ExecStart=/usr/bin/rpm-ostree start-daemon (code=killed, signal=KILL)
Main PID: 691 (code=killed, signal=KILL)
Status: "clients=0; idle exit in 63 seconds"
Feb 15 11:10:23 localhost systemd[1]: Starting rpm-ostree System Management Daemon...
Feb 15 11:10:23 localhost rpm-ostree[691]: Reading config file '/etc/rpm-ostreed.conf'
Feb 15 11:10:24 localhost rpm-ostree[691]: In idle state; will auto-exit in 64 seconds
Feb 15 11:10:24 localhost systemd[1]: Started rpm-ostree System Management Daemon.
Feb 15 11:10:57 local-test rpm-ostree[691]: client(id:cli dbus:1.27 unit:session-2.scope uid:0) added; new total=1
Feb 15 11:10:58 local-test rpm-ostree[691]: client(id:cli dbus:1.27 unit:session-2.scope uid:0) vanished; remaining=0
Feb 15 11:10:58 local-test rpm-ostree[691]: In idle state; will auto-exit in 63 seconds
Feb 15 11:10:58 local-test systemd[1]: rpm-ostreed.service: Sent signal SIGKILL to main process 691 (rpm-ostree) on client request.
Feb 15 11:10:58 local-test systemd[1]: rpm-ostreed.service: Main process exited, code=killed, status=9/KILL
Feb 15 11:10:58 local-test systemd[1]: rpm-ostreed.service: Failed with result 'signal'.
# rpm-ostree status
error: Could not activate remote peer.
# systemctl status rpm-ostreed
* rpm-ostreed.service - rpm-ostree System Management Daemon
Loaded: loaded (/usr/lib/systemd/system/rpm-ostreed.service; static)
Active: activating (start) since Mon 2021-02-15 11:10:58 UTC; 20ms ago
Docs: man:rpm-ostree(1)
Main PID: 3271 (rpm-ostree)
Tasks: 1 (limit: 9444)
Memory: 1.3M
CGroup: /system.slice/rpm-ostreed.service
`-3271 /usr/bin/rpm-ostree start-daemon
Feb 15 11:10:58 local-test systemd[1]: Starting rpm-ostree System Management Daemon...
Test machine is:
* ostree://fedora:fedora/x86_64/coreos/stable
Version: 33.20210117.3.2 (2021-02-03T18:13:41Z)
BaseCommit: 20de1953c18bd432a8ed4e19b91c64978100dba7d1c4813f91f8cf4d4d2411b4
GPGSignature: Valid signature by 963A2BEB02009608FE67EA4249FD77499570FF31
ReplacedBasePackages: dbus-broker 24-1.fc33 -> 27-1.fc35
If I drop the v27 override and rollback to v24, that activation failure goes away.
@lucab I cannot reproduce this, anymore. Mhhhh. So you are saying this isn't even a race-condition for you?
@dvdhrm from the quick v27 test above, it doesn't seem so; an artificial sleep 15
does not change the behavior.
However, interestingly, a systemctl reset-failed
does have some positive effect:
# rpm-ostree status > /dev/null; systemctl kill -s KILL rpm-ostreed; rpm-ostree status > /dev/null && echo ok
error: Could not activate remote peer.
# rpm-ostree status > /dev/null; systemctl kill -s KILL rpm-ostreed; sleep 15; rpm-ostree status > /dev/null && echo ok
error: Could not activate remote peer.
# rpm-ostree status > /dev/null; systemctl kill -s KILL rpm-ostreed; systemctl reset-failed; rpm-ostree status > /dev/null && echo ok
ok
I moved all the clues collected so far to its own ticket at https://github.com/bus1/dbus-broker/issues/258.
Extend the activation communication between broker and launcher to use serial numbers. This allows us to discard stale activation failures and closes a race condition where we would incorrectly fail new activations immediately after a previous service exited.
@lucab @cgwalters