bus1 / dbus-broker

Linux D-Bus Message Broker
https://github.com/bus1/dbus-broker/wiki
Apache License 2.0
675 stars 78 forks source link

broker: pass serial numbers with activations #257

Closed dvdhrm closed 3 years ago

dvdhrm commented 3 years ago

Extend the activation communication between broker and launcher to use serial numbers. This allows us to discard stale activation failures and closes a race condition where we would incorrectly fail new activations immediately after a previous service exited.

@lucab @cgwalters

cgwalters commented 3 years ago

xref https://github.com/bus1/dbus-broker/pull/250

dvdhrm commented 3 years ago

ust for clarity: The new semantics is now that we track an activation request from it is triggered until either the service fails or the name is successfully claimed, whichever happens first. You dropped a comment about this potentially not being well defined, but with the logic you implemented, it is, right?

Yeah exactly. We always knew about this race, but I didn't think it fully through. It apparently is possible to trigger, and this patch should fully close it.

dvdhrm commented 3 years ago

(I force pushed a new revision which updates our man-pages as well.)

lucab commented 3 years ago

@dvdhrm @teg thanks for the quick fix! If you need some manual double-check and can point me to some scratch/rawhide package, I can find some time in the next days to give it a run.

dvdhrm commented 3 years ago

@lucab v27 is now in rawhide. I will push to F32/33 tomorrow if there are no complaints.

lucab commented 3 years ago

I tried the v27 package on a plain F33 machine and I'm still seeing activation failures (client-side only; the service unit under systemd starts fine):

# rpm-ostree status > /dev/null; systemctl kill -s KILL rpm-ostreed; systemctl status rpm-ostreed

* rpm-ostreed.service - rpm-ostree System Management Daemon
     Loaded: loaded (/usr/lib/systemd/system/rpm-ostreed.service; static)
     Active: failed (Result: signal) since Mon 2021-02-15 11:10:58 UTC; 13ms ago
       Docs: man:rpm-ostree(1)
    Process: 691 ExecStart=/usr/bin/rpm-ostree start-daemon (code=killed, signal=KILL)
   Main PID: 691 (code=killed, signal=KILL)
     Status: "clients=0; idle exit in 63 seconds"

Feb 15 11:10:23 localhost systemd[1]: Starting rpm-ostree System Management Daemon...
Feb 15 11:10:23 localhost rpm-ostree[691]: Reading config file '/etc/rpm-ostreed.conf'
Feb 15 11:10:24 localhost rpm-ostree[691]: In idle state; will auto-exit in 64 seconds
Feb 15 11:10:24 localhost systemd[1]: Started rpm-ostree System Management Daemon.
Feb 15 11:10:57 local-test rpm-ostree[691]: client(id:cli dbus:1.27 unit:session-2.scope uid:0) added; new total=1
Feb 15 11:10:58 local-test rpm-ostree[691]: client(id:cli dbus:1.27 unit:session-2.scope uid:0) vanished; remaining=0
Feb 15 11:10:58 local-test rpm-ostree[691]: In idle state; will auto-exit in 63 seconds
Feb 15 11:10:58 local-test systemd[1]: rpm-ostreed.service: Sent signal SIGKILL to main process 691 (rpm-ostree) on client request.
Feb 15 11:10:58 local-test systemd[1]: rpm-ostreed.service: Main process exited, code=killed, status=9/KILL
Feb 15 11:10:58 local-test systemd[1]: rpm-ostreed.service: Failed with result 'signal'.

# rpm-ostree status
error: Could not activate remote peer.

# systemctl status rpm-ostreed
* rpm-ostreed.service - rpm-ostree System Management Daemon
     Loaded: loaded (/usr/lib/systemd/system/rpm-ostreed.service; static)
     Active: activating (start) since Mon 2021-02-15 11:10:58 UTC; 20ms ago
       Docs: man:rpm-ostree(1)
   Main PID: 3271 (rpm-ostree)
      Tasks: 1 (limit: 9444)
     Memory: 1.3M
     CGroup: /system.slice/rpm-ostreed.service
             `-3271 /usr/bin/rpm-ostree start-daemon

Feb 15 11:10:58 local-test systemd[1]: Starting rpm-ostree System Management Daemon...

Test machine is:

* ostree://fedora:fedora/x86_64/coreos/stable
                   Version: 33.20210117.3.2 (2021-02-03T18:13:41Z)
                BaseCommit: 20de1953c18bd432a8ed4e19b91c64978100dba7d1c4813f91f8cf4d4d2411b4
              GPGSignature: Valid signature by 963A2BEB02009608FE67EA4249FD77499570FF31
      ReplacedBasePackages: dbus-broker 24-1.fc33 -> 27-1.fc35

If I drop the v27 override and rollback to v24, that activation failure goes away.

dvdhrm commented 3 years ago

@lucab I cannot reproduce this, anymore. Mhhhh. So you are saying this isn't even a race-condition for you?

lucab commented 3 years ago

@dvdhrm from the quick v27 test above, it doesn't seem so; an artificial sleep 15 does not change the behavior. However, interestingly, a systemctl reset-failed does have some positive effect:

# rpm-ostree status > /dev/null; systemctl kill -s KILL rpm-ostreed; rpm-ostree status > /dev/null && echo ok
error: Could not activate remote peer.

# rpm-ostree status > /dev/null; systemctl kill -s KILL rpm-ostreed; sleep 15; rpm-ostree status > /dev/null && echo ok
error: Could not activate remote peer.

# rpm-ostree status > /dev/null; systemctl kill -s KILL rpm-ostreed; systemctl reset-failed; rpm-ostree status > /dev/null && echo ok
ok
lucab commented 3 years ago

I moved all the clues collected so far to its own ticket at https://github.com/bus1/dbus-broker/issues/258.