bus1 / dbus-broker

Linux D-Bus Message Broker
https://github.com/bus1/dbus-broker/wiki
Apache License 2.0
667 stars 78 forks source link

On activation, skipped jobs are treated as successful #276

Closed bluca closed 2 years ago

bluca commented 2 years ago

https://github.com/bus1/dbus-broker/blob/727bee57cd3e3b5806eb8599abbdd49984b75732/src/launch/service.c#L251

Shouldn't a skipped job be treated as a failed one and reset the activation? Currently we return skipped when the unit cannot be started. I'm thinking whether we should also report skipped when a condition fails. This came from a bug report from a user https://github.com/systemd/systemd/issues/21520

Also I just found that when you have a unit that is not running, and has a condition which is not verified, the dbus call will be stuck, and trying to call the dbus method again will do nothing, even if the condition in the meanwhile became true, until dbus-broker is restarted.

EG:

[Unit]
ConditionPathExists=/tmp/marker
[Service]
BusName=com.some.name
ExecStart=/usr/bin/some_service

Switching systemd to return JOB_SKIPPED when a condition fails, and treating skipped as a failure in the launcher snippet above, fixes the issue. But I'm not 100% sure it is the right thing to do? Was there a particular reason for treating skipped as successful activation?

dvdhrm commented 2 years ago

Switching systemd to return JOB_SKIPPED when a condition fails, and treating skipped as a failure in the launcher snippet above, fixes the issue. But I'm not 100% sure it is the right thing to do? Was there a particular reason for treating skipped as successful activation?

Yes, we wanted to mirror the behavior of systemd (see systemd `src/shared/bus-wait-for-jobs.c´). We might have missed some aspects of that helper, though.

If I remember correctly, though, the justification for conditions back when we introduced them to systemd was to allow skipping units without failing them. I am not entirely sure what the best call for dbus-broker is, though. Given that we require bus-names, there is little reason to consider skipped units as success. However, this then begs the question whether that is the only situation systemd returns "skipped"?

Also I just found that when you have a unit that is not running, and has a condition which is not verified, the dbus call will be stuck, and trying to call the dbus method again will do nothing, even if the condition in the meanwhile became true, until dbus-broker is restarted.

What does "condition is not verified" mean? You mean it is false?

Thanks for the hint! We should definitely get this sorted!

bluca commented 2 years ago

Switching systemd to return JOB_SKIPPED when a condition fails, and treating skipped as a failure in the launcher snippet above, fixes the issue. But I'm not 100% sure it is the right thing to do? Was there a particular reason for treating skipped as successful activation?

Yes, we wanted to mirror the behavior of systemd (see systemd `src/shared/bus-wait-for-jobs.c´). We might have missed some aspects of that helper, though.

If I remember correctly, though, the justification for conditions back when we introduced them to systemd was to allow skipping units without failing them. I am not entirely sure what the best call for dbus-broker is, though. Given that we require bus-names, there is little reason to consider skipped units as success. However, this then begs the question whether that is the only situation systemd returns "skipped"?

So right now JOB_SKIPPED (on a start job) is returned only when it was requested to start a unit that doesn't have the concept of being started, which as far as I can see is only the device unit as of now, or when something like a condrestart is requested:

https://github.com/systemd/systemd/blob/ff7e7c2b3a2b4367e0a03a51b7d714dfe0837abb/src/core/job.c#L894 https://github.com/systemd/systemd/blob/ff7e7c2b3a2b4367e0a03a51b7d714dfe0837abb/src/core/unit.c#L1842

So yeah, I don't think it makes much sense to consider it as successful here, as in practice right now it can't happen at all, and if we end up returning skipped in more cases, still the service name will not appear on the bus, so it's not successful from the point of view of dbus activation. It's different in systemctl, as it makes sense to consider it all good for reporting purposes when a condition causes a unit to skip activation, as that's how it's intended to work (there's the Assert settings to make them hard failures), and it has been working like that since it was introduced by https://github.com/systemd/systemd/commit/d68201e9aa5e9ebd6085b1bb8892c42e9d20be75

Also I just found that when you have a unit that is not running, and has a condition which is not verified, the dbus call will be stuck, and trying to call the dbus method again will do nothing, even if the condition in the meanwhile became true, until dbus-broker is restarted.

What does "condition is not verified" mean? You mean it is false?

Thanks for the hint! We should definitely get this sorted!

Yeah, like ConditionPathExists=/tmp/foo will fail if /tmp/foo doesn't exist, and the job will be skipped (but reports success right now).

I've opened a PR to change this here, and also to return a human-readable error string up the stack when an activation request fails.

If we end up changing done to skipped in systemd, then things will "just work" here. It's tricky because technically is a backward incompatible change, so it could break things. Skipped activations due to conditions have always been reported as successful jobs, even when the internal reporting was changed to treat them differently inside systemd: https://github.com/systemd/systemd/commit/6e64994d690ccc0359a4a9d20fbead2dbfe23836

bluca commented 2 years ago

Once https://github.com/systemd/systemd/pull/22877 is sorted I'll reopen so that we can use the synchronous answer, but for now the immediate hang should be solved in most cases