dbus2 / zbus-old

Rust D-Bus crate.
https://gitlab.freedesktop.org/dbus/zbus

Potential deadlock due to method calls in callbacks #223

Closed zeenix closed 1 year ago

zeenix commented 2 years ago

In GitLab by @danieldg on Oct 9, 2021, 18:04

Because the connection's receive queue is of limited size, it is possible to deadlock the connection if one of the reading Streams waits on a dbus method call before continuing to poll. This could be caused by user code iterating a MessageStream or SignalStream, or by the internal iteration that drives callbacks.

See https://gitlab.freedesktop.org/dbus/zbus/-/merge_requests/402#note_1092692 for some existing discussion.
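The queue mechanics behind this deadlock can be sketched with std's bounded channel standing in for the connection's receive queue (a model only, not zbus API; names and the capacity of 1 are illustrative):

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

fn main() {
    // The connection's receive queue, modeled as a bounded channel of capacity 1.
    let (tx, rx) = sync_channel::<&str>(1);

    // The reader fills the queue with an incoming signal...
    tx.try_send("signal").unwrap();

    // ...and now a method reply cannot be enqueued until someone polls `rx`.
    // A consumer that is itself blocked waiting for that reply never polls,
    // so a blocking `send` here would deadlock; `try_send` makes that visible.
    assert!(matches!(tx.try_send("method-reply"), Err(TrySendError::Full(_))));

    // Draining the queue (i.e. continuously polling the stream) unblocks it.
    assert_eq!(rx.recv().unwrap(), "signal");
    assert!(tx.try_send("method-reply").is_ok());
    println!("ok");
}
```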

zeenix commented 2 years ago

In GitLab by @danieldg on Oct 9, 2021, 22:07

See also https://github.com/smol-rs/async-broadcast/pull/21

zeenix commented 2 years ago

Let me talk about stream and callbacks separately here:

I think for 2.0 it'd suffice if we do either or both of the following:

zeenix commented 2 years ago

In GitLab by @danieldg on Oct 10, 2021, 18:45

I think most users would consider the following to be a continuous poll:

async fn watch_for_signals(proxy: FooProxy, state: Arc<State>) -> Result<()> {
    let mut stream = proxy.receive_sigfoo();
    while let Some(sig) = stream.next().await {
        let sig = sig.args()?;
        for bar in &sig.new_bar {
            let details = proxy.get_bar_details(bar).await?;
            state.add_details(details);
        }
    }
    Ok(())
}

however, this exhibits the same deadlock.

I agree this issue could be resolved by making callbacks sync (with a note that using the blocking api is not allowed) and fully documenting the actual continuous nature of signal polling.

zeenix commented 2 years ago

In GitLab by @danieldg on Oct 10, 2021, 19:25

If we go this route, I also wonder if it'd be better to use synchronous channels and provide users receivers

This just brings back the unordered handler issue. A receiver can only run in a different thread from the sender, and at least std::sync::mpsc only lets each thread wait on a single receiver at a time. Other sync channels still have the same lost-ordering property as join on futures.

zeenix commented 2 years ago

In GitLab by @danieldg on Oct 10, 2021, 19:30

FYI: https://gitlab.freedesktop.org/danieldg/zbus/-/tree/no-async-callbacks

Note we still need to use async callbacks internally to drive signal streams. Users will need to set up unbounded or lossy channels of their own to handle the output of their signal handlers, if they want to produce a stream.

zeenix commented 2 years ago

I agree this issue could be resolved by making callbacks sync (with a note that using the blocking api is not allowed) and fully documenting the actual continuous nature of signal polling.

With callbacks, yeah, but how does it solve the deadlock scenario with streams that your code example demonstrates? :thinking: There is a simple workaround to that problem though: poll the stream at the same time as get_bar_details using select. It's not even a workaround really, since the user is not actually polling the stream continuously, but I do see how it's easy to fall into this trap.
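The shape of that workaround is "keep draining the stream while the reply is pending." A std-only sketch of the same idea using a thread as the drain (the bounded channel stands in for the receive queue; no zbus types involved):

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

fn main() {
    let (tx, rx) = sync_channel::<&str>(1);

    // Stand-in for the connection's reader: delivers 3 signals, then the reply.
    let producer = thread::spawn(move || {
        for _ in 0..3 {
            tx.send("signal").unwrap(); // blocks whenever the queue is full
        }
        tx.send("reply").unwrap();
    });

    // The fix: keep polling the stream while waiting for the reply, instead
    // of stopping the stream to wait on the reply alone.
    let mut signals = 0;
    loop {
        match rx.recv().unwrap() {
            "signal" => signals += 1,
            "reply" => break,
            _ => unreachable!(),
        }
    }
    producer.join().unwrap();
    assert_eq!(signals, 3);
    println!("ok");
}
```

With async code the drain and the method call would run concurrently via something like select instead of a second thread, but the invariant is the same: the queue must keep moving while a reply is outstanding.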

zeenix commented 2 years ago

FYI: https://gitlab.freedesktop.org/danieldg/zbus/-/tree/no-async-callbacks

Cool.

Note we still need to use async callbacks internally to drive signal streams.

Sure.

Users will need to set up unbounded or lossy channels of their own to handle the output of their signal handlers, if they want to produce a stream

That sounds fine to me as well.

Moreover, we could remove the connect API from blocking::Proxy since it's going to be exactly the same API anyway.

zeenix commented 2 years ago

In GitLab by @danieldg on Oct 11, 2021, 04:21

There is a simple workaround ...

Yes, but this is an easy gotcha if you don't take "continuously" as literally as needed.

The only other fix is to buffer more, since incoming events will always potentially block a reply.

zeenix commented 2 years ago

In GitLab by @danieldg on Oct 11, 2021, 04:22

blocking::Proxy still needs to expose it because of zbus_macros.

zeenix commented 2 years ago

Not really, macros can access the underlying zbus::Proxy through the inner method, and we should also impl Deref<Target = zbus::Proxy> for blocking::Proxy.
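The Deref wiring being proposed looks roughly like this (the types here are stand-ins for the real zbus ones, which live in the zbus crate; only the delegation pattern is the point):

```rust
// Stand-in for the async zbus::Proxy.
pub(crate) struct Proxy {
    destination: String,
}

impl Proxy {
    pub(crate) fn destination(&self) -> &str {
        &self.destination
    }
}

mod blocking {
    // Stand-in for zbus::blocking::Proxy, wrapping the async proxy.
    pub(crate) struct Proxy {
        pub(crate) inner: super::Proxy,
    }

    impl std::ops::Deref for Proxy {
        type Target = super::Proxy;

        fn deref(&self) -> &super::Proxy {
            &self.inner
        }
    }
}

fn main() {
    let proxy = blocking::Proxy {
        inner: Proxy { destination: "org.example.Foo".into() },
    };
    // Thanks to Deref, the blocking proxy exposes the inner proxy's methods
    // directly, so callers (and zbus_macros) don't need a duplicated API.
    assert_eq!(proxy.destination(), "org.example.Foo");
    println!("ok");
}
```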

zeenix commented 2 years ago

There is a simple workaround ...

Yes, but this is an easy gotcha if you don't take "continuously" as literally as needed.

Right, that's why I said "but I do see how it's easy to fall into this trap".

The only other fix is to buffer more, since incoming events will always potentially block a reply.

We could also document that. Smarter/dynamic handling of queue by async-broadcast (as you pointed out and filed issues for) will definitely help here I think.

zeenix commented 2 years ago

Oh, and maybe we should have separate broadcast channels for signals and other messages (method calls and replies)? Then MessageStream could just be a join of the two streams internally, and method calls would create a non-signal stream. Deadlocks would still be possible, but with code like the example above, I think the chances of one occurring become extremely low.

zeenix commented 2 years ago

In GitLab by @danieldg on Oct 11, 2021, 15:23

No, unless you're willing to have an unbounded signal stream, you will still have deadlocks. If the signal stream's capacity is 64 messages and the D-Bus socket delivers 65 signals followed by one MethodReply, the receive thread will block sending to the signal stream.

A potential solution to recommend in this case is to have two distinct sockets connecting to the bus: one for signals, and one for method calls. That way you can "stop" the signals while not stopping the method returns.
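The two-connection recommendation can be modeled with two independent bounded queues (std-only sketch, not zbus API): a full signal queue holds back further signals without blocking reply delivery.

```rust
use std::sync::mpsc::sync_channel;

fn main() {
    // Two independent connections: signals and method replies each get
    // their own bounded queue, so a full signal queue cannot block a reply.
    let (sig_tx, _sig_rx) = sync_channel::<&str>(1);
    let (rep_tx, rep_rx) = sync_channel::<&str>(1);

    sig_tx.try_send("signal").unwrap(); // signal queue is now full
    assert!(sig_tx.try_send("signal-2").is_err()); // further signals held back...
    rep_tx.try_send("reply").unwrap(); // ...but the reply still gets through
    assert_eq!(rep_rx.recv().unwrap(), "reply");
    println!("ok");
}
```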

zeenix commented 2 years ago

In GitLab by @danieldg on Oct 11, 2021, 15:25

Note: I'm not recommending that zbus have two sockets in one Connection; I mean two independent Connections.

zeenix commented 2 years ago

No, unless you're willing to have an unbounded signal stream, you will still have deadlocks.

Ah yes, there would still be a signal stream that's not being polled. Nvm then. Didn't have any caffeine yet. :)