dbus2 / zbus-old

Rust D-Bus crate.
https://gitlab.freedesktop.org/dbus/zbus

Potential deadlock due to method calls in callbacks #223

Closed zeenix closed 1 year ago

zeenix commented 2 years ago

In GitLab by @danieldg on Oct 9, 2021, 18:04

Because the connection's receive queue is of limited size, it is possible to deadlock the connection if one of the reading Streams waits on a dbus method call before continuing to poll. This could be caused by user code iterating a MessageStream or SignalStream, or by the internal iteration that drives callbacks.

See https://gitlab.freedesktop.org/dbus/zbus/-/merge_requests/402#note_1092692 for some existing discussion.
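The queue mechanics behind this deadlock can be sketched with std's bounded channel standing in for the connection's receive queue (a model only, not zbus API; names and the capacity of 1 are illustrative):

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

fn main() {
    // The connection's receive queue, modeled as a bounded channel of capacity 1.
    let (tx, rx) = sync_channel::<&str>(1);

    // The reader fills the queue with an incoming signal...
    tx.try_send("signal").unwrap();

    // ...and now a method reply cannot be enqueued until someone polls `rx`.
    // A consumer that is itself blocked waiting for that reply never polls,
    // so a blocking `send` here would deadlock; `try_send` makes that visible.
    assert!(matches!(tx.try_send("method-reply"), Err(TrySendError::Full(_))));

    // Draining the queue (i.e. continuously polling the stream) unblocks it.
    assert_eq!(rx.recv().unwrap(), "signal");
    assert!(tx.try_send("method-reply").is_ok());
    println!("ok");
}
```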

zeenix commented 2 years ago

In GitLab by @danieldg on Oct 9, 2021, 22:07

See also https://github.com/smol-rs/async-broadcast/pull/21

zeenix commented 2 years ago

Let me talk about stream and callbacks separately here:

I think for 2.0 it'd suffice if we do either or both of the following:

zeenix commented 2 years ago

In GitLab by @danieldg on Oct 10, 2021, 18:45

I think most users would consider the following to be a continuous poll:

async fn watch_for_signals(proxy: FooProxy, state: Arc<State>) -> Result<()> {
    let mut stream = proxy.receive_sigfoo();
    while let Some(sig) = stream.next().await {
        let sig = sig.args()?;
        for bar in &sig.new_bar {
            let details = proxy.get_bar_details(bar).await?;
            state.add_details(details);
        }
    }
    Ok(())
}

however, this exhibits the same deadlock.

I agree this issue could be resolved by making callbacks sync (with a note that using the blocking api is not allowed) and fully documenting the actual continuous nature of signal polling.

zeenix commented 2 years ago

In GitLab by @danieldg on Oct 10, 2021, 19:25

If we go this route, I also wonder if it'd be better to use synchronous channels and provide users receivers

This just brings back the unordered handler issue. A receiver can only run in a different thread from the sender, and at least std::sync::mpsc only lets each thread wait on a single receiver at a time. Other sync channels still have the same lost-ordering property as join on futures.

zeenix commented 2 years ago

In GitLab by @danieldg on Oct 10, 2021, 19:30

FYI: https://gitlab.freedesktop.org/danieldg/zbus/-/tree/no-async-callbacks

Note we still need to use async callbacks internally to drive signal streams. Users will need to set up unbounded or lossy channels of their own to handle the output of their signal handlers, if they want to produce a stream.

zeenix commented 2 years ago

I agree this issue could be resolved by making callbacks sync (with a note that using the blocking api is not allowed) and fully documenting the actual continuous nature of signal polling.

With callbacks, yeah, but how does it solve the deadlock scenario with streams that your code example demonstrates? :thinking: There is a simple workaround to that problem though: poll the stream at the same time as get_bar_details using select. It's not even a workaround really, since the user is not actually polling the stream continuously, but I do see how it's easy to fall into this trap.
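The shape of that workaround is "keep draining the stream while the reply is pending." A std-only sketch of the same idea using a thread as the drain (the bounded channel stands in for the receive queue; no zbus types involved):

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

fn main() {
    let (tx, rx) = sync_channel::<&str>(1);

    // Stand-in for the connection's reader: delivers 3 signals, then the reply.
    let producer = thread::spawn(move || {
        for _ in 0..3 {
            tx.send("signal").unwrap(); // blocks whenever the queue is full
        }
        tx.send("reply").unwrap();
    });

    // The fix: keep polling the stream while waiting for the reply, instead
    // of stopping the stream to wait on the reply alone.
    let mut signals = 0;
    loop {
        match rx.recv().unwrap() {
            "signal" => signals += 1,
            "reply" => break,
            _ => unreachable!(),
        }
    }
    producer.join().unwrap();
    assert_eq!(signals, 3);
    println!("ok");
}
```

With async code the drain and the method call would run concurrently via something like select instead of a second thread, but the invariant is the same: the queue must keep moving while a reply is outstanding.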

zeenix commented 2 years ago

FYI: https://gitlab.freedesktop.org/danieldg/zbus/-/tree/no-async-callbacks

Cool.

Note we still need to use async callbacks internally to drive signal streams.

Sure.

Users will need to set up unbounded or lossy channels of their own to handle the output of their signal handlers, if they want to produce a stream

That sounds fine to me as well.

Moreover, we could remove the connect API from blocking::Proxy since it's going to be exactly the same API anyway.

zeenix commented 2 years ago

In GitLab by @danieldg on Oct 11, 2021, 04:21

There is a simple workaround ...

Yes, but this is an easy gotcha if you don't take "continuously" as literally as needed.

The only other fix is to buffer more, since incoming events will always potentially block a reply.

zeenix commented 2 years ago

In GitLab by @danieldg on Oct 11, 2021, 04:22

blocking::Proxy still needs to expose it because of zbus_macros.

zeenix commented 2 years ago

Not really, macros can access the underlying zbus::Proxy through the inner method, and we should also impl Deref<Target = zbus::Proxy> for blocking::Proxy.
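The Deref wiring being proposed looks roughly like this (the types here are stand-ins for the real zbus ones, which live in the zbus crate; only the delegation pattern is the point):

```rust
// Stand-in for the async zbus::Proxy.
pub(crate) struct Proxy {
    destination: String,
}

impl Proxy {
    pub(crate) fn destination(&self) -> &str {
        &self.destination
    }
}

mod blocking {
    // Stand-in for zbus::blocking::Proxy, wrapping the async proxy.
    pub(crate) struct Proxy {
        pub(crate) inner: super::Proxy,
    }

    impl std::ops::Deref for Proxy {
        type Target = super::Proxy;

        fn deref(&self) -> &super::Proxy {
            &self.inner
        }
    }
}

fn main() {
    let proxy = blocking::Proxy {
        inner: Proxy { destination: "org.example.Foo".into() },
    };
    // Thanks to Deref, the blocking proxy exposes the inner proxy's methods
    // directly, so callers (and zbus_macros) don't need a duplicated API.
    assert_eq!(proxy.destination(), "org.example.Foo");
    println!("ok");
}
```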

zeenix commented 2 years ago

There is a simple workaround ...

Yes, but this is an easy gotcha if you don't take "continuously" as literally as needed.

Right, that's why I said "but I do see how it's easy to fall into this trap".

The only other fix is to buffer more, since incoming events will always potentially block a reply.

We could also document that. Smarter/dynamic handling of queue by async-broadcast (as you pointed out and filed issues for) will definitely help here I think.

zeenix commented 2 years ago

Oh, and maybe we should have separate broadcast channels for signals and other messages (method calls and replies)? Then MessageStream could just be a join of the two streams internally, and method calls would create a non-signal stream. Deadlocks would still be possible, but with code like the example above, I think the chances of one occurring become extremely low.

zeenix commented 2 years ago

In GitLab by @danieldg on Oct 11, 2021, 15:23

No, unless you're willing to have an unbounded signal stream, you will still have deadlocks. If the signal stream's capacity is 64 messages and the D-Bus socket delivers 65 signals followed by one MethodReply, the receive thread will block sending to the signal stream.

A potential solution to recommend in this case is to have two distinct sockets connecting to the bus: one for signals, and one for method calls. That way you can "stop" the signals while not stopping the method returns.
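The two-connection recommendation can be modeled with two independent bounded queues (std-only sketch, not zbus API): a full signal queue holds back further signals without blocking reply delivery.

```rust
use std::sync::mpsc::sync_channel;

fn main() {
    // Two independent connections: signals and method replies each get
    // their own bounded queue, so a full signal queue cannot block a reply.
    let (sig_tx, _sig_rx) = sync_channel::<&str>(1);
    let (rep_tx, rep_rx) = sync_channel::<&str>(1);

    sig_tx.try_send("signal").unwrap(); // signal queue is now full
    assert!(sig_tx.try_send("signal-2").is_err()); // further signals held back...
    rep_tx.try_send("reply").unwrap(); // ...but the reply still gets through
    assert_eq!(rep_rx.recv().unwrap(), "reply");
    println!("ok");
}
```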

zeenix commented 2 years ago

In GitLab by @danieldg on Oct 11, 2021, 15:25

Note: I'm not recommending that zbus have two sockets in one Connection; I mean two independent Connections.

zeenix commented 2 years ago

No, unless you're willing to have an unbounded signal stream, you will still have deadlocks.

Ah yes, there would still be a signal stream that's not being polled. Nvm then. Didn't have any caffeine yet. :)