dbus2 / zbus-old

Rust D-Bus crate.
https://gitlab.freedesktop.org/dbus/zbus
Other
49 stars 13 forks source link

async_channel with tokio feature can deadlock #289

Closed zeenix closed 1 year ago

zeenix commented 1 year ago

In GitLab by @jgrund on Oct 4, 2022, 21:01

When activating the tokio feature of zbus, I am seeing frequent deadlocks (around 10% of the time). The source appears to be the mutex used to make a tokio mpsc channel mpmc here:

https://gitlab.freedesktop.org/dbus/zbus/-/blob/main/zbus/src/abstractions/async_channel.rs#L48

Screen_Shot_2022-10-04_at_12.16.46_PM

Removing the tokio feature from my dependent crate eliminates the deadlocks.

zeenix commented 1 year ago

@jgrund Thanks so much for reporting this and also providing all the details. If possible, kindly also provide either a small reproducer (preferred) or link to your code that causes the hang. That will be very useful to be able to fix this issue.

zeenix commented 1 year ago

In GitLab by @jgrund on Oct 4, 2022, 22:17

I haven't narrowed it down all the way yet, but it's related to the lazy caching in proxy.rs. If I set CacheProperties::No or CacheProperties::Yes when building a proxy object I don't hit the deadlock on startup.

zeenix commented 1 year ago

In GitLab by @jgrund on Oct 4, 2022, 23:11

I'm not sure if this is the exact issue I'm hitting, but there is at least deadlock potential between closing the sender and waiting for data on the receiver. Example:

#[cfg(test)]
mod tests {
    use crate::abstractions::async_channel::channel;

    #[tokio::test(flavor = "multi_thread", worker_threads = 2)]
    async fn test_chan_tokio_deadlock() {
        let (tx, rx) = channel::<()>(1);

        let t1 = tokio::spawn(async move {
            tx.close().await;
        });

        let t2 = tokio::spawn(async move {
            let x = rx.recv().await;
        });

        futures_util::future::join(t1, t2).await;
    }
}

If I run this with cargo test -F tokio test_chan_tokio_deadlock I can get it to hang after a few attempts. If I run with cargo test test_chan_tokio_deadlock it does not deadlock.

zeenix commented 1 year ago

Hi @jgrund Thanks for providing this testcase. Did you happen to make any further progress?

zeenix commented 1 year ago

Just FYI, I'm looking into this. We mainly need to drop Sender::close but that's not as simple as that.

zeenix commented 1 year ago

update: in the limited time I had this week, I managed to create a test case and solution but it need to clean it up before merging and releasing.

zeenix commented 1 year ago

@jgrund if you have a few mins to spare and test the fix, that would be great.

zeenix commented 1 year ago

mentioned in commit f8674f122a633fd998931d3b222070860974e19a