mxgrey opened 3 years ago
This is great work, thanks! I think I found a simpler solution. Can you try it out? https://github.com/ReactiveX/RxCpp/compare/master...kirkshoop:multicastdeadlock
I considered unlocking the mutex before entering `o.add(~)`. That would definitely resolve the risk of deadlock, but I wasn't sure if it could open up the risk of other race conditions. For example, what if `multicast_observer::on_completed()` is called on another thread as soon as the mutex is unlocked, and `s.unsubscribe()` is triggered before `o.add(~)` has finished inserting the new subscription? Can we be sure that all unsubscribing callbacks will get triggered correctly?
I honestly don't know the answer to those questions, and I don't trust myself to have a good enough read on the whole add/unsubscribe process to be certain about whether or not it's safe. I think if we can't guarantee correct behavior when unlocking the mutex, it would be much better to avoid the risk by using the recursive mutex. But if it is possible to guarantee correct behavior after unlocking the mutex, then I have no objection to that approach.
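To make the worry concrete, here is a minimal standalone sketch of the interleaving in question. The names (`Subject`, `add_unlocked`, `teardowns`) are purely illustrative, not RxCpp's actual internals; it only shows the window that an unlock-before-add approach opens up, and the re-check such an approach would need after re-acquiring the lock:

```cpp
#include <functional>
#include <mutex>
#include <vector>

// Hypothetical sketch, not RxCpp code: a subject that releases its
// mutex before inserting a new subscription's teardown callback.
struct Subject {
    std::mutex lock;
    bool completed = false;
    std::vector<std::function<void()>> teardowns;

    void add_unlocked(std::function<void()> teardown) {
        {
            std::unique_lock<std::mutex> guard(lock);
            if (completed) { teardown(); return; }  // already done: tear down now
        }
        // <-- window: another thread may call on_completed() here and
        //     run all registered teardowns before this one is inserted.
        std::unique_lock<std::mutex> guard(lock);
        if (completed) { teardown(); return; }  // must re-check after relocking
        teardowns.push_back(std::move(teardown));
    }

    void on_completed() {
        std::unique_lock<std::mutex> guard(lock);
        completed = true;
        for (auto& t : teardowns) t();  // fire all unsubscribe callbacks
        teardowns.clear();
    }
};
```

Without the second `if (completed)` re-check, a teardown registered inside that window would never fire, which is exactly the "will all unsubscribing callbacks get triggered" question above.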
That is a good point, and I will think about it more.
I did run all my threading regression tests with no errors (these test cases are marked with the tag [perf]).
This PR fixes #555 and provides a regression test that minimally recreates the deadlocking issue.
If you check out `Rx/v2/test/CMakeLists.txt` and `Rx/v2/test/subscriptions/race_condition.cpp` into the current `master` branch and try to run `rxcpp_test_race_condition`, you will most likely find that the test gets permanently deadlocked. The deadlock is not guaranteed, since it depends on a race condition with a very narrow window, but the race is very likely to occur (at least it has been on my machine). If it doesn't deadlock on the first try, the test can be rerun repeatedly until it eventually does.

But if you run the test on this branch, with the changes to `multicast_observer`, you will find that it never deadlocks no matter how many times the test is repeated. The deadlock is fixed by simply changing the mutex of `multicast_observer::state_type` to a `std::recursive_mutex`, which guarantees that it is safe to lock it recursively (recursive locking is what causes the deadlock when using a plain `std::mutex`).

I understand that some people consider `std::recursive_mutex` something that should always be avoided, because it may be indicative of a design flaw in the memory access strategy. But after putting a lot of thought into this specific deadlocking issue, I don't see an alternative fix without a dramatic redesign of `multicast_observer`.
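For reference, the behavioral difference is the standard one between the two mutex types. A minimal standalone sketch (illustrative names, not RxCpp code): a `std::recursive_mutex` may be locked again by the thread that already holds it, whereas a second `lock()` on a plain `std::mutex` from the same thread is undefined behavior and in practice deadlocks:

```cpp
#include <mutex>

// Illustrative sketch, not RxCpp code. With std::mutex in place of
// std::recursive_mutex, the nested lock below would deadlock.
std::recursive_mutex state_lock;
int events = 0;

void add_subscription() {
    // Re-acquiring the lock we already hold: safe only because the
    // mutex is recursive.
    std::lock_guard<std::recursive_mutex> guard(state_lock);
    ++events;
}

void on_subscribe() {
    std::lock_guard<std::recursive_mutex> guard(state_lock);
    // ...update state, then call back into code that locks again,
    // analogous to multicast_observer entering o.add(~) while still
    // holding its own state mutex.
    add_subscription();
    ++events;
}
```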
While this deadlock is a very rare occurrence, it can be an extremely impactful one. Whenever it occurs, it permanently locks up an entire thread. In my own use case, I have a design pattern where I assign an entire category of operations to be observed on a specific `rxcpp::schedulers::worker` (essentially using the worker as a bottleneck to ensure mutually exclusive access to a certain cluster of data), and if that worker ever gets caught in this deadlock, I permanently lose the ability to perform that entire category of operations.

So while it might be desirable to do a redesign that avoids the need for a `std::recursive_mutex`, I would strongly encourage using it as a stopgap until an alternative solution can be found and implemented, assuming that might take some time to work out.
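For readers unfamiliar with the bottleneck pattern described above, here is a generic sketch of it in plain standard C++ (this is not the RxCpp scheduler API; `SerialWorker` and its members are hypothetical names). A single thread drains a task queue, so every task that touches the shared data runs mutually exclusively; if any task deadlocks, the whole worker, and every future task scheduled on it, is lost:

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical sketch of a serializing worker, analogous in spirit to
// observing a category of operations on one rxcpp::schedulers::worker.
class SerialWorker {
    std::mutex m;
    std::condition_variable cv;
    std::queue<std::function<void()>> tasks;
    bool stopping = false;
    std::thread thread_;  // must be declared last so the others init first

public:
    SerialWorker() : thread_([this] { run(); }) {}

    ~SerialWorker() {
        { std::lock_guard<std::mutex> g(m); stopping = true; }
        cv.notify_one();
        thread_.join();  // drains any remaining tasks before returning
    }

    void schedule(std::function<void()> task) {
        { std::lock_guard<std::mutex> g(m); tasks.push(std::move(task)); }
        cv.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> g(m);
                cv.wait(g, [this] { return stopping || !tasks.empty(); });
                if (tasks.empty()) return;  // stopping and fully drained
                task = std::move(tasks.front());
                tasks.pop();
            }
            task();  // if this ever deadlocks, the whole worker is lost
        }
    }
};
```

All scheduled tasks run on the one worker thread, so they never overlap; that is the mutual-exclusion guarantee the deadlock would permanently destroy.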