Fixed the deadlock.
Took me two and a half refactorings:
1. Got rid of `OptionallyOwned` in favor of optional `std::unique_ptr<*, NullDeleter>`.
2. Moved the shared state between the listener thread and its scope object into a `shared_ptr<>`.
3. Phased out `ReallyAtomicFlag`.
... after which the issue remained: the call to `.join()` in the destructor of `ListenerThread` (formerly `ListenerScope`) still hangs.
After a couple more hours I concluded that there's a 99.99+% chance that the deadlock happens when the `Notify()` that tells the listener thread to shut down, issued as the listener scope object is being destructed, coincides with a `Wait()` called from within the thread.
Subtle hint:
The effects of notify_one()/notify_all() and wait()/wait_for()/wait_until() take place in a single total order, so it's impossible for notify_one() to, for example, be delayed and unblock a thread that started waiting just after the call to notify_one() was made.
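To make the hint concrete, here is a minimal sketch using plain `std::condition_variable` rather than this repository's `WaitableAtomic<T>`. The function name `NotifyBeforeWaitIsLost` and the timeout parameter are made up for illustration. It notifies first and starts waiting afterwards, and the wait is expected to time out because the earlier notification is not kept for later waiters.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical helper, not part of the pull request.
// Returns true if a notify_one() issued before the wait began did not wake the waiter.
inline bool NotifyBeforeWaitIsLost(std::chrono::milliseconds timeout = std::chrono::milliseconds(50)) {
  assert(timeout.count() > 0);
  std::mutex mutex;
  std::condition_variable cv;
  // A spurious wakeup could end a single wait early, so allow a few attempts.
  for (int attempt = 0; attempt < 3; ++attempt) {
    cv.notify_one();  // Nobody is waiting yet, so this call has no effect.
    std::unique_lock<std::mutex> lock(mutex);
    // Wait without a predicate: only a notification sent from now on can wake us.
    if (cv.wait_for(lock, timeout) == std::cv_status::timeout) {
      return true;  // The earlier notification was not delivered.
    }
  }
  return false;
}
```

This is the same window the deadlock needs: if the shutdown `Notify()` lands just before the thread enters `Wait()`, nothing wakes the thread afterwards.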
The solution in this pull request is to not rely on a single `Notify()` followed by `.join()`, but to keep calling `Notify()` if the thread does not confirm its acceptance within a short period of time. This required exposing `WaitFor()` (`Wait()` with an explicit upper bound on the amount of time to keep waiting) in `WaitableAtomic<T>`.
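A rough sketch of the notify-until-confirmed idea, under stated assumptions: it does not use the real `WaitableAtomic<T>` or `ListenerThread` code, which is not shown here. `ListenerSketch`, `StartAndStopListener`, the two condition variables, and the 10 ms retry interval are all hypothetical, and `std::condition_variable::wait_for` stands in for `WaitFor()`.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical, simplified stand-in for the listener thread plus its scope object.
class ListenerSketch {
 public:
  ListenerSketch() : thread_([this] { Run(); }) {}

  ~ListenerSketch() {
    assert(thread_.joinable());
    shutdown_requested_.store(true);
    for (;;) {
      // Sent without holding the mutex, so a single notification may be lost.
      wakeup_.notify_one();
      std::unique_lock<std::mutex> lock(mutex_);
      // Bounded wait for the thread's confirmation; on timeout, notify again.
      if (confirmed_.wait_for(lock, std::chrono::milliseconds(10),
                              [this] { return shutdown_confirmed_; })) {
        break;
      }
    }
    thread_.join();  // Safe now: the thread has confirmed and is leaving Run().
  }

 private:
  void Run() {
    std::unique_lock<std::mutex> lock(mutex_);
    while (!shutdown_requested_.load()) {
      wakeup_.wait(lock);  // Can miss a notification sent just before this call.
    }
    shutdown_confirmed_ = true;  // Guarded by mutex_.
    confirmed_.notify_all();
  }

  std::mutex mutex_;
  std::condition_variable wakeup_;     // Scope object -> thread: "shut down".
  std::condition_variable confirmed_;  // Thread -> scope object: "accepted".
  std::atomic<bool> shutdown_requested_{false};
  bool shutdown_confirmed_ = false;
  std::thread thread_;  // Declared last so every other member exists before Run() starts.
};

// Hypothetical driver: starts and destroys the listener `rounds` times.
inline bool StartAndStopListener(int rounds) {
  assert(rounds > 0);
  for (int i = 0; i < rounds; ++i) {
    ListenerSketch listener;  // The destructor runs the notify-retry loop, then joins.
  }
  return true;
}
```

With a single notification, a thread that checked the flag just before the store and then entered `wait()` would sleep forever and `join()` would hang. In this sketch the retry loop bounds that stall to roughly one retry interval.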
And this works now. @mzhurovich, please take a look!
(The subtle hint above is quoted via http://en.cppreference.com/w/cpp/thread/condition_variable/notify_one)