HexKitchen closed this 3 months ago.
Thanks! Great catch. Will try to get this landed on Monday, in time for 16.4.4.
Just pushed some changes to try a slightly simpler approach, without thread-local storage. Also pushed some style fixes/tweaks. I'll have to get some sleep now, so I'll land this first thing tomorrow -- but in case you get a chance to test in the meantime please let me know how it goes :)
Super. Agreed, this is simpler and solves the deadlock; I was able to confirm that in testing as well. Thx!
This is a fix for #807: recv wait can deadlock on an application thread.
See the issue description for a discussion of the root cause.
The proposed fix here operates by introducing a thread-local variable, `event_count_last_seen`. After entering the mutex, which ensures a stable reading of `core->event_count`, the thread enters the wait for `event_cond` only if `core->event_count` is equal to `event_count_last_seen`. In this way, we can guarantee that we won't begin waiting on `event_cond` at a time when the event we need has already been broadcast, which would produce the deadlock.
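A minimal sketch of the check described above, using C11 and POSIX threads. Only `event_count`, `event_cond` and `event_count_last_seen` come from the description; `struct core`, `core_signal_event`, `core_wait_event`, `signaler` and `run_demo` are hypothetical names for illustration, and the real code may look quite different (the later comment says the landed version drops thread-local storage). The sketch uses a `while` loop instead of a single `if` so that spurious wakeups from `pthread_cond_wait` are also handled; that loop is an assumption of this sketch, not necessarily what the PR does.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for the shared state: only event_count and
 * event_cond are named in the PR description. */
struct core {
    pthread_mutex_t mutex;      /* guards event_count */
    pthread_cond_t event_cond;  /* broadcast whenever event_count changes */
    uint64_t event_count;       /* incremented once per event */
};

/* Per-thread copy of the last event_count this thread consumed. */
static _Thread_local uint64_t event_count_last_seen = 0;

static void core_init(struct core *c) {
    pthread_mutex_init(&c->mutex, NULL);
    pthread_cond_init(&c->event_cond, NULL);
    c->event_count = 0;
}

/* Producer: record a new event and wake all waiters. */
static void core_signal_event(struct core *c) {
    pthread_mutex_lock(&c->mutex);
    c->event_count++;
    pthread_cond_broadcast(&c->event_cond);
    pthread_mutex_unlock(&c->mutex);
}

/* Waiter: block only while no event has arrived since this thread last
 * looked. Returns the event_count observed when it stops waiting. */
static uint64_t core_wait_event(struct core *c) {
    pthread_mutex_lock(&c->mutex);
    /* event_count is stable while the mutex is held. If it already differs
     * from event_count_last_seen, the broadcast came before we got here and
     * waiting would miss it, so the loop body never runs. The loop (rather
     * than a single if) also absorbs spurious wakeups. */
    while (c->event_count == event_count_last_seen) {
        pthread_cond_wait(&c->event_cond, &c->mutex);
    }
    event_count_last_seen = c->event_count;
    uint64_t seen = event_count_last_seen;
    pthread_mutex_unlock(&c->mutex);
    return seen;
}

static void *signaler(void *arg) {
    core_signal_event(arg);
    return NULL;
}

/* Demo: stores the counts seen by two waits on the calling thread. */
static void run_demo(uint64_t seen[2]) {
    struct core c;
    core_init(&c);
    event_count_last_seen = 0;  /* fresh core, fresh per-thread state */

    /* Case 1: the broadcast happens before anyone waits. A bare
     * pthread_cond_wait here would block forever; the count check returns. */
    core_signal_event(&c);
    seen[0] = core_wait_event(&c);

    /* Case 2: another thread signals, racing with this wait. */
    pthread_t t;
    int rc = pthread_create(&t, NULL, signaler, &c);
    assert(rc == 0);
    (void)rc;
    seen[1] = core_wait_event(&c);
    pthread_join(t, NULL);

    pthread_cond_destroy(&c.event_cond);
    pthread_mutex_destroy(&c.mutex);
}
```

In `run_demo`, the first wait follows a broadcast that nobody was waiting for. This is exactly the situation that deadlocks an unconditional wait, and here the count check returns immediately with 1. The second wait races a signaling thread. Whichever side wins, it returns 2: either the count has already moved past `event_count_last_seen` and the wait is skipped, or the thread blocks and is woken by the broadcast.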