Closed alugowski closed 1 year ago
Thanks for the PR! This is pretty much what I had in mind when looking into solving this bug (based on your previous analysis in #93). In my version of this solution I'm using a const std::scoped_lock
, but that's just because I'm OCD about const
. I will incorporate this into v3.4.0, which should be released tomorrow (along with some other bug fixes and new features I've been working on).
There will also be two tests added to the test suite in this release, which check for deadlocks - one upon object destruction and another upon pool reset (both of which call destroy_threads()
). These checks will only be added to the non-light version, since they require a new feature to prevent the deadlock from locking the test itself, and that feature is not available in the light version.
Thank you again for your help with this bug, and sorry I didn't consider it high priority at first! I am now closing this PR and all the related issues since they have been resolved in v3.4.0.
Fix deadlock.
The root cause is missed notifications. Condition variable
wait
/notify
is instantaneous, so if a notification is sent while nobody is listening then it will be missed.The fix is to send the notification while the
tasks_mutex
is acquired. This, combined with the predicate version ofwait
already in use, ensures that there is no way to miss the notification.Ran against all three reported deadlock bugs and none demonstrate the issue anymore.
Fixes https://github.com/bshoshany/thread-pool/issues/107 Fixes https://github.com/bshoshany/thread-pool/issues/100 Fixes https://github.com/bshoshany/thread-pool/issues/93