Closed alugowski closed 1 year ago
Nice catch, thanks for posting this issue! I believe this happens if the second wait_for_tasks()
is launched exactly when the first one is between the lines if (!waiting)
and waiting = true;
, so it hasn't yet had a chance to set waiting
to true
.
The ideal solution is to add an additional mutex to ensure that waiting
stays locked until its value is set to true
. I will add that in the next release. Meanwhile, a simple solution is to replace in wait_for_tasks()
task_done_cv.wait(tasks_lock, [this] { return (tasks_total == (paused ? tasks.size() : 0)); });
with
task_done_cv.wait(tasks_lock, [this] { return (tasks_total == (paused ? tasks.size() : 0)) || !waiting; });
and similarly in the other two wait functions if desired.
Note: I added an extra variable to the example code to make the problem appear even faster.
Meanwhile, a simple solution
That does not seem to have an effect.
if the second
wait_for_tasks()
is launched exactly when the first one is between the linesif (!waiting)
andwaiting = true;
, so it hasn't yet had a chance to setwaiting
totrue
.
Wait, what? Both threads set waiting to true
, presumably as it should be.
This actually brings up another possibility: If thread 2 entered wait_for_tasks
after thread 1 already set waiting
to true, then thread 2 wouldn't wait at all! Sure that's not a deadlock, but it's incorrect behavior.
FYI, atomics have been specifically built to make test-and-set type operations, well, atomic. So something like if (!waiting) { waiting = true}
defeats the purpose of making waiting
atomic.
One of these would be more appropriate. I'd recommend the strong variants for this project. https://en.cppreference.com/w/cpp/atomic/atomic/compare_exchange
Though in this case if (!waiting)
introduces a behavior bug, even if done atomically.
You're 100% right, I appear to have made a mistake and wait_for_tasks()
is not behaving as expected. However, both the incorrect behavior and the deadlock can be resolved as follows:
if (!waiting)
along with its 2 braces, since as you pointed out, that doesn't make sense - if the second thread calls wait_for_tasks()
, it will not block as expected. (This essentially returns the function to the form it had in v3.3.0.)worker()
, simply change task_done_cv.notify_one();
to task_done_cv.notify_all();
so that it notifies all the waiting threads, not just one of them; this was something I actually overlooked in the original design, and it is the cause of the deadlock, as far as I can tell.I tested this with your minimal working example and did not encounter any deadlocks (I let it count to 30,000 and then stopped it manually). I also have a test in BS_thread_pool_test.cpp
to test for deadlocks in wait_for_tasks()
, and it too passes successfully with this change.
This fix will be posted within a few days with release v3.5.0 (along with a few other fixes I've been working on).
Thanks again for pointing this out, it was very helpful! :)
Great!
Delete the statement
if (!waiting)
along with its 2 braces,
I think that's correct. Notice that now the reproducing code always deadlocks on the first iteration. That's what I expected it to do when I wrote the code, I think the fact it took multiple iterations with stock 3.4.0 is that the second thread would simply not wait.
task_done_cv.notify_all();
Yes, this fixes it!
Update: This issue has been fixed in v3.5.0, which was just released. Thanks again!
Describe the bug
Deadlock if
wait_for_tasks()
is called from multiple threads at the same time.Minimal working example
Behavior
Expectation is that both
wait_for_tasks()
will block until the tasks are completed then both will unblock. In some situations this does not happen, deadlocking one of the threads.Example execution of above:
Edit: Added
thread_started
to example code to make the problem appear even faster.