ComputationalRadiationPhysics / redGrapes

Resource-based, Declarative task-Graphs for Parallel, Event-driven Scheduling :grapes:
https://redgrapes.rtfd.io
Mozilla Public License 2.0

Fix freeze of worker #53

Closed by michaelsippel 10 months ago

michaelsippel commented 10 months ago

A race condition on the condition variable cv in Worker caused the WorkerUtilization unit test to freeze.

The freeze occurs when the worker is assigned a new task before it has started its work loop. In that case the worker thread calls wait() twice in sequence, and the corresponding notify() calls are not properly synchronized with those waits. First, the worker progresses to the start barrier and enters the outer while loop to wait for the start signal. Before this wait() returns, both Worker::start() and the task emplacement send their notify(). A condition variable does not count notifications, so the notify() for the new task is absorbed by the same wakeup as the start signal and is lost. The worker thread then wakes up and jumps to work_loop(), where it waits again. Since the notify() that should wake it was already sent, the worker never wakes up and the task is never consumed.
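The lost-wakeup mechanism can be reproduced outside redGrapes. The following is a minimal standalone C++ illustration, not code from this PR, and the names Signals, started and task_ready are hypothetical. bare_wait_after_notify() shows that a notify() sent while no thread is waiting has no effect, so a later bare wait only ends by timeout (or by a spurious wakeup). run_worker_after_early_notifies() shows the general remedy of guarding each wait with a predicate over shared state: both notifications are sent before the worker waits at all, yet neither wait blocks.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical shared state; not the actual redGrapes Worker members.
struct Signals {
    std::mutex m;
    std::condition_variable cv;
    bool started = false;     // set by the equivalent of Worker::start()
    bool task_ready = false;  // set by the equivalent of a task emplacement
};

// A notify() with no waiting thread is simply dropped, so the following
// bare wait_for() (no predicate) sleeps until the timeout. The standard
// permits spurious wakeups, so no_timeout is possible but unlikely.
std::cv_status bare_wait_after_notify() {
    std::mutex m;
    std::condition_variable cv;
    cv.notify_one();  // nobody is waiting: this notification is lost
    std::unique_lock<std::mutex> lock(m);
    return cv.wait_for(lock, std::chrono::milliseconds(50));
}

// Both signals are sent before the worker thread even exists. Each wait
// re-checks its flag under the mutex before blocking, so the early
// notify() calls cannot be lost and the "task" is consumed.
bool run_worker_after_early_notifies() {
    Signals s;
    {
        std::lock_guard<std::mutex> lock(s.m);
        s.started = true;
        s.task_ready = true;
    }
    s.cv.notify_all();  // start signal
    s.cv.notify_all();  // new-task signal

    bool consumed = false;
    std::thread worker([&] {
        std::unique_lock<std::mutex> lock(s.m);
        s.cv.wait(lock, [&] { return s.started; });     // start barrier
        s.cv.wait(lock, [&] { return s.task_ready; });  // work-loop wait
        s.task_ready = false;
        consumed = true;
    });
    worker.join();  // join() makes `consumed` safe to read here
    return consumed;
}
```

With bare cv.wait(lock) calls in place of the predicate overloads, the second wait in run_worker_after_early_notifies() would be the same situation as the work_loop() wait described above.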

In a real application the bug is less apparent, because the worker is only falsely idle until the next task is emplaced. That emplacement wakes the worker up, and everything continues normally.

This PR fixes the bug by removing the condition-variable-based barrier before work_loop(), which recent refactorings have made unnecessary anyway.
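A minimal sketch of a worker without a start barrier, assuming a simplified design with a single task queue. The class SketchWorker and its members are hypothetical and are not the redGrapes API. The thread enters work_loop() immediately, and its only wait is guarded by a predicate, so a task emplaced before the loop starts is still picked up.

```cpp
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical simplified worker; not the redGrapes Worker class.
class SketchWorker {
public:
    // No start barrier: the thread goes straight into work_loop().
    SketchWorker() : thread_([this] { work_loop(); }) {}
    ~SketchWorker() { stop(); }

    void emplace(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(m_);
            queue_.push_back(std::move(task));
        }
        cv_.notify_one();
    }

    // Lets the worker drain the remaining tasks, then joins the thread.
    void stop() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stopping_ = true;
        }
        cv_.notify_one();
        if (thread_.joinable()) thread_.join();
    }

private:
    void work_loop() {
        std::unique_lock<std::mutex> lock(m_);
        for (;;) {
            // The predicate is checked under the mutex before blocking,
            // so a notify() sent before this point cannot be missed.
            cv_.wait(lock, [this] { return stopping_ || !queue_.empty(); });
            if (queue_.empty()) return;  // stopping and nothing left to do
            auto task = std::move(queue_.front());
            queue_.pop_front();
            lock.unlock();
            task();  // run the task without holding the lock
            lock.lock();
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::deque<std::function<void()>> queue_;
    bool stopping_ = false;
    std::thread thread_;  // declared last: starts after the members above exist
};
```

Usage: tasks emplaced right after construction, possibly before the thread reaches cv_.wait(), are still executed:

```cpp
int count = 0;
SketchWorker w;
for (int i = 0; i < 3; ++i) w.emplace([&count] { ++count; });
w.stop();  // after join(), count == 3
```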