dougbinks / enkiTS

A permissively licensed C and C++ Task Scheduler for creating parallel programs. Requires C++11 support.
zlib License
1.75k stars 146 forks source link

Correctness of event implementation in POSIX systems #12

Closed OlegOAndreev closed 7 years ago

OlegOAndreev commented 7 years ago

Hello, thank you for your library!

Minor problem: as far as I can understand, the current implementation of EventSignal/EventWait on POSIX systems allows for a race condition:

  1. [worker thread] TaskScheduler::WaitForTasks is called when there are no tasks.
  2. [worker thread] No tasks are found, worker thread increments m_NumThreadsWaiting.
  3. [another thread] TaskScheduler::SplitAndAddTask from another thread adds the task, and calls the TaskScheduler::WakeThreads.
  4. [another thread] TaskScheduler::WakeThreads checks the m_NumThreadsWaiting and calls the EventSignal, effectively calling pthread_cond_broadcast (btw, it should rather be called with mutex unlocked). No threads are waiting on the cond, therefore no threads are waked.
  5. [worker thread] EventWait is called, sleeping the thread until EventSignal is called next time, even when the thread could be processing the added task.

There are two ways to fix this behavior: either add a "signaled" variable to the event, which will be checked and modified with mutex locked or anonymous semaphores should be used instead (http://man7.org/linux/man-pages/man3/sem_init.3.html with pshared = 0).

The latter is generally a preferred way at least on Linux and OS X, as it is both simpler, faster and matches the Windows impl.

I will prepare a pull-request with a fix if you'd like.

dougbinks commented 7 years ago

Many thanks for the issue, and for the offer of preparing a pull-request.

Yes, there is a potential for a thread to wait when there is work to be done. This can cause a slight loss in parallel processing but the task should still run correctly by another thread (so long as the main thread ensures it's complete with a WaitforTaskSet, or another task is added). A fix for this would be to move the m_NumThreadsWaiting variable operations within the event mutex.

The non C++11 event approach on both Windows and Posix systems is undergoing a bit of an overhaul at the moment. I've moved the Windows approach to semaphores and will probably do the same for the Posix approach as the use of a mutex makes is a problem if a non locking scheduler is desired. I hope to do this relatively soon (in the next week or two), and at the same time fix this issue (note that it's currently not yet fixed on Windows) whilst also removing the lock. The finished event will be a little like the semaphore events from Preshing's blog.

On a historical note, the reason for this error was an optimization added on top of an event API when I should have changed the underlying event API, but didn't as I was using it elsewhere in other code.

If you're happy with the plan I'll leave this open for now and close once I get the new event system in place. If you feel you need a quicker fix I can change the event API slightly to resolve this as a temporary solution.

OlegOAndreev commented 7 years ago

No problem, as you noted this is definitely not a major issue.

dougbinks commented 7 years ago

The new implementation should resolve this.