Closed Liemarzac closed 1 year ago
Many thanks for this test and bug report!
I was able to replicate this, and find the source of the bug - TaskScheduler::TaskComplete
was accessing pDependentCurr->pTaskToRunOnCompletion
in a thread when another thread called OnDependenciesComplete
which marked the task as complete and testDependencies()
exiting which deleted the task.
This likely doesn't occur for many programs as it requires the task to be deleted immediately after WaitforTask, and the examples and tests normally reuse the same tasks when in a loop.
I have a fix on branch dev_dependencies_fix_78 which I'm testing, and I'll also add a similar test later.
Let me know if this resolves the issue for you.
Thanks for your quick turnaround on this issue. I am going to test it in the afternoon. It's likely that I hit this bug because the tasks are scheduled in a physics lib which runs many steps in quick succession. And you made a good point, the task graph should be built only once and stay around between each call. At the moment I just implemented a quick replacement of a parrallel_for loop,
No more issues after grabbing that fix. Thanks Doug.
Excellent, thanks! It's been working well here on my projects and tests too so I shall merge it to main soon.
Hi Doug, First of all, thanks for your enkiTS lib. I enjoyed playing with it in the past week. I have noticed a rare crash by using the dependencies and managed to get a minimal code sample which will trigger it in few minutes running it. Sometimes it will take a while, but it will eventually crash. The sample contains a StartTask which triggers two TaskA. After completion the TaskAs will trigger a EndTask. Here is a simple topology of the task graph:
StartTask --> TaskA x 2 in parallel --> EndTask
By running the sample in a very tight loop, it will eventually crash deep in the TaskManager. This is the line which will crash
ENKI_ASSERT( prevDeps < pDependent->pTaskToRunOnCompletion->m_DependenciesCount );
wherepTaskToRunOnCompletion
is an invalid pointer.It's entirely possible that I am not using the dependencies correctly, but I tried to stay as close as possible from the sample code provided. Because it seems to be timing specific, I suspect maybe a race condition in the LockLessMultireadPipe.
The sample I have provided should crash faster when compiled in Release.
enkiTSTest.zip