AdamsLair / duality

a 2D Game Development Framework
https://adamslair.github.io/duality
MIT License
1.41k stars 289 forks source link

Investigate Editor / WinForms Re-Entrancy Problems with Task Parallel Library #295

Open ilexp opened 8 years ago

ilexp commented 8 years ago

Summary

When using methods of the task parallel library, such as Parallel.For, the editor will pump a selective subset of windows messages while waiting for its completion. Among these events is WM_PAINT, which can cause re-entrancy problems when attempting to repaint the scene.

How to reproduce

Unclear. It happens occasionally - the more Parallel.For is used in the core, the higher the chance. Right now, there is no Parallel.For in the main update / render cycle because of the re-entrancy problems, but adding some for testing should reproduce this behavior easily.

Workaround

Avoid Parallel.For in the main update / render cycle. This is what's done currently.

Attachments

ilexp commented 8 years ago

More research. There's a very interesting comment in the DispatcherSynchronizationContext source file (mirrored in this gist) regarding their Wait implementation:

if(_dispatcher._disableProcessingCount > 0)
{
    // Call into native code directly in order to avoid the default 
    // CLR locking behavior which pumps messages under contention.
    // Even though they try to pump only the COM messages, any 
    // messages that have been SENT to the window are also 
    // dispatched.  This can lead to unpredictable reentrancy.
    return MS.Win32.UnsafeNativeMethods.WaitForMultipleObjectsEx(waitHandles.Length, waitHandles, waitAll, millisecondsTimeout, false); 
}
else
{
    return SynchronizationContext.WaitHelper(waitHandles, waitAll, millisecondsTimeout); 
}

As _disableProcessingCount hints, this is only activated explicitly in a handful of select spots, so it remains arguable whether it would be a great idea to use this method as a default. Further investigation is required.

ilexp commented 8 years ago

Source Code Links:

Looks like this is where the actual message pump is implemented.

ilexp commented 8 years ago

Very thorough StackOverflow answer on a similar question. It also details custom implementations of synchronization contexts.

ilexp commented 8 years ago

Quoting from the very first StackOverflow link in the initial post:

This comes from the CLR, it implements the contract that an STA thread (aka UI thread) is never allowed to block on a synchronization object. Like Parallel.For() does. It pumps to ensure that no deadlock can occur.

With this in mind, it might be the best course of action to respect the original Wait implementation and look into countering the re-entrancy problems directly instead.

Any insight by more experienced people appreciated.

ilexp commented 8 years ago

As mirrored in this gist, there is a sample implementation of a non-re-entrant synchronization context in the first StackOverflow link. Performed a quick test with the editor.

Test Setup

Results

Regular WinForms SynchronizationContext

Custom SynchronizationContext

So the custom sync context actually seems to fix the re-entrancy issues. It still remains to be determine whether using that custom synchronization context is a good idea. Again, any insights appreciated.

It also remains unclear why exactly re-entrancy is an issue here and whether it can be countered elsewhere. This has still to be investigated.

ilexp commented 7 years ago

More research:

Overall, it seems preferable to do without a custom solution here, as there are too many WinForms, TPL, Thread internals a change might affect.

Barsonax commented 6 years ago

A alternative could be running duality in a separate thread (possibly with its own custom duality sync context to support tasks). This would require that the UI does not update duality while its doing a gameupdate though.

Pro's:

The big con:

Barsonax commented 6 years ago

Another solution to this problem could be that instead of calling any duality render methods directly a more data focused approach would be used. The editor will only read out the 'renderstate'. This state gets updated only when duality render methods are run. That way it does not matter what happens at the winform side, duality would be in full control when any duality render methods get called.

Barsonax commented 6 years ago

After thinking about it some more the real problem here is that winforms can directly call a duality render method. I dont think this should happen especially since the repaint event can happen at any moment (well unless the mainthread is really blocked) while most things in duality have a clear defined order of execution.

Any solution that can break this direct control might solve this issue. I dont think you really need a separate thread in this solution. So very globally:

  1. Find a way to ensure only duality can call the ICmpRenderer.Draw method. I believe this means that Camera.Render should only be ever called from duality itself. Currently it seems to be called because its wired up to the control.Paint event here. There might be more entry points though.
  2. Ensure this doesnt break anything. This possibly means simply reusing the last calculated result instead of calling Draw methods.

EDIT: it seems that inside the editor winforms is actually running the duality update loop as well. Updating the render state could be done here as well I think (thus decoupling it from the OnPaint event). EDIT2: ICmpUpdatable seems to be unnaffected which makes sense since its not wired up to OnPaint. EDIT3: Cant seem to reproduce the reentrancy bug with ICmpRenderer either. Using a parallel.for does generate alot of nullreference exceptions in the DrawDevice.DrawBatchComparer method. Just drawing something in a normal for and doing some operation in a parallel for doesnt seem to generate this bug either. EDIT4: Reproduced the flickering but its not due to reentrancy I think. Its due to nullreference exceptions randomly happening when using parallel.for. Now to check why it doesnt happen with that custom sync context.

Barsonax commented 6 years ago

Drawing stuff inside a parallel for is not a supported scenario since drawing stuff is not threadsafe. This also means I could not find a way to reproduce the flickering on the master branch. Even when checking if there was any kind of reentrancy in the draw method with a boolean flag resulted in nothing.

Most likely this bug has been fixed due to changes in the core and the editor.

ilexp commented 6 years ago

Unrelated to previous posts, some insight on how Unity makes sure async await doesn't blow up their thread safety. Remotely related to this for their use of a custom "main thread only" synchronization context:

Not sure if this main thread only business is a great idea, but good for inspiration and learning about synchronization contexts.

ilexp commented 5 years ago

With the recent update from .NET 4.5 to .NET 4.7.2, we now should have this fix from .NET 4.6:

GDI+ reentrant paint operations fixed. [1075715]

Not entirely sure this is what we observed here, but it could be worth checking out if the problem from the initial description is still an issue.

ilexp commented 4 years ago

For research: https://devblogs.microsoft.com/dotnet/configureawait-faq/