WPF dirty rectangle glitches in high GPU-load situations

markus-neff-bl commented 3 years ago

In high GPU-load situations, dirty-rect handling in WPF-based applications gets out of sync with window content updates.
This leads to situations in which, for example, a larger update of a visual is first clipped to the update rectangle of another, smaller overlaid visual / control and only shows up completely in one of the subsequent frames.
Preliminary analysis suggests that this happens because the WPF rendering engine uses D3D 9Ex with D3DSWAPEFFECT_COPY, and partial updates during D3D 9Ex application frame rendering are not properly synchronized with the DWM GPU blit copy presentation and the DWM update rectangle / presentation history handling.
A detailed write-up can be found in the attached document: D3D9Ex_WPF_Update_Rectangle_Glitches.pdf
More detailed information, such as videos, screenshots, GPUView ETW traces, and reproducer applications, is available for download here: https://drive.google.com/file/d/1yfLaGcvRUD4rAXRPjfpvTHhaytXAk-KI/view?usp=sharing
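
For readers who want the failure mode in miniature, the following is a toy model in Python, not WPF or D3D code. It assumes a one-dimensional "window" of pixels, a back buffer the application renders into, and a screen buffer that is updated by copying only a given range. The names `present`, `back`, and `screen` are hypothetical and exist only for this illustration.

```python
def present(back, screen, dirty):
    """Copy only the half-open pixel range `dirty` from the back buffer to the screen."""
    lo, hi = dirty
    screen[lo:hi] = back[lo:hi]


WIDTH = 10
back = ["."] * WIDTH    # what the application has rendered
screen = ["."] * WIDTH  # what the user actually sees

# Frame 1: a big visual (pixels 0..7) and a small overlaid control (pixels 2..3)
# both change in the back buffer.
back[0:8] = ["B"] * 8
back[2:4] = ["s"] * 2

# Out-of-sync case: the copy for this frame uses only the small control's range.
present(back, screen, (2, 4))
frame1 = "".join(screen)

# Frame 2: nothing new is rendered, but the big range is finally copied.
present(back, screen, (0, 8))
frame2 = "".join(screen)

print(frame1)  # ..ss......
print(frame2)  # BBssBBBB..
```

In frame 1 the big visual is already rendered but not yet visible, which matches the "clipped to the update rectangle of another visual" symptom described above.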

markus-neff-bl commented 3 years ago

Updated write-up in PDF because one section was hidden in the original PDF: D3D9Ex_WPF_Update_Rectangle_Glitches.pdf

SamBent commented 3 years ago

Your writeup and samples strongly suggest that the problem lies at the Dx9/DWM level, so there’s nothing WPF can do about it. WPF is using Dx9 correctly – everything renders well when only the WPF app is running. The problems begin when other processes add load to the Dx/DWM/GPU pipeline. This is information that’s not available to WPF, and even if it were, a process can’t be expected to change its rendering strategy based on what’s happening in other processes. It’s the job of the lower-level components – Dx9, DWM, etc. – to compose the graphics requests of all those processes into something the GPU can handle.

markus-neff-bl commented 3 years ago

Hi Sam,

Thanks a lot for your comment here. However, I was in contact with members of the D3D development team about this in parallel, and to sum up the longer conversation, they see this a bit differently. They stated that D3D 9Ex cannot give those guarantees here because, with GDI-based swap chains/effects, the design of the rendering system cannot guarantee that the dirty rectangles and the rendered window content stay synchronized. The only way would be to switch to newer swap/flip models that do not have to go through GDI; there, the guarantee I was expecting is given by the rendering system.

So three questions here:

1. Can I somehow disable internal usage of dirty rects and enforce full rendering in WPF from the application level (via tested and supported code paths)?
2. Can I somehow trigger usage of a different swap effect like D3DSWAPEFFECT_DISCARD?
3. Are there any plans to move WPF to a more modern swap chain or even to base it on direct composition in the future?

SamBent commented 3 years ago

It looks like you can get D3DSWAPEFFECT_DISCARD by disabling hardware acceleration, but that's "for debugging and test purposes" and probably isn't helpful to you.

There are no public switches or settings pertaining to dirty rects or swap effects, and I don't see any clever way to trick the graphics stack to do these things differently. Sorry.

The subject of moving WPF to a better graphics underpinning comes up regularly. It's a major project (as I'm sure you appreciate). It's not on our roadmap now; whether it will be in the future is beyond my crystal-ball-gazing ability.

markus-neff-bl commented 3 years ago

Sam, thanks a lot for the response. One further follow-up question: would it be possible to add an officially supported way for an application to optionally configure WPF so that it always presents the full window instead of just the dirty rects? What do you think - would that be possible? That way, an application would at least have a way to work around the issue. Or would you have any alternative proposals for a workaround at the application level?

SamBent commented 3 years ago

Thinking out loud:

Adding a property to Application (or any built-in class) feels wrong, because it doesn't address the inherent behavior of the class, but rather an implementation detail, namely the shortcomings of D3D 9Ex. And it wouldn't do you any good, as new API goes into new releases only, not into servicing updates; you want a .NET 4.8 patch, but the alleged property would only be in .NET 6.0 at best.

A reg key, like the HW acceleration disable key, feels like overkill. That would apply at the machine or user level, making all WPF apps pay for the sake of the one app that needs it. Even the one app only needs the workaround when other apps are stressing the GPU (if I read you correctly), so it's not at all clear how to decide when to ask for "full present mode", and who makes the decision. Perhaps it's OK for your app to ask for it always, on the principle that it's usually going to be in high-GPU territory.

An app-context switch feels possible. Your app would set it, but other apps can remain ignorant. There must be a price for this - increased bandwidth on the GPU channel, more painting, etc. I don't have a feel for how much it is (do you?), but it's probably something that apps opt into only if they know what they're doing and really need to, because the alternative is worse.

We'd need to get the switch state transferred from managed code (which knows about app-context switches) to native code (the graphics stack, where the behavior change happens). I'm assuming it's possible to do this without new API that would kill the deal for servicing, but I don't know off the top of my head.

For implementation, I expect it's a matter of changing the arguments sent to D3D (or other low-level graphics) methods in a few places, and maybe diverting logic that optimizes dirty rects. It sounds like you've looked at this code much more than I have, so perhaps you can pinpoint the changes. I'd feel uneasy about changing this dynamically (no idea how much state in the graphics pipeline would need adjusting), but I expect setting this mode once before the first render would be good enough for you.

A one-time mode setting, governed by an app-context switch, with fairly lightweight changes in the graphics code, sounds technically do-able in a .NET Framework servicing update. If you want to go that route, work with your CSS rep to get a DTS request filed.
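
The "set once before the first render" idea can be sketched as a latch. This is a Python illustration under stated assumptions, not WPF code: `RenderModeLatch`, `request_full_present`, and `mode_for_render` are hypothetical names, and the real change would live in the native graphics stack.

```python
class RenderModeLatch:
    """Hypothetical one-time setting: read at the first render, then frozen."""

    def __init__(self):
        self._requested = False  # what the app has asked for so far
        self._latched = None     # becomes True/False at the first render

    def request_full_present(self, enabled):
        # Stands in for the app setting a switch in managed code.
        self._requested = enabled

    def mode_for_render(self):
        # The first render latches the requested value; later requests are ignored.
        if self._latched is None:
            self._latched = self._requested
        return self._latched


latch = RenderModeLatch()
latch.request_full_present(True)   # set before the first render
first = latch.mode_for_render()    # latches True
latch.request_full_present(False)  # too late, has no effect
later = latch.mode_for_render()    # still True
```

Latching avoids having to reason about how much pipeline state would need adjusting if the mode changed mid-run, at the cost of not being able to turn it off again.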

markus-neff-bl commented 3 years ago

Sam,

thanks a lot - sounds very reasonable to me. I agree that some context setting would be better suited here than a direct API.

I do not yet have a good feeling for the runtime overhead of such a change, but from watching in GPUView what D3D11/DXGI does in the comparable situation (seemingly ignoring the dirty rect that I pass into the Present1 call), I had the hope that one could limit the change to just the final Present call on the swap chain. At least with the configuration / code path that is used in the case I look at, this seems OK, as I think that the redirection surface outside of the dirty rect is also assumed to still be valid and up-to-date - even if it was not touched in the rendering of some partial frame.

If I understood the D3D team correctly, the dirty rect as passed into Present should rather be a hint for optimization purposes, indicating that the copy can safely be limited to that area. But the D3D system is not forced to adhere to the rect (e.g. because of accumulating dirty rects on D3D / DWM level). If that is true, the application has to guarantee that the whole surface is always intact.
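
A small Python sketch of why "dirty rect as a hint" implies the whole surface must stay valid. It assumes, purely for illustration, that the presenter merges pending dirty rects into one bounding box; `Rect`, `union`, and `HintingPresenter` are hypothetical names and do not describe how D3D or DWM actually accumulate rectangles.

```python
from typing import NamedTuple, Optional


class Rect(NamedTuple):
    left: int
    top: int
    right: int   # exclusive
    bottom: int  # exclusive


def union(a: Optional[Rect], b: Rect) -> Rect:
    """Bounding box of two rectangles (or just `b` if there is nothing pending)."""
    if a is None:
        return b
    return Rect(min(a.left, b.left), min(a.top, b.top),
                max(a.right, b.right), max(a.bottom, b.bottom))


class HintingPresenter:
    """Toy presenter that treats dirty rects as hints and may merge them."""

    def __init__(self):
        self.pending: Optional[Rect] = None

    def present(self, dirty: Rect) -> None:
        self.pending = union(self.pending, dirty)

    def flush(self) -> Optional[Rect]:
        """Return the area that is actually copied and reset the accumulator."""
        copied, self.pending = self.pending, None
        return copied


presenter = HintingPresenter()
presenter.present(Rect(10, 10, 50, 30))
presenter.present(Rect(200, 100, 260, 140))
copied = presenter.flush()
print(copied)  # Rect(left=10, top=10, right=260, bottom=140)
```

The copied area covers pixels that were in neither dirty rect, so the copy is only correct if the application kept those pixels up to date as well.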

At first glance this is also what the WPF rendering engine code suggests to me - it seems to only touch pixels in the dirty rect. Just for a first test, I hooked into "d3d9.dll!CSwapChain::Present" and omitted the dirty rects, and this seemed to fix the issue without showing any negative side effects.

In that scenario, I assume the overhead to be acceptable - at least in our case - as we often have to update larger parts of the window anyway ... a GPU blit of a somewhat larger surface area should not hurt that much, I hope.
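
A back-of-the-envelope estimate of that overhead, as a Python sketch. The numbers are assumptions, not measurements from this issue: a 1920x1080 window, 4 bytes per pixel, 60 presents per second, and a 400x300 dirty rect for comparison. It counts only bytes covered by the copy and says nothing about actual GPU cost.

```python
def blit_bytes(width, height, bytes_per_pixel=4):
    """Bytes covered by copying a width x height region once."""
    return width * height * bytes_per_pixel


full = blit_bytes(1920, 1080)    # assumed full-window present
partial = blit_bytes(400, 300)   # assumed typical dirty rect
full_per_second = full * 60      # assumed 60 presents per second
ratio = full / partial

print(full)             # 8294400
print(partial)          # 480000
print(full_per_second)  # 497664000
```

Under these assumptions a full present copies roughly 17 times as many bytes as the partial one, or about 0.5 GB per second at 60 Hz.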

In addition, if the change really is limited to the code at the actual Present call, I would even assume that it would be OK to switch this behavior on and off dynamically at runtime, as whether only partial or the full content is presented is then completely independent of the preceding rendering steps and of any state in the graphics pipeline. With that capability, we could then maybe measure frame times or watch for dropped frames / DWM glitches, because those showed a high correlation with the effect appearing in my analysis with GPUView etc. Whether this would really be worth the additional effort / complexity is not yet clear to me; it depends on how big the overhead is in the end. But as statically setting and never changing a dynamic setting is also an option, this could be decided later on. => Could you imagine some dynamic way of passing that in to the rendering engine - maybe even some static setting via an app-context switch that can then additionally be overridden by some dynamic mechanism?
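
One possible shape for "static setting plus dynamic override", sketched in Python. Everything here is hypothetical: `PresentPolicy` stands in for a static app-context switch with an optional runtime override, and `FrameTimeMonitor` is an assumed heuristic that turns the override on after several consecutive over-budget frames. The 16.7 ms budget and the window of 3 frames are illustrative values only.

```python
from collections import deque
from typing import Optional


class PresentPolicy:
    """Static default (e.g. from a switch) with an optional dynamic override."""

    def __init__(self, static_full_present: bool):
        self._static = static_full_present
        self._override: Optional[bool] = None

    def set_override(self, value: Optional[bool]) -> None:
        # None clears the override and falls back to the static default.
        self._override = value

    @property
    def full_present(self) -> bool:
        return self._static if self._override is None else self._override


class FrameTimeMonitor:
    """Force full presents while the last `window` frames all exceed the budget."""

    def __init__(self, policy: PresentPolicy, budget_ms: float = 16.7, window: int = 3):
        self._policy = policy
        self._budget_ms = budget_ms
        self._recent = deque(maxlen=window)

    def record(self, frame_ms: float) -> None:
        self._recent.append(frame_ms)
        if len(self._recent) < self._recent.maxlen:
            return  # not enough samples yet
        if all(t > self._budget_ms for t in self._recent):
            self._policy.set_override(True)
        elif all(t <= self._budget_ms for t in self._recent):
            self._policy.set_override(None)


policy = PresentPolicy(static_full_present=False)
monitor = FrameTimeMonitor(policy)

for ms in [10, 12, 30, 35, 40]:   # load rises: last three frames are over budget
    monitor.record(ms)
under_load = policy.full_present  # True

for ms in [11, 12, 13]:           # load drops: last three frames are within budget
    monitor.record(ms)
after_recovery = policy.full_present  # False, back to the static default
```

Requiring a full window of consistent samples before changing state is a simple way to avoid flipping the mode on a single slow frame.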

While I looked at the WPF code a bit, I am far from understanding all the different "paths" depending on different environments and configurations of the rendering pipeline. Having said that, the spot that I had in mind was "CHwDisplayRenderTarget::Present".

Just as a side comment for the sake of completeness: I assume that the issue can also happen if you have just one GPU-heavy process - e.g. when that process uses D3D11 to render 3D scenes and then uses D3D 9Ex interop via WPF D3DImage or even a memory copy into a WPF WriteableBitmap. In general, I do not consider missing synchronization guarantees between the dirty rects and the GPU rendering pipeline, combined with some "pressure" on the GPU, to be a total niche issue. In my reproducer I intentionally overload the GPU, but in our real applications we are not putting unreasonable load on the GPU - it is just a somewhat more dated GPU.

arodland commented 1 year ago

FWIW, this has a huge impact for people trying to run WPF applications on Wine: emulating D3DSWAPEFFECT_COPY on top of Vulkan or OpenGL requires a workaround that's incredibly expensive, worse than using DISCARD and doing full draws. (Also, maybe on Windows itself as GPU vendors are dropping native D3D9 support: does D3D9On12 handle this cleanly?)

Getting WPF off of D3D9 would of course be nice, and I understand exactly how involved that would be, but having a way to do full presents without disabling hardware accel would be quite valuable on its own.

Looking at it from the angle of supporting WPF apps on emulated-D3D9 changes the configuration story, though. An in-app setting isn't very useful if you want existing apps (which may be unmaintained) to work properly. A registry key would be completely appropriate, because the inability to deal with partial presents is a system-wide property in that case.

dotnet / wpf

WPF dirty rectangle glitches in high GPU-load situations #4480