Add support for exclusive fullscreen to the D3D11 driver to reduce input latency

Popax21 commented 1 year ago

As part of the Everest Core project, the Celeste community has decided to collectively move mods over to .NET 7.0, which includes switching away from XNA+FNA to supporting only FNA. However, this raised concerns in the speedrunning scene, as FNA has historically had worse input latency than XNA. This resulted in a series of FNA3D patches aimed at mitigating the input latency, which I've decided to PR back upstream in case they are useful to a more general audience.

Originally, we tried simply changing the present mode from DXGI_SWAP_EFFECT_DISCARD to DXGI_SWAP_EFFECT_FLIP_DISCARD in order to take advantage of direct flip presentation to remove compositor latency. However, this turned out not to work consistently; on some machines, direct flip simply didn't kick in. As such, we added support for exclusive fullscreen to FNA3D, as it consistently produces the least input latency possible (note that flip presentation is still used to minimize latency for e.g. windowed applications).

Note that I've marked this PR as a draft, as I am unsure if exclusive fullscreen is something upstream FNA3D even wants to support, given that it is considered obsolete on modern systems - feel free to close this PR if this is the case. On the other hand, this PR addresses an issue which might be a concern for a lot of developers, which is why I decided to PR it in case the maintainers decide that including the patch upstream would be beneficial.

flibitijibibo commented 1 year ago

We'll probably stay clear of exclusive fullscreen, but I'd definitely like to know of other ways to reduce latency. I'm actually surprised this has any effect on Windows since recent releases started faking exclusive fullscreen anyhow: https://devblogs.microsoft.com/directx/demystifying-full-screen-optimizations/

At least on Wayland I've found that using Vulkan and enabling mailbox vsync produces low latency, not sure what Windows has available. There may be ways to optimize Vulkan presentation to get the best result without having to change the windowing behavior.

Popax21 commented 1 year ago

> We'll probably stay clear of exclusive fullscreen, but I'd definitely like to know of other ways to reduce latency. I'm actually surprised this has any effect on Windows since recent releases started faking exclusive fullscreen anyhow: https://devblogs.microsoft.com/directx/demystifying-full-screen-optimizations/

> At least on Wayland I've found that using Vulkan and enabling mailbox vsync produces low latency, not sure what Windows has available. There may be ways to optimize Vulkan presentation to get the best result without having to change the windowing behavior.

I assume that setting the window fullscreen state provides another avenue for triggering flip presentation, as said earlier simply changing the presentation mode didn't work for all of our testers. In addition to that, Vulkan seems to have a comparable input latency with both vsync on and off to unmodified FNA (so noticeably worse than XNA, at least according to our testing) - I assume it still has the compositor somewhere in the chain.

EDIT: I just noticed that you were talking about Linux systems in the second paragraph, my bad. This matches my experiments / research as well - Wayland should "just work" (it might be possible to use the tearing protocol for users who want that), but on Xorg, I don't see any improvements (in fact Vulkan just runs worse there). One possibility worth exploring would be using the VK_KHR_display extension to bypass the compositor; however, I am unsure if that would work on Windows / Linux. As for the D3D11 driver, the only option which seems to consistently work on all machines is exclusive fullscreen (even if it just triggers FSO in the end, considering we were unable to do so by just changing the presentation mode on all machines).

thatcosmonaut commented 1 year ago

> I assume that setting the window fullscreen state provides another avenue for triggering flip presentation, as said earlier simply changing the presentation mode didn't work for all of our testers. In addition to that, Vulkan seems to have a comparable input latency with both vsync on and off to unmodified FNA (so noticeably worse than XNA, at least according to our testing) - I assume it still has the compositor somewhere in the chain.

A while ago someone from the Celeste runner community actually approached us to report the input lag discrepancy between FNA and XNA, and my investigations into that led to this patch: https://github.com/FNA-XNA/FNA/commit/46216d6cd1ff832eaafa2aef96a088b13a474b25

So if Celeste is still using a version of FNA from before this patch, this would explain the input lag and it wouldn't necessarily have anything to do with the presentation strategy. If you want to test you could just build FNA.dll and drop it in and it should just work.

Popax21 commented 1 year ago

> > I assume that setting the window fullscreen state provides another avenue for triggering flip presentation, as said earlier simply changing the presentation mode didn't work for all of our testers. In addition to that, Vulkan seems to have a comparable input latency with both vsync on and off to unmodified FNA (so noticeably worse than XNA, at least according to our testing) - I assume it still has the compositor somewhere in the chain.

> A while ago someone from the Celeste runner community actually approached us to report the input lag discrepancy between FNA and XNA, and my investigations into that led to this patch: FNA-XNA/FNA@46216d6

> So if Celeste is still using a version of FNA from before this patch, this would explain the input lag and it wouldn't necessarily have anything to do with the presentation strategy. If you want to test you could just build FNA.dll and drop it in and it should just work.

We did all our tests against the latest upstream FNA3D build (in fact pretty much all Everest builds ship with rather recent builds), and it was still an issue we could reliably narrow down to compositor lag (plus an additional XNA quirk which slightly sped up the tick loop, but that's separate).

Checking against PresentMon, our testers reported increased latency when either direct flip didn't kick in or exclusive fullscreen was absent - in both cases the compositor was still active according to PresentMon.

Popax21 commented 1 year ago

> Reviewed the non-fullscreen bits at least - if we can detect support for FLIP_DISCARD I'd be happy to make that the default path, and we can merge that in as an isolated patch.

That's probably worth looking into; if I have time in the near future (unlikely), I might try to code that up.

> I'll have to defer to others on the fullscreen optimization issue though; I could swear that taking over the display isn't necessary anymore but I don't know the Windows 10+ compositor as well as I should.

Going off my experience, it basically is a heuristics lottery. On some machines, just using flip presentation works perfectly (i.e. eliminating compositor lag through Direct Flip), on others it works some of the time depending on random other variables, and on some it just refuses to work altogether. During our testing, only exclusive fullscreen worked reliably on all test machines, even if it just ends up triggering FSO.

> One possibility would be to move most of this logic into SDL2_FNAPlatform instead, so that the burden is distributed across renderers - the part that's most scary is the SDL_Window calls, which we generally avoid doing in FNA3D unless it's a Get, or a renderer-specific Set function (mostly OpenGL...). If it was super super isolated to be like, FNA_GRAPHICS_EXCLUSIVE_FULLSCREEN_UNSTABLE then that might be a way to avoid having downstream changes...

Makes sense, those are there to smooth over fullscreen transitions, as otherwise stuff would go haywire. I'm not sure if separating this from the native D3D11 renderer is a good idea though, considering all of this logic is pretty much D3D11 exclusive, and having it absent while messing with the fullscreen state (which is a valid concern if this logic gets moved downstream) breaks everything.

flibitijibibo commented 1 year ago

Looks like all we need to support flip discard is to check for DXGI 1.4 - should be a quick QueryInterface as far as I know, so if anyone wants to fill this in over the weekend I can merge it ASAP - that may even help simplify this patchset should we need to bring it in.
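
For illustration, the fallback described here could be sketched roughly as follows. This is not the code from FNA3D: the enum only mirrors the numeric values of DXGI_SWAP_EFFECT from dxgi.h so that the snippet compiles without the Windows SDK, and `has_dxgi_1_4`, `SwapChainChoice` and `choose_swap_chain` are hypothetical names. In a real driver the flag would presumably be set when a QueryInterface for IDXGIFactory4 succeeds.

```c
#include <stdbool.h>

/* Stand-ins mirroring the values of DXGI_SWAP_EFFECT in dxgi.h.
 * Real code would include the SDK header and use its enum instead. */
typedef enum {
    SWAP_EFFECT_DISCARD = 0,      /* DXGI_SWAP_EFFECT_DISCARD (blt model) */
    SWAP_EFFECT_FLIP_DISCARD = 4  /* DXGI_SWAP_EFFECT_FLIP_DISCARD (flip model) */
} SwapEffect;

/* Hypothetical helper struct: the two swap chain fields that differ
 * between the blt-model and flip-model paths. */
typedef struct {
    SwapEffect effect;
    unsigned buffer_count;
} SwapChainChoice;

/* has_dxgi_1_4 is an assumed input: true when the DXGI 1.4 factory
 * interface could be obtained at startup. */
SwapChainChoice choose_swap_chain(bool has_dxgi_1_4)
{
    SwapChainChoice choice;
    if (has_dxgi_1_4) {
        choice.effect = SWAP_EFFECT_FLIP_DISCARD;
        choice.buffer_count = 2; /* flip model needs at least two buffers */
    } else {
        choice.effect = SWAP_EFFECT_DISCARD;
        choice.buffer_count = 1; /* old path stays as it was */
    }
    return choice;
}
```

Keeping the decision in one place like this means older systems silently keep the previous behaviour, while newer ones get flip presentation by default.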

flibitijibibo commented 1 year ago

Turns out this was super easy to check, so I snuck in a few minutes to scribble it in:

https://github.com/FNA-XNA/FNA3D/commit/3fe3c8ce6f0603588fa2b8f2e5f74301c988e024

thatcosmonaut commented 1 year ago

> We did all our tests against the latest upstream FNA3D build

Just to clarify, my comment was about FNA specifically, not FNA3D, and to my understanding Celeste never shipped with a build of FNA that included the patch I mentioned.

flibitijibibo commented 1 year ago

The tearing query wasn't too bad either, so: https://github.com/FNA-XNA/FNA3D/commit/7654843cd70905e76ed7cdec749a64730cb2aef4

EDIT: Woops, forgot ResizeBuffers: https://github.com/FNA-XNA/FNA3D/commit/f78bf7738e7f20e056d8c497317f8e651131db38
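
As a rough sketch of how tearing support typically threads through a DXGI renderer (an illustration under stated assumptions, not what the commits above contain): the constants below copy the values of DXGI_SWAP_CHAIN_FLAG_ALLOW_TEARING and DXGI_PRESENT_ALLOW_TEARING so the snippet builds without the Windows SDK, and the function and parameter names are hypothetical. `tearing_supported` stands for the result of the tearing feature query.

```c
#include <stdbool.h>

/* Stand-ins for the SDK constants of the same meaning. */
#define SWAP_CHAIN_FLAG_ALLOW_TEARING 2048u      /* DXGI_SWAP_CHAIN_FLAG_ALLOW_TEARING */
#define PRESENT_ALLOW_TEARING         0x00000200u /* DXGI_PRESENT_ALLOW_TEARING */

/* Flags for swap chain creation. DXGI expects the same flags to be
 * passed again on every ResizeBuffers call, so both call sites should
 * use this one helper. */
unsigned swap_chain_flags(bool tearing_supported)
{
    return tearing_supported ? SWAP_CHAIN_FLAG_ALLOW_TEARING : 0u;
}

/* Flags for Present. Tearing is only valid with a sync interval of 0
 * and outside of exclusive fullscreen. */
unsigned present_flags(bool tearing_supported,
                       unsigned sync_interval,
                       bool exclusive_fullscreen)
{
    if (tearing_supported && sync_interval == 0 && !exclusive_fullscreen) {
        return PRESENT_ALLOW_TEARING;
    }
    return 0u;
}
```

With vsync on (sync interval 1) the present flag is never set, so the feature only affects users who have already opted out of vsync.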

kg commented 1 year ago

Some notes from testing this new flip mode: If you want to determine whether it's active, you can use PresentMon, but you need to create a Custom preset to show the present mode, by clicking 'Custom' and then 'Edit' in the Preset selector. You also need to run PresentMon in Windowed Mode, which is in the advanced settings under Overlay Configuration, and make sure the window is on a different monitor. If anything is overlapping your fullscreen window, that disables "HW Independent Flip" and bumps you down to "Composed Flip" like windowed mode. And by anything I mean literally anything other than the mouse cursor, even a 1px drop shadow from a window on another monitor. Once you get demoted to Composed Flip, it will take a second or two for HW Independent Flip to re-engage even after you remove whatever was overlapping the window, so be patient when testing.

Popax21 commented 1 year ago

> > We did all our tests against the latest upstream FNA3D build

> Just to clarify, my comment was about FNA specifically, not FNA3D, and to my understanding Celeste never shipped with a build of FNA that included the patch I mentioned.

Oh yeah sorry, we tested against latest for both (Everest ships with its own copy of FNA which replaces the existing one).

Popax21 commented 1 year ago

> Some notes from testing this new flip mode: If you want to determine whether it's active, you can use PresentMon, but you need to create a Custom preset to show the present mode, by clicking 'Custom' and then 'Edit' in the Preset selector. You also need to run PresentMon in Windowed Mode, which is in the advanced settings under Overlay Configuration, and make sure the window is on a different monitor. If anything is overlapping your fullscreen window, that disables "HW Independent Flip" and bumps you down to "Composed Flip" like windowed mode. And by anything I mean literally anything other than the mouse cursor, even a 1px drop shadow from a window on another monitor. Once you get demoted to Composed Flip, it will take a second or two for HW Independent Flip to re-engage even after you remove whatever was overlapping the window, so be patient when testing.

We tested with the console app, so that was not an issue for us. And yes, I know that anything being overlaid prevents direct flip; the issue was that we simply never got it to work on some machines even when explicitly looking for this sort of stuff. In the end we went for exclusive fullscreen since it is way less finicky to get to work.

Popax21 commented 1 year ago

(Accidentally pressed the wrong button, please ignore the above .-.)

Anyway, the best option going forward I could think of would be to include the exclusive fullscreen support in some way, but to make it opt-in using an environment variable. This way, applications where latency is crucial can enable it to always get the lowest latency possible, while others still get the benefits of the flip presentation mode support.
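
A minimal sketch of what such an opt-in could look like, assuming a plain environment variable read with the C standard library. The variable name `FNA3D_D3D11_EXCLUSIVE_FULLSCREEN` and both function names are made up for this example and do not exist in FNA3D; the actual driver might prefer a different mechanism for reading the setting.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variable name, chosen only for this sketch. */
#define EXCLUSIVE_FULLSCREEN_ENV "FNA3D_D3D11_EXCLUSIVE_FULLSCREEN"

/* Only the exact value "1" opts in; an unset variable or any other
 * value keeps the default (flip presentation without exclusive
 * fullscreen). */
bool parse_opt_in(const char *value)
{
    return value != NULL && strcmp(value, "1") == 0;
}

/* Reads the opt-in from the process environment. */
bool exclusive_fullscreen_requested(void)
{
    return parse_opt_in(getenv(EXCLUSIVE_FULLSCREEN_ENV));
}
```

Defaulting to off means nothing changes for existing applications; only launchers that explicitly set the variable get the exclusive fullscreen path.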

flibitijibibo commented 1 year ago

I believe we've taken care of all the non-fullscreen changes thus far, so this should now be safe to rebase against upstream.

kg commented 1 year ago

> (Accidentally pressed the wrong button, please ignore the above .-.)

> Anyway, the best option going forward I could think of would be to include the exclusive fullscreen support in some way, but to make it opt-in using an environment variable. This way, applications where latency is crucial can enable it to always get the lowest latency possible, while others still get the benefits of the flip presentation mode support.

fwiw, when I go fullscreen on Windows using the Vulkan backend, it is exclusive fullscreen (unless mailbox vsync is enabled)

flibitijibibo commented 1 year ago

Possible reference for Vulkan https://github.com/doitsujin/dxvk/pull/3690

FishiaT commented 1 year ago

Is this PR going to get merged anytime soon?

flibitijibibo commented 1 year ago

I don't think so, no - we do want to keep up with the changes but we're still steering clear of exclusive fullscreen for as long as we can - this might change once SDL_gpu is finished and integrated into FNA3D.

FishiaT commented 1 year ago

I see... Are there any particular reasons to avoid implementing exclusive fullscreen support? I mean yes, it's obsolete on modern OSes and borderless windowed is generally more convenient when it comes to multiple displays and stuff, but that doesn't mean it's not useful in some scenarios.

FNA-XNA / FNA3D

Add support for exclusive fullscreen to the D3D11 driver to reduce input latency #180