Closed SRSaunders closed 1 month ago
Can a double-buffered V-Sync option be added to minimize input lag? Depending on your settings, you may be able to reach the maximum FPS at all times. In this situation, there's not much reason to use triple-buffered V-Sync.
The short answer is yes, the capability is already there. There are two mechanisms to deliver double-buffering (or equivalent latency):
1. Set NUM_FRAME_DATA = 2 (in precompiled.h) and swapChainBufferCount = 2 (in DeviceManager.h), replacing the current values of NUM_FRAME_DATA = 3 and swapChainBufferCount = 3.
2. Keep NUM_FRAME_DATA = 3 and swapChainBufferCount = 3, and use this PR with r_maxFrameLatency = 2 (the default) to achieve a latency of about 36 msec @ 60 Hz, or r_maxFrameLatency = 1 for a latency of 20-24 msec @ 60 Hz.

Setting r_maxFrameLatency = 1 eliminates CPU/GPU parallelism, as in option 1 above, and reduces the ability to handle processing spikes without FPS drops. Even so, at least for my setup, option 2 is better than option 1 since I can maintain higher FPS rates for equivalent latency settings. I am recommending a setting of r_maxFrameLatency = 2 for a balanced approach to performance and latency. Your mileage may vary based on the strength of your GPU and display refresh rate.

If you want to experiment, I suggest you try out this PR with PresentMon and provide feedback on your results. I would be very interested in how this works for other setups. Alternatively, you could try out the Intel sample application in link 1 of my PR write-up. I found this interactive latency demo very helpful in understanding the tradeoffs.
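To relate the latency figures quoted above to the display refresh rate, it can help to express them in refresh intervals. The following is a minimal sketch only; the helper name is hypothetical and not part of the RBDoom3BFG code base.

```cpp
#include <cassert>

// Hypothetical helper for illustration (not part of RBDoom3BFG):
// converts a measured latency in milliseconds into display refresh intervals.
double latencyInRefreshIntervals(double latencyMsec, double refreshHz)
{
    assert(latencyMsec >= 0.0 && refreshHz > 0.0);
    const double intervalMsec = 1000.0 / refreshHz;  // about 16.7 msec at 60 Hz
    return latencyMsec / intervalMsec;
}
```

At 60 Hz, 36 msec works out to roughly 2.2 refresh intervals and 24 msec to roughly 1.4, which is one way to compare results across monitors with different refresh rates.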
This PR attempts to reduce frame latency for Windows DX12 by leveraging DXGI's waitable swap chain option. Note this is not applicable to Vulkan. I found some interesting articles and sample code on the subject:
This concept was very easy to implement for RBDoom3BFG and leads to some pretty good results on my AMD 6600 XT card driving a 60 Hz monitor, with VSync set on:
The above latency numbers were observed in windowed-mode using PresentMon in conjunction with the Optick profiler improvements from my previous pull request #780. Borderless fullscreen mode gave similar results.
The tradeoff for lower latency is reduced GPU to CPU overlap and FPS throughput. However, if using a powerful GPU this may not be very noticeable. Even with my relatively low-powered 6600 XT, I can easily drive the game in the headquarters hallway scene at 60Hz with the DXGI waitable object set to 1 or 2 frames of latency. However, I am not sure how things would perform with only 1 frame of latency (effectively no CPU/GPU parallelism) during heavy action sequences. For this reason I am recommending a frame latency of 2 to achieve some CPU/GPU parallelism, mirroring the recommendations in article 1 above. In my experiments with RBDoom3BFG, I still found it important to keep NUM_FRAME_DATA = 3 for best performance, coupled with the waitable swap chain set to 2 frames of latency.
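The effect of the frame latency limit can be illustrated with a small, idealized model. This is not the DXGI API and not the code in this PR: it is a single-threaded sketch that assumes CPU and GPU work is instantaneous and that the display consumes exactly one queued frame per vsync. The function name and the vsyncCount parameter are hypothetical.

```cpp
#include <cassert>
#include <deque>

// Idealized model (an assumption for illustration, not the DXGI API):
// the CPU may only start a new frame while fewer than maxFrameLatency
// frames are queued, and each vsync removes the oldest queued frame.
// Returns the steady-state latency, in vsync intervals, between the CPU
// starting a frame and that frame reaching the display.
int simulateLatencyInVsyncs(int maxFrameLatency, int vsyncCount = 16)
{
    assert(maxFrameLatency >= 1 && vsyncCount >= maxFrameLatency);

    std::deque<int> presentQueue;  // tick at which each queued frame was started
    int lastLatency = 0;

    for (int tick = 0; tick < vsyncCount; ++tick) {
        // CPU side: keep starting frames while the queue has room
        // (the analogue of the waitable object being signaled).
        while (static_cast<int>(presentQueue.size()) < maxFrameLatency) {
            presentQueue.push_back(tick);
        }

        // Display side: the vsync ending this interval shows the oldest frame.
        const int startedAt = presentQueue.front();
        presentQueue.pop_front();
        lastLatency = (tick + 1) - startedAt;
    }
    return lastLatency;
}
```

In this model a limit of 1, 2, or 3 queued frames settles at 1, 2, or 3 vsync intervals of latency respectively. Real measurements will differ because the model ignores the time the CPU and GPU spend on each frame, which is also why it cannot show the loss of CPU/GPU overlap at a limit of 1.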
I have defined a new cvar, r_maxFrameLatency, with a default value of 2, to allow simple changes and experimentation. Permitted values are 0, 1, 2, and 3. A value of 0 turns off the feature. Values 1 and 2 are the useful ones and correspond to the number of queued back buffers permitted in the swap chain. A value of 3 means using the full set of 3 swap chain buffers (same latency as off). Note that this cvar cannot be changed on the fly (CVAR_INIT) and must be set before the swap chain is first initialized. To change it you must use the autoexec.cfg file or enter
seta r_maxFrameLatency <x>
in the console and restart the game.

Here are a couple of Optick screen grabs to show the results. You can see that the VSync/Present queue is now labeled with the FrameID, allowing direct inspection of latency:
Capture 1 showing ~52 msec of latency (60Hz, vsync on, r_maxFrameLatency = 0, same as current app):
Capture 2 showing ~36 msec of latency (60Hz, vsync on, r_maxFrameLatency = 2):
Capture 3 showing ~24 msec of latency (60Hz, vsync on, r_maxFrameLatency = 1):
Capture 4 showing the difference between the swapchain waitable object or "DX12_Sync1", and GPU triple buffering sync or "DX12_Sync3" (60 Hz, vsync on, r_maxFrameLatency = 3). The "DX12_Sync1" point aligns with the start of a new present frame, and "DX12_Sync3" aligns with completion of the N-2 frame's GPU frame. Depending on the settings and game circumstances, both may be observable as seen below: