Closed SRSaunders closed 3 months ago
The results are awesome but I need quite a lot of testing and time to review this code. I aim for a new public release in December before Christmas.
Thanks @RobertBeckebans - take your time on review. It took me a while to figure this out anyways, plus a bunch of work with the MoltenVK project to diagnose and optimize performance issues. It was my summer/fall background project :)
If you have questions please don't hesitate to post a message or send an email. I will do my best to answer.
Hi @RobertBeckebans. I have pushed my final design update to this branch. Commits https://github.com/RobertBeckebans/RBDOOM-3-BFG/pull/818/commits/2063c72912a76f07f545dddaa806e0e7869d0b1c and https://github.com/RobertBeckebans/RBDOOM-3-BFG/pull/818/commits/c1f712a3d66ee6177c7eca518034c960b8bc97f2 add one small but critical update for performance - fixing the uniforms change detection logic which reduces how often volatile constant buffers are written for the larger renderparm sets. This has a huge effect which is orthogonal to push constants, and has a positive impact across all platforms: Windows, Linux, & macOS. See updated timing info above.
I am not sure why my AMD 6600XT GPU seems to be limited when running Windows/Vulkan, with other OSs surpassing it with the same card. Perhaps something about AMD’s windows drivers or AMD’s 128 byte limit for Vulkan push constants on Windows.
I added a few minor things due to dependencies on previous changes within this PR:
Oops, just realized that ImmediateMode was broken for using debug tools with push constants enabled.
macOS only: I just added support for the new VK_EXT_layer_settings
extension used for configuring the upcoming MoltenVK 1.2.7 / Vulkan SDK 1.3.272.0. See commit https://github.com/RobertBeckebans/RBDOOM-3-BFG/pull/818/commits/f3c65eef3e75ea2846cffd97aff1fd2f9bf92f7f.
@RobertBeckebans The last several commits have been added mostly due to dependencies on earlier changes within this PR. Unless something comes up in your testing, I don't plan any more changes here.
@SRSaunders I tried building the XCode release using this branch but ran into a couple of errors. The following diff fixed the XCode build:
diff --git a/neo/renderer/GuiModel.cpp b/neo/renderer/GuiModel.cpp
index 5db04303..fb4547b5 100644
--- a/neo/renderer/GuiModel.cpp
+++ b/neo/renderer/GuiModel.cpp
@@ -395,9 +395,9 @@ void idGuiModel::EmitImGui( ImDrawData* drawData )
idScreenRect clipRect =
{
static_cast<short>( pcmd->ClipRect.x ),
- io.DisplaySize.y - static_cast<short>( pcmd->ClipRect.w ),
+ static_cast<short>(io.DisplaySize.y - static_cast<short>( pcmd->ClipRect.w )),
static_cast<short>( pcmd->ClipRect.z ),
- io.DisplaySize.y - static_cast<short>( pcmd->ClipRect.y ),
+ static_cast<short>(io.DisplaySize.y - static_cast<short>( pcmd->ClipRect.y )),
0.0f,
1.0f
};
Also noticed some strange artifacts on the weapons (not seeing this elsewhere). See attached screenshot.
I have observed a massive increase in FPS though, so thats great.
@labrnth Thanks for trying out this PR. The compile issue you found was fixed in master with commit ab663a76. Perhaps you synced before this was merged recently.
Regarding the weapons artifacts, are you running on an Intel or Apple Silicon Mac? I thought this problem was only visible on Apple Silicon - where I disabled GPU skinning to fix it. If you are running on Intel that is quite interesting and may indicate a more general problem with skinning. In this case you could try setting r_useGPUSkinning 0
in your console, or adding set r_useGPUSkinning 0
to your autoexec.cfg file. Please let me know your results.
Good news on the performance improvement. That was my main goal for this effort. I trust you applied https://github.com/RobertBeckebans/nvrhi/pull/6 - this is needed for max performance.
@SRSaunders Ya that could be.
Yes I'm running on Apple Silicon (M2 Pro). The r_useGPUSkinning "0" fixed it but only while in campaign not the multiplayer.
Thanks for catching the r_useGPUSkinning
reset in multiplayer. Solved in https://github.com/RobertBeckebans/RBDOOM-3-BFG/pull/818/commits/3b6598b88d71bc443c431a04e0f075410679ea36 (this cvar is not a cheat in any case).
@RobertBeckebans I would like to know what you are planning with this PR. I have a few other improvements for Vulkan that I would like to submit (e.g. use Vulkan dynamic functions vs. static linkage for VMA setup, Optick, and MoltenVK config), but I don't want to keep adding on here, especially if you are not planning to merge in the near term.
I could simplify things by dividing this into two PRs that could be treated independently:
I would then close this large PR and you could look at two new smaller PRs independently.
Would this help? If so would you look at either of the new PRs for your upcoming release?
I can see why it is necessary to split the constant buffer renderparms into smaller push constants but I don't really like the complexity it comes with. I don't see this PR merged with RBDoom 1.6 but rather in 1.7. I also would like to merge this after NVRHI has been updated to the newest version.
Separate the HUD GPU Memory usage & CPU/GPU Usage % features (DX12 and Vulkan), generic Vulkan changes (incl. Intel iGPU fixes, dynamic functions, etc), and macOS/MoltenVK-only stuff.
I also would like to see this separated out into a different branch and then it needs to be merged with the newest NVRHI later.
Thanks @RobertBeckebans for your comments and advice. I will proceed with splitting the PR into two parts and resubmit.
Regarding your comments:
pc.
struct preface. If you know a better solution to this please let me know.The pc. suffix for the renderparms is not really a big deal. I was refering to figuring out which renderparms can be put into subsets for the used shaders and hitting the limits of constant buffers when extending the shaders in the future. You did a good job at sorting those using the excel sheet but this still needs to go through some deeper testing and it is also kind of the opposite I generally wish for the overall renderer design. I like it stupid simple and it might be the case that changing the renderer from a multipass forward design where each geometry is rerendered for each light to a clustered forward+ renderer design can outperform your changes while having a design that still works with the old renderparms array. Having all shadows in a big shadow atlas like we have right now was mandatory for forward+ and it is a feature I want to try to implement this year.
Yes, the sorting work with Excel was the main effort, along with figuring out how to dynamically enable/disable push constants at runtime based on platform capabilities. This branch has been my daily driver for about 4 months now, plus a number of linux and macOS users have tried the PR with success. While I think it's pretty stable as is, getting more play and stress testing would give us more confidence.
Regarding your plans for clustered forward+ rendering, are these the papers that descibe it? forward_plus.pdf 087-096.pdf
If so I am wondering whether my work with push constants can co-exist with these improvements? I presume the main issue would be whether existing shaders would need more parameters (or perhaps new shaders), and then bumping up against push constant platform limits (128 or 256 byte limit on Windows & Linux depending on GPU, a non-issue on macOS). However, if it only involves modifying the lighting shaders (ambient, interaction, and interactionSM*) then the solution should be fairly simple: those shaders already use >256 renderparm bytes and have push constants disabled for their associated binding layout types on Windows and Linux. On macOS they can use push constants since the limit is 4096 bytes on that platform. In other words, adding new renderparms to the three subsets for those specific lighting shader groups would not negate the performance gains from my push constants work. Is this a possiblity?
If you are going to continue to support Linux and macOS, Vulkan cannot be left to fall behind in performance. My work was a small attempt to push things forward (sorry for the pun).
This PR will now be closed unmerged. For existing users please see #854 and #855 for up-to-date replacements.
This PR depends on https://github.com/RobertBeckebans/nvrhi/pull/6
This PR also solves #763 for Apple Silicon performance and rendering artifact elimination, as well as #804 for Intel Integrated GPU support with Vulkan (slow, but works).
This is a fairly large pull request that implements the following:
BINDING_LAYOUT_GBUFFER, BINDING_LAYOUT_GBUFFER_SKINNED, BINDING_LAYOUT_TEXTURE, BINDING_LAYOUT_TEXTURE_SKINNED, BINDING_LAYOUT_WOBBLESKY, BINDING_LAYOUT_SSGI, BINDING_LAYOUT_SSGI_SKINNED, BINDING_LAYOUT_POST_PROCESS
)r_useDX12PushConstants
cvar which is turned off by default. This can optionally be turned on using autoexec.cfg for experimentation.select
does not seem to be supported when compiling for DXIL).cmake-macos-*.sh
andcmake-xcode-*.sh
build scripts for openal-soft path portability across x86 and Apple Silicon. Thanks to @asemarafa for the code.shaderStorageImageReadWithoutFormat
device feature on Intel GPUs, and individually activatesVK_KHR_fragment_shading_rate
sub-features vs. all or none (this is supported by nvrhi).r_useVulkanPushConstants
(default on) which is useful for performance comparisons.Tested on Windows 10 (AMD and Nvidia), Linux Manjaro, and macOS Ventura 13.5
Performance timings for this PR vs. current master, generated using a simple home-made timedemo:
Windows Nvidia System (1070 Ti) DX12: 263 fps before,
255360 fps after (with r_useDX12PushConstants = 0) --> significant improvement Vulkan: 218 fps before,233333 fps after --> significant improvementWindows AMD System (6600 XT) DX12: 295 fps before,
285305 fps after (with r_useDX12PushConstants = 0) --> neutral/positive improvement Vulkan: 150 fps before,155160 fps after --> neutral/positive improvementLinux AMD System (6600 XT) Vulkan: 150 fps before,
210270 fps after --> large improvementmacOS AMD System (6600 XT) Vulkan: 77 fps before,
177245 fps after --> very large improvementmacOS Apple Silicon System (M1 Air) (new) Vulkan: 6 fps before, 85 fps after --> massive improvement
Some possible explanations are as follows:
r_useDX12PushConstants = 0
by default.but still modestimprovement for Vulkan.less, but still in the positive direction for Vulkansignificant for DX12 & Vulkan on Nvidia, but less but so on AMD, at least for my test systems/GPUs.@RobertBeckebans here is the spreadsheet I used to create the binding layout type to shader/renderparm subset mapping: Binding to Shader Mapping v4.xlsx