RobertBeckebans / RBDOOM-3-BFG

Doom 3 BFG Edition source port with updated DX12 / Vulkan renderer and modern game engine features
https://www.moddb.com/mods/rbdoom-3-bfg
GNU General Public License v3.0

Vulkan & Optick Improvements and GPU memory + CPU/GPU usage % features #854

Closed SRSaunders closed 3 months ago

SRSaunders commented 3 months ago

This PR replaces the non-performance part of #818, which will be closed and not merged. It has no dependencies on nvrhi changes, but would benefit from https://github.com/RobertBeckebans/nvrhi/pull/7 for timer accuracy on macOS for non-Apple Silicon GPUs.

This solves #804, and the Apple Silicon artifact elimination portion of #763 (disable GPU skinning on Apple Silicon).

Details as follows:

  1. Vulkan: Simplified Vulkan code by removing the barrier command list, which is not needed if submits are done in the correct order.
  2. Vulkan: Made the VMA header file visible within the IDE source tree under libs/vma (CMakeLists change)
  3. Vulkan: Use dynamic function pointers vs static functions for configuring VMA, Optick, and MoltenVK
  4. Optick: Added support for configuring with Vulkan dynamic functions vs. statically-linked functions
  5. Optick: Added support for reporting runtime errors with text descriptions (extends existing infra)
  6. All Platforms: Added CPU and GPU usage % counters (with filtering) to the on-screen HUD display.
  7. All Platforms: Added GPU Memory usage to the on-screen HUD display (DX12 & Vulkan).
  8. Intel iGPUs: Works around missing Vulkan shaderStorageImageReadWithoutFormat device feature on Intel GPUs, and individually activates VK_KHR_fragment_shading_rate sub-features vs. all or none (supported by nvrhi).
  9. macOS: Added MoltenVK's Vulkan-to-Metal encoding time to the HUD when available for macOS only.
  10. macOS: Made a minor CMakeLists fix primarily for Xcode that cleans up precompiled.h-xxxxx.gch.tmp files left around when the ZERO_CHECK target runs for regeneration.
  11. macOS: Disabled GPU Skinning for macOS arm64 to eliminate rendering artifacts (campaign and multiplayer).
  12. macOS: Modified cmake-macos-*.sh and cmake-xcode-*.sh build scripts for openal-soft path portability across x86 and Apple Silicon. Thanks to @asemarafa for the code.
  13. macOS: Added support for VK_KHR_synchronization2 extension with MoltenVK 1.2.6 / Vulkan SDK 1.3.268.1 and later
  14. macOS: Added support for the new VK_EXT_layer_settings extension used for configuring MoltenVK 1.2.7 / Vulkan SDK 1.3.275.0 and later (now updated to support all build types - debug and release)
  15. macOS: Added r_mvkUseMetalArgumentBuffers cvar (default = 2). Turning this off (= 0) may result in slightly higher performance, especially on Apple Silicon, but caution is warranted. This should only be adjusted for Vulkan SDK 1.3.275 and later. SDK 1.3.268.1 has a SPIRV-Cross defect which requires this cvar to be set to its default value. As a side effect, turning it off also fixes #824.
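
The per-feature activation described in item 8 can be sketched roughly as follows. The struct is a local stand-in whose three members mirror the fields of Vulkan's VkPhysicalDeviceFragmentShadingRateFeaturesKHR, so the example compiles without Vulkan headers. The helper names are hypothetical, and this is not the nvrhi or RBDOOM-3-BFG code.

```cpp
// Sketch only: stand-in type and hypothetical helpers, not engine or nvrhi code.
struct ShadingRateFeatures
{
    bool pipelineFragmentShadingRate;
    bool primitiveFragmentShadingRate;
    bool attachmentFragmentShadingRate;
};

// Per-feature policy: enable each wanted sub-feature only if the device reports it.
constexpr ShadingRateFeatures EnableIndividually( const ShadingRateFeatures& supported,
                                                  const ShadingRateFeatures& wanted )
{
    return ShadingRateFeatures{
        supported.pipelineFragmentShadingRate && wanted.pipelineFragmentShadingRate,
        supported.primitiveFragmentShadingRate && wanted.primitiveFragmentShadingRate,
        supported.attachmentFragmentShadingRate && wanted.attachmentFragmentShadingRate };
}

// True when every wanted sub-feature is reported by the device.
constexpr bool SupportsAll( const ShadingRateFeatures& supported,
                            const ShadingRateFeatures& wanted )
{
    return ( supported.pipelineFragmentShadingRate || !wanted.pipelineFragmentShadingRate )
        && ( supported.primitiveFragmentShadingRate || !wanted.primitiveFragmentShadingRate )
        && ( supported.attachmentFragmentShadingRate || !wanted.attachmentFragmentShadingRate );
}

// All-or-none policy, for contrast: one missing sub-feature disables everything.
constexpr ShadingRateFeatures EnableAllOrNone( const ShadingRateFeatures& supported,
                                               const ShadingRateFeatures& wanted )
{
    return SupportsAll( supported, wanted )
        ? EnableIndividually( supported, wanted )
        : ShadingRateFeatures{ false, false, false };
}

// Example: a device that lacks attachmentFragmentShadingRate.
constexpr ShadingRateFeatures kSupported{ true, true, false };
constexpr ShadingRateFeatures kWanted{ true, true, true };
static_assert( EnableIndividually( kSupported, kWanted ).pipelineFragmentShadingRate,
               "per-feature policy keeps the supported sub-features" );
static_assert( !EnableAllOrNone( kSupported, kWanted ).pipelineFragmentShadingRate,
               "all-or-none policy drops everything when one sub-feature is missing" );
```

With the per-feature policy, a device that is missing one sub-feature still gets the other two. The all-or-none policy would leave it with none.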

RobertBeckebans commented 3 months ago

I removed the CPU/GPU usage from com_showFPS > 1 because it is timing-based, which is not what most gamers expect. CPU usage should show how many cores are utilized and at what percentage. Overall, though, displaying this information just prompts gamers to question why the engine is so badly optimized whenever the values are not around 80% all the time.

SRSaunders commented 3 months ago

Thanks for merging this.

The Usage % numbers are frame utilization measurements rather than actual CPU or GPU core loading. I realize that may be less conventional, but for me at least they are more useful, since they show the limiting factor for a particular scene on a given platform. In some cases the CPU frame time saturates and limits the frame rate, and in others the GPU does. But I can see how traditional gamers might not view that positively given the conventional interpretation of these measurements. I have just never understood why multi-core utilization on a modern CPU matters much when a few threads are the limiting factor, i.e. the CPU utilization percentages stay down in the <20% range and don't really tell you anything.
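
As a rough illustration of how frame-utilization figures point at the limiting factor, here is a small sketch. The enum, the function name, and the 95% saturation threshold are all hypothetical and chosen only for illustration. The PR does not define such a classification.

```cpp
// Sketch only: hypothetical names and threshold, not taken from the engine.
enum class FrameLimiter { Cpu, Gpu, Neither };

// Report which side is saturating the frame. A side counts as saturated once
// its frame-busy percentage reaches saturatedPercent (an assumed cutoff).
constexpr FrameLimiter ClassifyLimiter( double cpuBusyPercent,
                                        double gpuBusyPercent,
                                        double saturatedPercent = 95.0 )
{
    return ( cpuBusyPercent < saturatedPercent && gpuBusyPercent < saturatedPercent )
        ? FrameLimiter::Neither   // e.g. frame rate capped by vsync or a frame limiter
        : ( cpuBusyPercent >= gpuBusyPercent ? FrameLimiter::Cpu : FrameLimiter::Gpu );
}

// Example: the CPU side fills the frame while the GPU waits, so the scene is CPU-limited.
static_assert( ClassifyLimiter( 98.0, 60.0 ) == FrameLimiter::Cpu, "CPU-limited scene" );
static_assert( ClassifyLimiter( 55.0, 99.0 ) == FrameLimiter::Gpu, "GPU-limited scene" );
```

A per-core load average would not make this distinction visible, since a few saturated threads can limit the frame while the overall CPU percentage stays low.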

I wonder if renaming these measurements might better convey what they mean and differentiate them from traditional HUD implementations. For instance, would it help to use Frame Busy % instead of Usage %? Note that for Windows and Linux the info is already there in the HUD, and the percentage calculation is essentially (Sync+Idle) / Total FrameTime. For macOS it's a bit more complicated because of the Vulkan-to-Metal encoding thread, which works in parallel with Sync, so CPU-side occupancy will be higher on that platform.
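
A minimal sketch of how such a Frame Busy % figure could be derived from timings already shown in the HUD. The ratio (Sync+Idle) / Total FrameTime is taken from the comment above. Read literally, it is the share of the frame spent waiting, so treating the displayed busy value as its complement is an assumption here, not something the PR confirms. The function names, the clamp, and the exponential smoothing standing in for the "filtering" are likewise hypothetical.

```cpp
// Sketch only: every name below is hypothetical and not taken from the engine source.

// Clamp a percentage into [0, 100] so timer jitter cannot push it out of range.
constexpr double ClampPercent( double p )
{
    return p < 0.0 ? 0.0 : ( p > 100.0 ? 100.0 : p );
}

// Share of the frame spent waiting: (Sync + Idle) / Total FrameTime, as a percentage.
constexpr double FrameWaitPercent( double syncMs, double idleMs, double totalFrameMs )
{
    return totalFrameMs > 0.0
        ? ClampPercent( 100.0 * ( syncMs + idleMs ) / totalFrameMs )
        : 0.0;
}

// Assumption: "busy" is whatever part of the frame was not spent waiting.
constexpr double FrameBusyPercent( double syncMs, double idleMs, double totalFrameMs )
{
    return totalFrameMs > 0.0
        ? 100.0 - FrameWaitPercent( syncMs, idleMs, totalFrameMs )
        : 0.0;
}

// Assumed filter: exponential smoothing with alpha in (0, 1]; alpha = 1 disables filtering.
constexpr double SmoothPercent( double previous, double sample, double alpha )
{
    return previous + alpha * ( sample - previous );
}

// Example: a 16 ms frame with 1 ms of sync and 3 ms of idle time.
static_assert( FrameWaitPercent( 1.0, 3.0, 16.0 ) == 25.0, "4 of 16 ms spent waiting" );
static_assert( FrameBusyPercent( 1.0, 3.0, 16.0 ) == 75.0, "remaining 12 ms counted as busy" );
```

On macOS the encoding thread runs in parallel with Sync, so a single wait ratio like this would understate CPU-side occupancy there. The sketch does not attempt to model that case.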