RobertBeckebans / RBDOOM-3-BFG

Doom 3 BFG Edition source port with updated DX12 / Vulkan renderer and modern game engine features
https://www.moddb.com/mods/rbdoom-3-bfg
GNU General Public License v3.0

Complete Optick instrumentation and align with HUD GPU timers #869

Closed SRSaunders closed 1 month ago

SRSaunders commented 2 months ago

I find the Optick instrumentation quite fascinating, and really helpful for looking at GPU timing and occupancy. Now that GPU timing is on by default and works across all platforms, we should no longer need the CPU-only instrumentation points for the detailed rendering passes. I have replaced these with GPU instrumentation points for all phases and all passes, which also mark the CPU side. The result is a complete look at where the time goes during rendering, differentiated by rendering phase (SetBuffer, DrawView_3D, PostProcess, DrawView_GUI, and CRTPostProcess). It also shows the Clear and Blit operations when they occur in each rendering phase.
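
As a rough illustration of one instrumentation point marking both the CPU and GPU side, here is a minimal RAII sketch. `MarkerLog`, `ScopedPassMarker`, and `RecordExampleFrame` are hypothetical names for this example only; they are not the Optick API or the code in this PR, and the "GPU" events are simulated as log entries rather than real timestamp queries.

```cpp
#include <string>
#include <vector>

// Hypothetical event log standing in for a profiler backend (assumption).
struct MarkerLog
{
	std::vector<std::string> events;
};

// RAII marker: entering the scope records a CPU begin and a GPU begin,
// leaving the scope records the matching ends. In a real renderer the GPU
// side would be a timestamp query written into the command list; here it
// is only simulated as a log entry.
class ScopedPassMarker
{
public:
	ScopedPassMarker( MarkerLog& log, const std::string& name )
		: log_( log ), name_( name )
	{
		log_.events.push_back( "cpu_begin:" + name_ );
		log_.events.push_back( "gpu_begin:" + name_ );
	}

	~ScopedPassMarker()
	{
		log_.events.push_back( "gpu_end:" + name_ );
		log_.events.push_back( "cpu_end:" + name_ );
	}

	ScopedPassMarker( const ScopedPassMarker& ) = delete;
	ScopedPassMarker& operator=( const ScopedPassMarker& ) = delete;

private:
	MarkerLog& log_;
	std::string name_;
};

// Records one rendering phase containing one nested pass.
inline MarkerLog RecordExampleFrame()
{
	MarkerLog log;
	{
		ScopedPassMarker phase( log, "DrawView_3D" );
		{
			ScopedPassMarker pass( log, "Clear" );
		}
	}
	return log;
}
```

Because the marker is scope-based, nested passes close before their enclosing phase, which is what gives a trace its hierarchical look.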

Looking at the result made me go back and complete the timing information for the HUD timers so that they align with the Optick information. It pointed out some missing info on the HUD side (e.g. ToneMapping, the GUI rendering phase, and CRTPostProcessing), which I have now fixed by either adding a new category (ToneMapping) or adding to existing timers. The result is that if you now add up the HUD's detailed rendering passes on the GPU side, the sum almost exactly matches the total GPU busy time for the frame. It is within 50-100 usec; the difference is the clear and blit time for the main 3D draw phase, which the detailed HUD timers do not measure.
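
The consistency check described above (per-pass timers summed and compared against the total GPU busy time) could be expressed roughly as follows. The function names and the tolerance parameter are assumptions made for this sketch and do not come from the engine source.

```cpp
#include <cstdint>
#include <numeric>
#include <vector>

// Sum of the per-pass GPU timers, in microseconds.
inline std::uint64_t SumPassMicros( const std::vector<std::uint64_t>& passMicros )
{
	return std::accumulate( passMicros.begin(), passMicros.end(), std::uint64_t{ 0 } );
}

// GPU time not covered by any per-pass timer (e.g. unmeasured clear and
// blit work). Clamped at zero so unsigned arithmetic cannot wrap around.
inline std::uint64_t UnaccountedMicros( std::uint64_t totalGpuMicros,
										const std::vector<std::uint64_t>& passMicros )
{
	const std::uint64_t sum = SumPassMicros( passMicros );
	return totalGpuMicros > sum ? totalGpuMicros - sum : 0;
}

// True when the per-pass timers explain the total to within the tolerance.
inline bool TimersAlign( std::uint64_t totalGpuMicros,
						 const std::vector<std::uint64_t>& passMicros,
						 std::uint64_t toleranceMicros )
{
	return UnaccountedMicros( totalGpuMicros, passMicros ) <= toleranceMicros;
}
```

With a tolerance of 100 usec, a frame whose pass timers fall 80 usec short of the total would count as aligned, while a tolerance of 50 usec would flag it.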

I have not noticed any performance hit from doing this, and Optick instrumentation is disabled in release builds anyway.

Here are some screen caps (fyi, I run the OptickApp inside Parallels when working on macOS - works great, also note these timings were taken on macOS based on current master without push constants optimization - that's why frame timings appear a little slow for this test) ...

Overall View: Screenshot 2024-02-28 at 11 51 45 AM

Zoomed-in GPU-side: Screenshot 2024-02-28 at 11 52 28 AM

Screenshot 2024-02-28 at 11 52 45 AM

Zoomed-in CPU-side: Screenshot 2024-02-28 at 11 53 04 AM

Screenshot 2024-02-28 at 11 53 15 AM

SRSaunders commented 2 months ago

I may try to add one more commit that shows MoltenVK encoding time (macOS only) on the Optick trace but won’t get to it before the weekend.

SRSaunders commented 2 months ago

Well, I have a solution for showing MoltenVK's encoding time but unfortunately it depends on a change to MoltenVK itself. This is because Optick submits its own command buffers and you need to ignore those when getting the latest perf stats from MoltenVK. I may try to submit a feature request to the MoltenVK team but no guarantees. What's here works fine for all platforms as is. If MoltenVK implements my suggestion, I could follow up later with an improvement.

Just to give a taste of the possible future, I captured a screenshot showing the encoding thread enabled by a custom mod to MoltenVK. You can see from the trace below how MoltenVK's Vulkan-to-Metal encoding time is inserted between the completion of CPU rendering and the start of GPU rendering:

Screenshot 2024-03-04 at 11 44 11 PM

SRSaunders commented 1 month ago

Added a few small things:

  1. Improved Optick so that it no longer blocks on a sleep wait (1 sec) for clock sync at the start of a Vulkan capture. I have removed the sleep wait and now submit an empty command buffer and wait on it, so that application command buffers have completed before Vulkan clock sync starts. I have also submitted this change to the upstream Optick repo.
  2. Corrected a few variables and functions that should have been typed uint64.
  3. Added Optick frame number tags to DX12 and Vulkan Present().
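
The idea behind item 1, waiting on a marker submission instead of sleeping for a fixed time, can be illustrated with a toy single-threaded queue. This is not Vulkan or Optick code: `InOrderQueue` and `DrainBeforeClockSync` are hypothetical names, and the sketch assumes a queue that completes work strictly in submission order.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Toy stand-in for a GPU queue that executes submissions strictly in order.
class InOrderQueue
{
public:
	// Queues work and returns a ticket identifying this submission.
	std::size_t Submit( std::function<void()> work )
	{
		pending_.push_back( std::move( work ) );
		return ++submitted_;
	}

	// Executes pending work until the given ticket has completed.
	void WaitFor( std::size_t ticket )
	{
		while( completed_ < ticket && !pending_.empty() )
		{
			pending_.front()();
			pending_.pop_front();
			++completed_;
		}
	}

	std::size_t Completed() const
	{
		return completed_;
	}

private:
	std::deque<std::function<void()>> pending_;
	std::size_t submitted_ = 0;
	std::size_t completed_ = 0;
};

// Submits an empty marker and waits on it. Because the queue is in-order,
// returning from here means all earlier submissions have finished, with no
// fixed-length sleep involved.
inline void DrainBeforeClockSync( InOrderQueue& queue )
{
	const std::size_t marker = queue.Submit( [] {} );
	queue.WaitFor( marker );
}
```

The wait lasts only as long as the outstanding work takes, rather than a fixed one second.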

RobertBeckebans commented 1 month ago

This is really nice to look at inside Optick. I removed the Tone mapping from com_showFPS because it is not that important and added an extra line for the video memory usage.

SRSaunders commented 1 month ago

Thanks for merging. I believe the MoltenVK team will indeed merge my suggestion for their next release cycle (MoltenVK v1.2.9). This means I can eventually submit an enhancement showing MoltenVK's encoding phase on macOS as part of RBDoom3BFG's Optick trace - but this won't be for a few months until that dependency has been released.