Open myth0genesis opened 5 days ago
My suspicion is that this is related to the Vulkan backend. I've built and run some of the Vulkano examples from their most recent release and noticed that the top offenders there also seem to be mutexes and printfs, similar to what the perf reports for wgpu 0.20.0 and later show.
Very likely the same as
Can you check what happens when you disable validation? That's automatically the case in Release; for Debug it's opt-out.
I've already tried the fixes suggested in those issues, which is what prompted me to open this new one. I may have bungled disabling validation, because I'm not sure I did it the correct way. I tried running the example with the environment variable set, and I also followed the instructions here, commenting out the relevant lines in wgpu/wgpu-hal/src/vulkan/instance.rs. Running in Release still shows the same high CPU usage (though sometimes it's slightly lower than without WGPU_VALIDATION=0), and the perf report shows the same top offenders. Either way, the CPU usage, which is sometimes at 75% and sometimes at 100%, is still orders of magnitude higher than with older versions of wgpu. Attached is a video showing the CPU usage when running in Release with validation turned off via the environment variable.
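One way to rule out a mistyped or unexported variable is to check what the process actually sees before launching the example. The sketch below uses only the standard library and does not call into wgpu. The helper name validation_override is hypothetical, and the rule it encodes (a value of "0" turns validation off, any other value turns it on, unset leaves the build default) is an assumption about how WGPU_VALIDATION is interpreted, not something confirmed in this thread.

```rust
/// Hypothetical helper, not part of wgpu.
/// ASSUMPTION: "0" requests validation off, any other value requests it on,
/// and an unset variable leaves the build-profile default untouched.
fn validation_override(raw: Option<&str>) -> Option<bool> {
    raw.map(|value| value.trim() != "0")
}

/// Reads WGPU_VALIDATION from the current process environment and
/// describes what the assumed rule above would make of it.
fn describe_validation_env() -> String {
    let raw = std::env::var("WGPU_VALIDATION").ok();
    match validation_override(raw.as_deref()) {
        Some(false) => "WGPU_VALIDATION requests validation off".to_string(),
        Some(true) => "WGPU_VALIDATION requests validation on".to_string(),
        None => "WGPU_VALIDATION is unset; the build default applies".to_string(),
    }
}
```

Printing the result of describe_validation_env() from a small binary started in the same shell as the example shows whether the variable reached the process at all.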
Thanks for the follow-up! A bit of relevant context that I know of: between 0.19 and 0.20, a bunch of fixes landed for how synchronization is done on Vulkan; in fact, it was pretty bugged before. That also matches up well with the perf logs you attached:
9.57% wgpu-examples libc.so.6 [.] pthread_mutex_lock@@GLIBC_2.2.5
8.79% wgpu-examples libc.so.6 [.] pthread_mutex_unlock@@GLIBC_2.2.5
Looks like the internal spinning optimization of libc is now hit hard 🤔 (afaik libc first spins a bit before doing the syscalls to yield to the scheduler).
Bunch of next investigation steps I can think of:
I appreciate the quick response. I don't know for sure if it's something exclusive to wgpu. It's not an apples-to-apples comparison, as I don't know enough about Vulkan to understand how frame pacing works in any meaningful detail, and the examples are obviously not the same, but I ran the triangle example from the most recent release of Vulkano and there was high CPU usage there, too. I've attached the first page of the perf report here, and you might be interested to see that the list of top offenders looks very familiar.
Perf_Report_Vulkano.txt
Okay. Scratch that last comment. I no longer think it has to do with the Rust Vulkan bindings. I should've looked beforehand, but I only learned today that wgpu uses the Ash Vulkan bindings. So I ran the triangle example in the version of Ash that was first present in wgpu 0.20.0, 0.37.1, and no high CPU usage was observed. Attached is a video showing the results:
Description
There seems to be very high single-core CPU usage in wgpu 0.20.0 and later.
Repro steps
Run an example (skybox) in any version of wgpu before 0.20.0 and observe CPU usage.
Expected vs observed behavior
Attached is a video where I run the skybox example provided in the wgpu repo, first with version 0.19.4 and then with wgpu 0.20.2. I then run both again in the same order while keeping a CPU monitor open to observe the effect on CPU usage. Single-core CPU usage spikes to at or near 100% with wgpu 0.20.2 (and later). However, CPU usage with the skybox example for wgpu 0.19.4 and earlier is at near-idle levels.
Extra materials
The video I mention is attached below, as well as the first page of the perf reports for the skybox example for both wgpu 0.19.4 and wgpu 0.20.2 in human-readable plaintext format.
High_CPU_wgpu.webm
Perf_Report_wgpu-0.19.4.txt
Perf_Report_wgpu-0.20.2.txt
Platform
Operating System: Kubuntu 24.04
KDE Plasma Version: 5.27.11
KDE Frameworks Version: 5.115.0
Qt Version: 5.15.13
Kernel Version: 6.8.0-45-generic (64-bit)
Graphics Platform: X11
Processors: 12 × 12th Gen Intel® Core™ i9-12900H
Memory: 31.1 GiB of RAM
Graphics Processor: NVIDIA GeForce RTX 3070 Ti Laptop GPU/PCIe/SSE2
System Version: REV:1.0
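To compare the two attached perf reports by number rather than by eye, a small helper could sum the overhead attributed to a family of symbols in each file. The following is a minimal sketch under stated assumptions: the names PerfRow, parse_row and overhead_for are hypothetical, and the parser assumes each data row has the shape quoted in this thread (percentage, command, shared object, type marker, symbol), with a symbol that contains no spaces.

```rust
/// One data row of a plaintext perf report: overhead percentage and symbol.
#[derive(Debug, PartialEq)]
struct PerfRow {
    overhead: f64,
    symbol: String,
}

/// Parses a row such as
/// "9.57% wgpu-examples libc.so.6 [.] pthread_mutex_lock@@GLIBC_2.2.5".
/// Returns None for headers, comments, blank lines, or anything that does
/// not start with a percentage.
fn parse_row(line: &str) -> Option<PerfRow> {
    let mut fields = line.split_whitespace();
    let overhead = fields.next()?.strip_suffix('%')?.parse::<f64>().ok()?;
    // ASSUMPTION: the symbol is the last whitespace-separated field.
    // Demangled names that contain spaces would need a stricter parser.
    let symbol = fields.last()?.to_string();
    Some(PerfRow { overhead, symbol })
}

/// Sums the overhead of every row whose symbol contains `needle`.
fn overhead_for(report: &str, needle: &str) -> f64 {
    report
        .lines()
        .filter_map(parse_row)
        .filter(|row| row.symbol.contains(needle))
        .map(|row| row.overhead)
        .sum()
}
```

Calling overhead_for with the contents of each report and the needle "pthread_mutex" gives one number per wgpu version. For the two rows quoted in this thread, the sum is about 18.36%.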