gfx-rs / wgpu

A cross-platform, safe, pure-Rust graphics API.
https://wgpu.rs
Apache License 2.0

Very High CPU Usage with wgpu >= 0.20.0 #6338

Open myth0genesis opened 5 days ago

myth0genesis commented 5 days ago

Description There seems to be very high single-core CPU usage in versions of wgpu 0.20.0 and later.

Repro steps

Expected vs observed behavior Attached is a video in which I run the skybox example from the wgpu repo, first with version 0.19.4 and then with wgpu 0.20.2. I then run both again in the same order with a CPU monitor open to observe the effect on CPU usage. Single-core CPU usage spikes to at or near 100% with wgpu 0.20.2 (and later), whereas with wgpu 0.19.4 and earlier the 'skybox' example stays at near-idle levels.

Extra materials The video I mention is attached below, as well as the first page of the perf reports for the skybox example for both wgpu 0.19.4 and wgpu 0.20.2 in human-readable plaintext format. High_CPU_wgpu.webm Perf_Report_wgpu-0.19.4.txt Perf_Report_wgpu-0.20.2.txt

Platform
Operating System: Kubuntu 24.04
KDE Plasma Version: 5.27.11
KDE Frameworks Version: 5.115.0
Qt Version: 5.15.13
Kernel Version: 6.8.0-45-generic (64-bit)
Graphics Platform: X11
Processors: 12 × 12th Gen Intel® Core™ i9-12900H
Memory: 31.1 GiB of RAM
Graphics Processor: NVIDIA GeForce RTX 3070 Ti Laptop GPU/PCIe/SSE2
System Version: REV:1.0

myth0genesis commented 5 days ago

My suspicion is that this is related to the Vulkan backend. I've built and run some of the Vulkano examples from their most recent release and noticed that the top offenders there also seem to be Mutexes and printfs, similar to what the perf reports for wgpu 0.20.0 and later show.

Wumpf commented 5 days ago

Very likely the same as

Can you try what happens when you disable validation? It's disabled automatically in Release; for Debug it's opt-out.

myth0genesis commented 5 days ago

I've already tried the fixes suggested in those issues, which is what prompted me to start this new one. I may have bungled disabling validation, because I'm not sure I did it the correct way. I tried running the example with the environment variable set, and I also followed the instructions here, commenting out the relevant lines in wgpu/wgpu-hal/src/vulkan/instance.rs. Running in Release still shows the same high CPU usage (though sometimes it's slightly lower than without WGPU_VALIDATION=0), and the perf report shows the same top offenders. Either way, the CPU usage, which is sometimes at 75% and sometimes at 100%, is still orders of magnitude higher than with older versions of wgpu. Attached is a video showing the CPU usage when running in Release with validation turned off via the environment variable.

High_CPU_wgpu_No_Validation.webm
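For readers trying to reproduce the environment-variable route described above, here is a minimal sketch of launching the example with `WGPU_VALIDATION=0` set for the child process only. The cargo arguments are an assumption based on the `wgpu-examples` binary name that appears in the perf output, and `skybox_without_validation` is a hypothetical helper name. Whether this is the intended way to disable validation is not confirmed in this thread.

```rust
use std::process::Command;

/// Builds (but does not spawn) a command that would run the skybox example
/// in Release mode with WGPU_VALIDATION=0 in the child's environment.
///
/// Assumption: the examples are launched via `cargo run --bin wgpu-examples skybox`
/// from a checkout of the wgpu repository.
pub fn skybox_without_validation() -> Command {
    let mut cmd = Command::new("cargo");
    cmd.args(["run", "--release", "--bin", "wgpu-examples", "skybox"]);
    // Only the child process sees this variable; the parent's environment is unchanged.
    cmd.env("WGPU_VALIDATION", "0");
    cmd
}
```

Calling `.status()` on the returned `Command` from inside the repository checkout would actually start the example; building the command separately makes it easy to check what environment the child will receive.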

Wumpf commented 4 days ago

Thanks for the follow-up! A bit of relevant context that I know of: from 0.19 to 0.20 a bunch of fixes landed for how synchronization is done on Vulkan; in fact it was pretty bugged before. That also matches up well with the perf logs you attached:

9.57%  wgpu-examples    libc.so.6                          [.] pthread_mutex_lock@@GLIBC_2.2.5
8.79%  wgpu-examples    libc.so.6                          [.] pthread_mutex_unlock@@GLIBC_2.2.5

Looks like the internal spinning optimization of libc is now hit hard 🤔 (afaik libc first spins a bit before doing the syscalls to yield to the scheduler).

Bunch of next investigation steps I can think of:
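As an aside on the spin-then-yield behaviour mentioned above, the following toy lock shows why a contended lock of that shape can register as busy CPU time instead of idle waiting. It is a sketch under stated assumptions, not glibc's or wgpu's implementation: `SpinThenYieldLock`, `contend`, and the `spin_limit` of 100 are all hypothetical, and a real mutex would typically block in the kernel rather than call `yield_now`.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Toy lock: tries a bounded number of busy-wait attempts, then yields
/// to the scheduler and starts over. Illustrative only.
pub struct SpinThenYieldLock {
    locked: AtomicBool,
}

impl SpinThenYieldLock {
    pub const fn new() -> Self {
        Self { locked: AtomicBool::new(false) }
    }

    fn try_acquire(&self) -> bool {
        self.locked
            .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_ok()
    }

    /// Acquires the lock and returns how many failed spin attempts it took.
    /// Every failed attempt is CPU time spent without doing useful work.
    pub fn lock(&self, spin_limit: u32) -> u64 {
        let mut spins = 0u64;
        loop {
            for _ in 0..spin_limit {
                if self.try_acquire() {
                    return spins;
                }
                std::hint::spin_loop();
                spins += 1;
            }
            // Spin budget exhausted: give other threads a chance to run.
            thread::yield_now();
        }
    }

    pub fn unlock(&self) {
        self.locked.store(false, Ordering::Release);
    }
}

/// Runs `threads` workers that each take the lock `iterations` times.
/// Returns (final counter value, total failed spin attempts).
pub fn contend(threads: usize, iterations: u64) -> (u64, u64) {
    let lock = Arc::new(SpinThenYieldLock::new());
    let counter = Arc::new(AtomicU64::new(0));
    let wasted = Arc::new(AtomicU64::new(0));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let (lock, counter, wasted) = (lock.clone(), counter.clone(), wasted.clone());
            thread::spawn(move || {
                for _ in 0..iterations {
                    let spins = lock.lock(100);
                    // Separate load and store: only correct because the lock is held.
                    let value = counter.load(Ordering::Relaxed);
                    counter.store(value + 1, Ordering::Relaxed);
                    lock.unlock();
                    wasted.fetch_add(spins, Ordering::Relaxed);
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    (counter.load(Ordering::Relaxed), wasted.load(Ordering::Relaxed))
}
```

Calling `contend(4, 1000)` returns a counter of 4000 together with a spin count that varies from run to run. The more often threads collide on the lock, the larger that second number gets, which is the kind of cost a profiler would attribute to the lock and unlock functions.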

myth0genesis commented 4 days ago

I appreciate the quick response. I don't know for sure if it's something exclusive to wgpu. It's not an apples-to-apples comparison, as I don't know enough about Vulkan to understand how frame pacing works in any meaningful detail and the examples are obviously not the same, but I ran the triangle example from the most recent release of Vulkano and there was high CPU usage there, too. I've attached the first page of the perf report here and you might be interested to see the list of top offenders looks very familiar. Perf_Report_Vulkano.txt

myth0genesis commented 4 days ago

Okay. Scratch that last comment. I no longer think it has to do with the Rust Vulkan bindings. I should've looked beforehand, but I only learned today that wgpu uses the Ash Vulkan bindings. So I ran the triangle example with the version of Ash that was first present in wgpu 0.20.0 (0.37.1), and no high CPU usage was observed. Attached is a video showing the results:

CPU_Usage_Ash.webm