Open guusw opened 5 months ago
FWIW, I noticed 2x Vulkan performance decrease but not related to wgpu, but after updating the NVIDIA driver from 535 to 550. Is your GPU driver the same?
516.94, will try with the most recent one.
Might also be worth trying against the changes in #5681. We've been meaning to put this in a patch release because there are all sorts of problems prior to that PR.
I've had similar performance problems with Vulkan and v0.20.0.
After upgrading, I noticed mouse input being very unresponsive in my engine. The NVIDIA overlay reported a render latency of ~120ms at 144FPS (Fifo). The frametimes were similar to v0.19.4 (latency was ~11ms). A similarly high latency occurred at higher framerates with VSync off (Immediate). Initially, I suspected my own engine code, but the same problem occurred when I tested the official wgpu examples. The ash examples did not display this behavior. I'm not entirely sure how NVIDIA exactly measures "render latency", although it seems to be something like "time from render queue to actual presentation".
Upgrading my driver to 555.85 did not help.
I suspect this has something to do with the recent synchronization changes in #5681.
any chance you could check against latest trunk? We want to release a patch with that PR soon
I also ran into this problem after updating to v0.20. It doesn't seem like the framerate has decreased, but the latency increased a lot in Fifo mode. I also use wgpuBufferMapAsync but never block on it.
I use an Nvidia RTX 470 with driver version 550.78, on Linux, winit 0.30, whole desktop running at 60fps.
I did update to the latest trunk as suggested, but unfortunately that did not fix the issue.
Noticing latency manually is a bit error-prone, so I did the following: just print to the terminal whenever a button press is detected. The button press leads to camera motion and thus to a notable change in the rendered image. I then screen captured terminal + wgpu window and finally counted the frames between "terminal prints event" and "effect visible in rendered image". I know this is far from a perfect measurement of latency, but the trend is clear: before the wgpu 0.20 update, there were 2 frames of delay, with wgpu 0.20 it's 6 frames, with c7458638d14921c7562e4197ddeefa17be413587 it's also 6.
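To put the frame counts above into the same unit as the latencies quoted elsewhere in this thread, here is a minimal sketch of the conversion. It assumes the screen capture advanced at the 60 fps desktop rate mentioned above; `frames_to_ms` is a hypothetical helper, not part of wgpu or winit.

```rust
/// Convert a latency counted in captured frames into milliseconds.
/// `refresh_hz` is the rate at which the captured frames advance.
fn frames_to_ms(frames: u32, refresh_hz: f64) -> f64 {
    f64::from(frames) * 1000.0 / refresh_hz
}

fn main() {
    // Assumption: the capture ran at the 60 fps desktop rate reported above.
    let refresh_hz = 60.0;
    let before = frames_to_ms(2, refresh_hz); // delay counted before the 0.20 update
    let after = frames_to_ms(6, refresh_hz); // delay counted with wgpu 0.20
    println!("before 0.20: {:.1} ms", before);
    println!("with 0.20:   {:.1} ms", after);
    println!("added delay: {:.1} ms", after - before);
}
```

Counting frames only resolves latency to one frame interval, so these figures are rough bounds rather than precise measurements.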
In Immediate present mode, there doesn't seem to be a problem, I think.
I suspect this has something to do with the recent synchronization changes in #5681.
any chance you could check against latest trunk? We want to release a patch with that PR soon
I just checked against the latest trunk and it does seem a bit better. With VSync, the latency is reduced from ~120ms to ~30ms and it's noticeably more responsive. However it's still not as good as v0.19.4 (11-15 ms). I think the problem still applies to Immediate present mode, but the impact is generally less noticeable since the frametimes and latencies are lower (for simple scenes).
We would definitely like to address this, but so far in this issue it's hard to discern how we can actually debug this. A reduced test case that we can run locally to see the problem ourselves would make progress a lot easier.
Just chiming in to say that I have noticed this regression when working on the upgrade in iced, even after #5681, and it is one of the main reasons the upgrade has not been merged yet (https://github.com/iced-rs/iced/pull/2417).
However, it seems I can only reproduce the regression in debug mode. Release builds of both versions seem to have similar performance. I'm talking about render time specifically, not latency.
@guusw @vntec did you check on release mode? For debug builds we'd only care about regressions if they make things completely unusable for a wide range of trivial use cases. One can always configure Cargo.toml such that wgpu is compiled with optimizations even when the rest of the application is not. Compiling wgpu without optimizations is only truly relevant when trying to find issues in wgpu(-core/-hal) itself.
@Wumpf Which packages should I exactly choose to optimize?
I am adding opt-level = 3 to the wgpu crates in 0.20 but still struggling to obtain the same render time as with no optimizations whatsoever in 0.19. That seems a bit odd to me.
you might also need to add opt-level = 3 for wgpu-core and wgpu-hal, don't think that setting is transitive
@Wumpf Ah! Disabling debug-assertions did the trick!
[profile.dev.package.wgpu]
debug-assertions = false
[profile.dev.package.wgpu-hal]
debug-assertions = false
[profile.dev.package.wgpu-core]
debug-assertions = false
[profile.dev.package.wgpu-types]
debug-assertions = false
opt-level does not seem to have much of an impact. Maybe 0.20 introduced a lot more debug_assert! in some hotpath?
I narrowed it to wgpu-types:
[profile.dev.package.wgpu-types]
debug-assertions = false
This has the biggest impact on performance in debug builds.
EDIT: More experimenting indicates that both 0.19 and 0.20 seem to have the same debug render time with debug-assertions disabled.
Disabling debug-assertions for 0.19 has little impact, while 0.20 seems to have more debug checks.
nice! thank you for digging into this!!
I'm making the bold move of closing this as won't fix then, if it's just extra debug assertions we're hitting here I'm not concerned about perf :)
@ everyone else: Please re-open if there's reason to believe there's regressions beyond extra debug assertions or it can be shown that certain debug assertions are too excessive.
Thanks for the investigation, everyone! @Wumpf Yes, I was running in release mode, but I'll check if disabling debug assertions does the trick.
EDIT: My render latency problem is gone now. It was probably the synchronization changes interacting with a specific driver version.
@Wumpf could we reopen this? Bevy users are running into it while running their game in debug mode per the linked issue. Usually I would agree that performance is a minor concern for debug assertions, but the change in FPS reported seems staggering: 60 FPS vs 460 FPS for one user and 20 FPS vs 60 FPS for another
I investigated a bit more and traced the issue down all the way to InstanceFlags::from_build_config:
Specifically, it seems the InstanceFlags::VALIDATION flag enabled in InstanceFlags::debugging is the culprit:
Thus, the issue can be circumvented downstream by manually passing InstanceFlags::empty() to wgpu::Instance::new, at the expense of validation in debug builds.
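The flag selection described above can be sketched with a small stand-in type. This is a simplified model for illustration, not wgpu's actual source: the names mirror the ones mentioned in this comment, but the struct, the bit values, and the `contains` helper are assumptions made so the example compiles without the wgpu crate.

```rust
// Simplified stand-in for wgpu's InstanceFlags; NOT the real type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct InstanceFlags(u32);

impl InstanceFlags {
    // Bit values are arbitrary and chosen only for this sketch.
    const DEBUG: Self = Self(1 << 0);
    const VALIDATION: Self = Self(1 << 1);

    const fn empty() -> Self {
        Self(0)
    }

    const fn contains(self, other: Self) -> bool {
        self.0 & other.0 == other.0
    }

    /// Models the behavior reported above: the debugging set includes VALIDATION.
    const fn debugging() -> Self {
        Self(Self::DEBUG.0 | Self::VALIDATION.0)
    }

    /// Chooses flags from the build profile of the crate this function is compiled in.
    fn from_build_config() -> Self {
        if cfg!(debug_assertions) {
            Self::debugging()
        } else {
            Self::empty()
        }
    }
}

fn main() {
    // Default path: validation follows the debug-assertions setting.
    let default_flags = InstanceFlags::from_build_config();
    // Workaround path: explicitly opt out of validation.
    let workaround = InstanceFlags::empty();

    println!("default validates: {}", default_flags.contains(InstanceFlags::VALIDATION));
    println!("workaround validates: {}", workaround.contains(InstanceFlags::VALIDATION));
}
```

If the real check is likewise compiled inside wgpu-types, that would be consistent with the earlier observation that disabling debug-assertions for wgpu-types alone restores performance, since the condition is evaluated per crate.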
Any ideas why validation is way more expensive since 0.20?
I can't check on my phone, but I think that's about when we enabled synchronization validation by default.
Can confirm I get ~300 FPS more in the cube example if I comment out these lines (enabling PresentMode::Immediate):
In any case, I have circumvented the problem in iced by introducing a strict-assertions feature flag that can be enabled for internal development, since I see no reason for users of iced to run validation layers unless they are trying to debug a core issue.
This results in an actual speedup on debug builds, after all! Therefore, this issue can be closed on my end.
Hmm if it can have such a big impact maybe we shouldn't enable this on all debug builds after all
We could have two levels of validation, InstanceFlags::VALIDATION and InstanceFlags::VALIDATION_SLOW, and only enable the former in debug builds, if at all 🤔
We already do lol, we only turn on gpu based validation if you ask for "advanced_debugging".
Syncval is a lot more useful to us than GBV, which is why we enabled it.
well but synchronization validation doesn't fall under gpu based validation, right? So if we were to not enable that by default we'd have to split it out of regular validation. And judging from this ticket we really should strongly consider not to enable this by default.
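The split being discussed could look roughly like the sketch below. Everything in it is hypothetical: VALIDATION_SLOW is only the name floated in this thread, and the struct, bit values, and preset functions are invented for illustration rather than taken from wgpu.

```rust
// Hypothetical two-tier flag set. VALIDATION_SLOW is the name floated above
// and is not an existing wgpu flag.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ValidationFlags(u32);

impl ValidationFlags {
    const VALIDATION: Self = Self(1 << 0);
    const VALIDATION_SLOW: Self = Self(1 << 1);

    const fn union(self, other: Self) -> Self {
        Self(self.0 | other.0)
    }

    const fn contains(self, other: Self) -> bool {
        self.0 & other.0 == other.0
    }

    /// Debug-build default in this sketch: only the cheap checks.
    const fn debug_default() -> Self {
        Self::VALIDATION
    }

    /// Explicit opt-in: cheap checks plus the expensive ones.
    const fn thorough() -> Self {
        Self::VALIDATION.union(Self::VALIDATION_SLOW)
    }
}

fn main() {
    // A backend would consult the slow bit before turning on
    // synchronization validation.
    let by_default = ValidationFlags::debug_default();
    let opted_in = ValidationFlags::thorough();

    println!(
        "syncval by default: {}",
        by_default.contains(ValidationFlags::VALIDATION_SLOW)
    );
    println!(
        "syncval when opted in: {}",
        opted_in.contains(ValidationFlags::VALIDATION_SLOW)
    );
}
```

The trade-off is the one raised above: debug builds stay fast by default, but synchronization bugs are only caught by those who opt in.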
Hey, I recently tried upgrading from 0.19.1 to 0.20 and noticed I get half the performance using Vulkan as I did previously.
I did some profiling and noticed all application code is the same except for one vkQueueSubmit taking up 11ms. I suspect something is waiting for an operation to complete, which did not happen previously.
It's worth noting that I perform a readback from a texture, although it happens asynchronously using wgpuBufferMapAsync.
Below I have provided two renderdoc captures (before/after the upgrade). The rendered items are identical; the only difference I could notice is some fences at the start of the 0.19 capture.
Let me know if I can provide any other information to help.
renderdoc.7z.zip