gfx-rs / wgpu

A cross-platform, safe, pure-Rust graphics API.
https://wgpu.rs
Apache License 2.0

Memory leak (still reachable) in Vulkan validation layer #5478

Open snyball opened 3 months ago

snyball commented 3 months ago

Description

A hash table in the Vulkan validation layer grows very large over time.

I'm filing this issue with both wgpu and mesa, as I don't know if the issue is due to how wgpu uses the validation layer, or just inherent to how the mesa Vulkan driver implements its validation layer. Need to do more testing.

Repro steps

Reproduced using the bunnymark example from ggez with the following patch:

diff --git a/examples/bunnymark.rs b/examples/bunnymark.rs
index 10be12fd..c6e5887a 100644
--- a/examples/bunnymark.rs
+++ b/examples/bunnymark.rs
@@ -180,7 +180,9 @@ fn main() -> GameResult {
         path::PathBuf::from("./resources")
     };

-    let cb = ggez::ContextBuilder::new("bunnymark", "ggez").add_resource_path(resource_dir);
+    let cb = ggez::ContextBuilder::new("bunnymark", "ggez")
+        .add_resource_path(resource_dir)
+        .window_setup(ggez::conf::WindowSetup::default().vsync(false));
     let (mut ctx, event_loop) = cb.build()?;

     let state = GameState::new(&mut ctx)?;

Initially I was unable to reproduce the issue with bunnymark without this patch because of this issue: https://github.com/swaywm/sway/issues/6263. I recommend the patch anyway, because pushing more frames more quickly makes the issue apparent earlier.

For example, in my internal application using wgpu, the table grows to 8.3GB in 7hrs, with vsync enabled at 75Hz. >1GB/hr is definitely unsustainable.
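As a rough sanity check on those numbers, the growth can be converted to a per-hour and a per-frame figure. This is only back-of-the-envelope arithmetic: it assumes "GB" means 10^9 bytes and that the application presented at exactly 75 frames per second for the full 7 hours, neither of which is confirmed above. The function names are made up for illustration.

```rust
/// Average growth in (decimal) gigabytes per hour.
fn gb_per_hour(total_bytes: f64, hours: f64) -> f64 {
    total_bytes / 1e9 / hours
}

/// Average growth in bytes per presented frame, assuming a constant frame rate.
fn bytes_per_frame(total_bytes: f64, hours: f64, refresh_hz: f64) -> f64 {
    let frames = hours * 3600.0 * refresh_hz;
    total_bytes / frames
}

fn main() {
    // Assumption: 8.3GB is read as 8.3 * 10^9 bytes.
    let total_bytes = 8.3e9;
    let hours = 7.0;
    let refresh_hz = 75.0;

    println!("{:.2} GB/hr", gb_per_hour(total_bytes, hours));
    println!("{:.0} bytes/frame", bytes_per_frame(total_bytes, hours, refresh_hz));
}
```

Under those assumptions this works out to about 1.19 GB/hr, or a little over 4 KB retained per frame on average.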

Expected vs observed behavior

I'd expect that hash table to not grow so much over time, or be cleared occasionally.

Extra materials

ggez bunnymark heaptrack report (no vsync): https://share.moller.systems/heaptrack.bunnymark.710642.zst

(GitHub doesn't let me upload the .zst file here)


Platform

wgpu: 0.18.0

winit: 0.28.7

OS: Linux 6.8.2-zen2-1.1-zen

Display server: Wayland / wlroots / sway 1.9

GPU: AMD Radeon Graphics (radeonsi, raphael_mendocino, LLVM 17.0.6, DRM 3.57, 6.8.2-zen2-1.1-zen)

SludgePhD commented 3 months ago

I've observed the same behavior in the past (on Wayland KDE, not sway). I think https://github.com/KhronosGroup/Vulkan-ValidationLayers/ would be the right place to report it (unless mesa has its own validation layer?).

cwfitzgerald commented 3 months ago

What's the stack of the allocated leaks? The VVLs need to keep all handles around forever so they can catch use-after-frees, and the labels are kept around as well.
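To illustrate why that kind of bookkeeping grows without bound, here is a toy sketch of a tracker that remembers every handle (and its label) even after destruction so it can flag later use. It is not the validation layers' actual code or data structure; `HandleTracker` and all of its methods are hypothetical names invented for this example.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum HandleState {
    Live,
    Destroyed,
}

/// Hypothetical tracker: entries are never removed, only marked destroyed.
struct HandleTracker {
    handles: HashMap<u64, (HandleState, Option<String>)>,
}

impl HandleTracker {
    fn new() -> Self {
        Self { handles: HashMap::new() }
    }

    fn create(&mut self, handle: u64, label: Option<&str>) {
        self.handles
            .insert(handle, (HandleState::Live, label.map(str::to_owned)));
    }

    /// Marks the handle destroyed but keeps the entry, so a later use can be diagnosed.
    fn destroy(&mut self, handle: u64) {
        if let Some(entry) = self.handles.get_mut(&handle) {
            entry.0 = HandleState::Destroyed;
        }
    }

    fn check_use(&self, handle: u64) -> Result<(), String> {
        match self.handles.get(&handle) {
            Some((HandleState::Live, _)) => Ok(()),
            Some((HandleState::Destroyed, label)) => {
                Err(format!("use after free of handle {handle} ({label:?})"))
            }
            None => Err(format!("unknown handle {handle}")),
        }
    }

    fn len(&self) -> usize {
        self.handles.len()
    }
}

fn main() {
    let mut tracker = HandleTracker::new();
    // Simulate an app that creates and destroys one labelled object per frame.
    for frame in 0..1000u64 {
        tracker.create(frame, Some("per-frame buffer"));
        tracker.destroy(frame);
    }
    // Every destroyed handle is still in the table.
    println!("tracked entries: {}", tracker.len());
    println!("{:?}", tracker.check_use(0));
}
```

With this design the table size is proportional to the number of objects ever created, not the number currently alive, so an application that creates new objects every frame would see steady growth.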