emilk / egui

egui: an easy-to-use immediate mode GUI in Rust that runs on both web and native
https://www.egui.rs/
Apache License 2.0

200 MB/s wgpu memory leak (DPI scaling/multiple monitors/resizing) #4674

Open samuelmarquis opened 1 week ago

samuelmarquis commented 1 week ago

Describe the bug

A very bad memory leak: around 200 MB/s. While the leak is ongoing, the whole interface is unresponsive, and no updates to checkboxes, text fields, or anything else will appear. The mouse cursor still changes when hovering over them, and the leak accelerates when a repaint is requested (I've included such a line in my example, but you can remove it and still trigger the leak).

To Reproduce
I have three monitors arranged in this configuration: [screenshot of the monitor arrangement]
1. 2560x1080, scaling at 100%
2. 3840x2160, scaling at 150%
3. 1920x1080, scaling at 100%

Getting this to happen is a bit finicky. In my non-minimal example, all that really needs to happen is that the window is created straddling the boundary between two or more monitors; I then drag it from that boundary anywhere else, and it starts leaking about 200 MB/s. Very bad! Sometimes the leak stops when I drag the window fully onto a monitor at 100% scaling, sometimes it doesn't.
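To make the scale-factor change concrete, the following Rust sketch (an illustration, not code from egui or winit) computes how many physical pixels the same logical window size maps to on each monitor listed above. The 800x600 logical window size is a hypothetical value, and rounding to whole pixels is an assumption about how a backend would size its surface.

```rust
/// Monitor setup from the report: (name, resolution, scale factor).
const MONITORS: [(&str, [u32; 2], f32); 3] = [
    ("1", [2560, 1080], 1.0),
    ("2", [3840, 2160], 1.5),
    ("3", [1920, 1080], 1.0),
];

/// Physical pixel size of a window with the given logical size when it sits
/// on a monitor with the given scale factor (rounded to whole pixels).
fn physical_size(logical: [f32; 2], scale: f32) -> [u32; 2] {
    [
        (logical[0] * scale).round() as u32,
        (logical[1] * scale).round() as u32,
    ]
}

fn main() {
    // Hypothetical window size in logical points.
    let logical = [800.0_f32, 600.0];
    for (name, resolution, scale) in MONITORS {
        let [w, h] = physical_size(logical, scale);
        let [rw, rh] = resolution;
        let pct = scale * 100.0;
        println!("monitor {name} ({rw}x{rh} @ {pct:.0}%): window is {w}x{h} physical pixels");
    }
}
```

Dragging between monitor 2 and monitor 1 or 3 therefore changes the window's physical size by 50% even though its logical size stays the same.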

I've created a minimal example (literally just the app demo with a side panel that updates some values in a struct), but the bug is much less likely to occur there (attached: eguimemleak.zip). Sometimes it's enough to position the window at the boundary between all three monitors and start messing with the DragValues: the interface freezes up and leaks memory for as long as you keep dragging. This doesn't always work for me, but it works often enough that it may be worth examining, since the code is far more tractable. Uncommenting the repaint request at the bottom makes the bug more likely to occur.

Expected behavior
No memory leak.

Screenshots
Memory footprint of the glorified app demo contained in the attached zip: [screenshot]
Memory footprint of the actual project after roughly the same time spent leaking (maybe 20 s): [screenshot]


YgorSouza commented 1 week ago

Does this still happen on the current master (which uses wgpu 0.20)?

samuelmarquis commented 1 week ago

My Cargo.toml is now:

[dependencies]
egui = { git = "https://github.com/emilk/egui" }
egui-wgpu = { git = "https://github.com/emilk/egui", default-features = false }
egui-winit = { git = "https://github.com/emilk/egui" }
nalgebra = { version = "0.32.5", features = ["serde-serialize"]}

wgpu = "=0.20.1"

# uses winit, egui-wgpu, and wgpu
eframe = { git = "https://github.com/emilk/egui", default-features = false, features = [
    #    "accesskit",     # Make egui compatible with screen readers. NOTE: adds a lot of dependencies.
    "default_fonts", # Embed the default egui fonts.
    "wgpu",
    "persistence",   # Enable restoring app state when restarting the app.
] }


Yes, it still happens.

YgorSouza commented 1 week ago

Is this also reproducible with the wgpu examples? If so, you should let them know as well. If not, then it might be due to how egui-wgpu handles the screen size when the window is between two screens of different sizes.

https://github.com/emilk/egui/blob/00ac5b2015b3ee7ce44b559f5adb98026d459051/crates/egui-wgpu/src/renderer.rs#L112-L129

dwbrite commented 1 week ago

Fellow contributor to @samuelmarquis's project, adding some info: the leak can occur even when the window is fully on the screen with 150% scaling, and resizing wasn't necessary to trigger it.

We noticed that when we remove the side panel, and specifically the entry boxes, the memory leak is much slower. Maybe a red herring?

Anyway, my intuition is that there could be a floating-point bug, e.g. if screen_size != desired_screen_size, recreate the buffer. Just a wild guess though; that's probably not how things work, but 🤷
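dwbrite's hypothesis can be illustrated with a small sketch. The function names and the buffer-recreation policy below are hypothetical, not egui-wgpu's actual code: the sketch only shows how an exact != comparison on float sizes reacts to sub-pixel noise, compared with comparing whole pixels. Recreating a buffer every frame would not by itself leak memory if the old buffer is freed, so this demonstrates the comparison pitfall, not a confirmed cause.

```rust
/// Hypothetical policy: recreate the GPU buffer on any difference at all.
/// Exact float equality means sub-pixel noise (e.g. from a 150% scale
/// factor) would force a recreation on every frame.
fn needs_recreate_exact(current: [f32; 2], desired: [f32; 2]) -> bool {
    current != desired
}

/// Hypothetical alternative: compare sizes in whole physical pixels,
/// so differences smaller than the rounding step are ignored.
fn needs_recreate_pixels(current: [f32; 2], desired: [f32; 2]) -> bool {
    current[0].round() != desired[0].round() || current[1].round() != desired[1].round()
}

fn main() {
    let current = [1920.0_f32, 1080.0];
    // The same size plus a little floating-point noise.
    let desired = [1920.001_f32, 1080.0];

    println!("exact:  recreate = {}", needs_recreate_exact(current, desired));
    println!("pixels: recreate = {}", needs_recreate_pixels(current, desired));
}
```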

samuelmarquis commented 1 week ago

Is this also reproducible with the wgpu examples?

I just tested all of them: no, it doesn't reproduce there.

emilk commented 1 week ago

Try using a tool that tracks memory allocations, such as re_memory, which gives you the callstack of the code that allocated the leaking memory.
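re_memory works by wrapping the process's global allocator so that allocations and deallocations are counted (and, optionally, sampled together with a callstack). The following std-only sketch shows just the counting part of that idea. It is a simplified stand-in, not re_memory's API, and it records no callstacks; the names CountingAllocator and live_bytes are hypothetical.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Running total of bytes that have been allocated but not yet freed.
static LIVE_BYTES: AtomicUsize = AtomicUsize::new(0);

/// Forwards every request to the system allocator and updates LIVE_BYTES.
/// (GlobalAlloc's default realloc/alloc_zeroed go through these two methods.)
struct CountingAllocator;

unsafe impl GlobalAlloc for CountingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            LIVE_BYTES.fetch_add(layout.size(), Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        LIVE_BYTES.fetch_sub(layout.size(), Ordering::Relaxed);
    }
}

// Install the wrapper for the whole process.
#[global_allocator]
static GLOBAL: CountingAllocator = CountingAllocator;

/// Bytes currently held by live heap allocations.
fn live_bytes() -> usize {
    LIVE_BYTES.load(Ordering::Relaxed)
}

fn main() {
    let before = live_bytes();
    let buffer: Vec<u8> = Vec::with_capacity(1 << 20); // request 1 MiB
    let during = live_bytes();
    drop(buffer);
    let after = live_bytes();
    println!("live bytes: before={before}, while holding 1 MiB={during}, after drop={after}");
}
```

A counter that keeps climbing while the UI is frozen confirms a leak; finding where it comes from is what re_memory's callstack sampling adds on top.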

samuelmarquis commented 1 week ago

Output produced with:

// Snapshot the current memory use.
let m = re_memory::MemoryUse::capture();
// Only dump statistics once usage exceeds 40,000,000 bytes (40 MB).
if m.used().unwrap() > 40_000_000 {
    let mut sum = 0;
    if let Some(stats) = re_memory::accounting_allocator::tracking_stats() {
        for item in stats.top_callstacks {
            // Scale the tracked size by the sampling rate to estimate total bytes.
            sum += item.extant.size * item.stochastic_rate;
            println!(
                "size({}) * rate({}) = {} | backtrace: {}",
                item.extant.size,
                item.stochastic_rate,
                item.extant.size * item.stochastic_rate,
                item.readable_backtrace
            );
        }
        // Turn callstack tracking off after the first dump.
        re_memory::accounting_allocator::set_tracking_callstacks(false);
        println!("sum:({})", sum);
    }
}

have fun

samuelmarquis commented 1 week ago

Oops, I ran the above on 0.27.2. Here's the output on master (it's 2,000 lines longer now).

dwbrite commented 1 week ago

I think re_memory is off by a factor of 100 when reporting bytes used. The code block above should trigger at 40,000,000 bytes used (40MB), but actually triggers at ~4GB.

On line 455 of the first log file, we see size(201024) * rate(64) = 12865536. Given the above, I'm tempted to believe this is actually about 1.29 GB.

Similarly, the sum at the end of the file shows 37,407,196 "bytes", or more realistically ~3.7 GB, which matches what Windows Task Manager showed when these logs were printed.
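The arithmetic behind these estimates can be checked with a few lines of Rust. The 100x correction factor is dwbrite's unconfirmed hypothesis, not documented re_memory behaviour, and the constant and function names are illustrative.

```rust
/// dwbrite's hypothesis (unconfirmed): re_memory's reported byte counts are 100x too small.
const HYPOTHESIZED_FACTOR: u64 = 100;

/// Apply the hypothesized correction to a reported byte count.
fn corrected_bytes(reported: u64) -> u64 {
    reported * HYPOTHESIZED_FACTOR
}

/// Convert bytes to decimal gigabytes.
fn to_gb(bytes: u64) -> f64 {
    bytes as f64 / 1e9
}

fn main() {
    // size * stochastic rate from line 455 of the first log.
    let line_455: u64 = 201_024 * 64;
    let readings: [(&str, u64); 3] = [
        ("threshold", 40_000_000),
        ("log line 455", line_455),
        ("final sum", 37_407_196),
    ];
    for (label, reported) in readings {
        let bytes = corrected_bytes(reported);
        let gb = to_gb(bytes);
        println!("{label}: reported {reported} -> {bytes} bytes (~{gb:.2} GB)");
    }
}
```

Under that assumption the threshold corresponds to 4 GB, line 455 to about 1.29 GB, and the final sum to about 3.74 GB.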

dwbrite commented 1 week ago

@samuelmarquis suggested this leak may only happen when the window initially spawns partially on multiple displays. For a good while we couldn't get the leak to trigger at all, so it took us some time to get re_memory results: we only got them after it started leaking again, seemingly at random. I'm fucked if I know what we did.

PWhiddy commented 3 days ago

@samuelmarquis @dwbrite

This isn't specific to Windows + egui, but I just noticed a line in the winit 0.30.1 release notes about a patch for an issue that sounds very similar to the circumstances in which you're encountering the bug:

On macOS, fix window dragging glitches when dragging across a monitor boundary with different scale factor.

It refers to this commit specifically: https://github.com/rust-windowing/winit/commit/e108fa2fbf41bf00316916f622a9a789315a3ee4

Perhaps there is a similar issue on Windows that hasn't been patched yet? That doesn't explain why you weren't able to reproduce it in the wgpu examples, though.