kpreid opened this issue 1 year ago
Hmm, both the driver and the nature of the issue (memory leak, maybe a refcounting issue?) seem like they might be related to #3485?
I've also seen similar issues on Windows. I haven't had time to investigate them yet, so I don't have any details, but something like that definitely happens.
Disabling dx12 suballocation makes the test pass on my system.
```diff
--- a/wgpu/Cargo.toml
+++ b/wgpu/Cargo.toml
@@ -142,7 +142,7 @@ hal = { workspace = true }
 hal = { workspace = true, features = ["renderdoc"] }

 [target.'cfg(windows)'.dependencies]
-hal = { workspace = true, features = ["dxc_shader_compiler", "renderdoc", "windows_rs"] }
+hal = { workspace = true, features = ["dxc_shader_compiler", "renderdoc"] }
```
If you have a non-test reproduction, gpu-allocator should output warn-level logs if it's leaking memory when we drop it.
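(A minimal sketch of surfacing those logs, assuming a plain Rust reproduction binary and the env_logger crate; this is illustrative, not part of the original report:)

```rust
// Sketch: install a `log`-compatible logger so gpu-allocator's warn-level
// leak reports are actually printed; run with e.g. RUST_LOG=warn.
fn main() {
    env_logger::init();

    // ... the device-creation loop from the reproduction goes here ...
}
```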
I turned your test code into an example and shoved a println! into Device::exit(...) (in wgpu-hal/src/dx12/device.rs), and it looks like Device::exit doesn't get called until all the loop iterations complete (Vulkan seems to have the same issue, but for whatever reason doesn't end up with the same kind of memory bloat).

edit: After more testing, Device::exit() only seems to run during the loop if the instance creation is inside the loop. With the instance outside the loop (as in the OP), Device::exit(...) runs for every created device only after the loop has completed and the instance is dropped.
Some more context:
The texture seems to be freed properly (println debugging shows destroy_texture() being called), and it leaks even without the texture.
The issue is that when you create a new gpu-allocator allocator (which happens when we create a new device), it allocates a block of memory that it uses to suballocate smaller blocks from for buffers and textures, and there's a bug somewhere that's stopping devices from being dropped until the Instance is dropped. (Sidenote: I'd almost consider Vulkan not leaking memory here a bug, as I thought our gpu-alloc implementation worked similarly to gpu-allocator with regards to creating and holding onto large ~256 MB blocks of memory to suballocate from.)
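To illustrate the suballocation pattern described above, here is a conceptual sketch (not gpu-allocator's actual code; the block size and bump-style bookkeeping are simplifications):

```rust
// Conceptual sketch: a device-level suballocator grabs large blocks up
// front and hands out offsets into them; the large blocks are only
// released when the allocator itself (i.e. the device) is dropped.
struct Block {
    size: u64,
    used: u64, // bump-pointer bookkeeping; real allocators track free lists
}

struct SubAllocator {
    block_size: u64, // on the order of 256 MB in the real thing
    blocks: Vec<Block>,
}

impl SubAllocator {
    fn new(block_size: u64) -> Self {
        Self { block_size, blocks: Vec::new() }
    }

    /// Hands out (block index, offset); reserves a whole new large block
    /// from the driver only when the existing ones are full.
    fn allocate(&mut self, size: u64) -> (usize, u64) {
        for (i, block) in self.blocks.iter_mut().enumerate() {
            if block.size - block.used >= size {
                let offset = block.used;
                block.used += size;
                return (i, offset);
            }
        }
        self.blocks.push(Block { size: self.block_size.max(size), used: size });
        (self.blocks.len() - 1, 0)
    }
    // Note: freeing a sub-allocation returns space *within* a block; the
    // block itself is only freed when the SubAllocator is dropped.
}
```

So even if every individual texture is freed, the large backing blocks stick around as long as the device, and, with this bug, as long as the Instance.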
> The issue is that when you create a new gpu-allocator allocator (which happens when we create a new device) it allocates a block of memory that it uses to suballocate smaller blocks from for buffers and textures, and there's a bug somewhere that's stopping devices from being dropped until the Instance is dropped.
Ironically, the reasoning behind my original code creating many Devices was “Well, a Device is the largest stateful-in-the-API unit, so creating a new one per test case is probably the best tradeoff between test isolation and performance”. If that's in fact not a good option (perhaps for reasons beyond this particular delayed-dealloc bug) then it would be nice if there were clues in the API documentation.
(This was discovered inside my application's test suite — I recognize that creating many devices is an irregular situation.)
Firefox needs to be able to create endless numbers of devices without leaking, because device creation is under web content's control. We could decouple WebGPU API Devices from wgpu Devices, but it seems just as easy to fix wgpu, and that'd help a broader audience.
> Ironically, the reasoning behind my original code creating many Devices was “Well, a Device is the largest stateful-in-the-API unit, so creating a new one per test case is probably the best tradeoff between test isolation and performance”. If that's in fact not a good option (perhaps for reasons beyond this particular delayed dealloc bug) then it would be nice if there were clues in the API documentation.
Creating separate devices for isolation seems like something that ought to work.
Tests for memory leaks could potentially use https://crates.io/crates/sysinfo to get such information, but that would not help in the case where we're leaking GPU memory.
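As a rough illustration (assuming the pre-0.30 sysinfo API with its SystemExt/ProcessExt traits; the threshold is an arbitrary placeholder):

```rust
// Sketch: watch this process's memory across the device-creation loop.
// This only sees CPU-side usage; GPU heap leaks are invisible to it.
use sysinfo::{get_current_pid, ProcessExt, System, SystemExt};

fn process_memory(sys: &mut System) -> u64 {
    let pid = get_current_pid().expect("current pid");
    sys.refresh_process(pid);
    // Units depend on the sysinfo version (KB in older releases, bytes later).
    sys.process(pid).expect("own process").memory()
}

fn main() {
    let mut sys = System::new();
    let before = process_memory(&mut sys);

    // ... repeatedly create and drop devices here ...

    let after = process_memory(&mut sys);
    // Arbitrary slack value; catches CPU-side growth only.
    assert!(after.saturating_sub(before) < 100_000, "possible leak");
}
```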
Are the applications calling poll properly? It seems like dx12's Device::exit gets called in a pretty straightforward way if Global::exit_device is ever called at all. That should always happen if the device is genuinely unreferenced and we're calling poll_devices.
> Are the applications calling poll properly?
In the situation which prompted me to file this issue, they were — each device belonged to a headless render test, so it would poll to retrieve the rendered image. Of course, the test doesn't poll after it completes, and won't poll if it somehow panics before getting to that point.
If all that is needed to avoid a leak is to poll after the resources are garbage, maybe wgpu should automatically poll just before creating a device?
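(For concreteness, a sketch of the explicit-poll pattern under discussion; this is illustrative, not code from the issue:)

```rust
// Sketch: poll after the last submission so wgpu can run deferred
// destruction before the Device handle itself is dropped.
fn finish_test_case(device: wgpu::Device, queue: wgpu::Queue) {
    let encoder =
        device.create_command_encoder(&wgpu::CommandEncoderDescriptor::default());
    queue.submit(std::iter::once(encoder.finish()));

    // Block until the GPU is done; dropped resources can now be reclaimed.
    device.poll(wgpu::Maintain::Wait);

    drop(queue);
    drop(device);
}
```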
This seems to be fixed on wgpu 0.19.
I have just learned that the WebGPU specification says that you can only requestDevice() once per Adapter — you're supposed to request a fresh Adapter to request another device. (See #5764 for more.) So, the test case I filed this issue with is incorrect — it should fail when it tries to use the same Adapter a second time. This issue might therefore be entirely moot, or not; I don't know enough about wgpu internals to say.
Right, D3D12 will also give you back the same device from the same adapter (they are singletons per adapter).
@teoxoy I think you misunderstand. I'm saying that, according to the WebGPU specification, calling requestDevice() more than once on a given Adapter value should return an error. An Adapter value isn't in a 1:1 relationship with the hardware; it's a handle to the opportunity to create one Device value. Therefore, the question of what happens when you have multiple Devices created from one Adapter is moot — that's prohibited. But none of that has anything to do with the cardinality of the underlying driver/platform-API adapter and device entities.
So, to be correct, the original test code in principle ought to be rewritten to repeatedly call initialize_adapter, not just initialize_device. This API-usage fix is independent of whether or not there's (still) a memory leak, unless the leak only occurs in this case, in which case the bug can be fixed simply by making wgpu spec-conformant.
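(An illustrative sketch of that corrected pattern, assuming a recent wgpu API rather than the test harness's actual initialize_adapter/initialize_device helpers:)

```rust
// Sketch: one fresh Adapter per Device, matching the spec's rule that
// requestDevice() may be called only once per Adapter.
async fn run_isolated_test_cases(case_count: usize) {
    let instance = wgpu::Instance::default();
    for _ in 0..case_count {
        let adapter = instance
            .request_adapter(&wgpu::RequestAdapterOptions::default())
            .await
            .expect("no suitable adapter");
        let (device, queue) = adapter
            .request_device(&wgpu::DeviceDescriptor::default(), None)
            .await
            .expect("failed to create device");

        // ... run one test case against (device, queue) ...

        device.poll(wgpu::Maintain::Wait);
        // adapter, device, and queue all drop here, every iteration.
    }
}
```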
I agree we should implement the validation, I was trying to provide relevant info as to why we are leaking without having the validation implemented. After all, this validation is in the spec most likely because of D3D12's behavior.
Description

If I repeatedly create devices with substantial resources, then eventually I will get a failure to create a device (or, if one of the existing devices is used further, an allocation failure within it), even if the devices are dropped.
(This was discovered inside my application's test suite — I recognize that creating many devices is an irregular situation.)
Repro steps

Stick this test case in wgpu/tests/ and run it (branch with needed changes @ https://github.com/gfx-rs/wgpu/compare/master...kpreid:wgpu:winleak); a sketch of the pattern it exercises follows below. The leak does not (visibly) occur if the same textures are created and dropped while using a single device.
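(The test code itself lives in the linked branch; what follows is only a sketch of the shape described above, written against a recent wgpu API:)

```rust
// Sketch of the reproduction: one shared Instance/Adapter, and a loop
// that creates a Device, allocates a large texture on it, and drops
// everything, expecting memory usage to stay flat.
async fn repro() {
    let instance = wgpu::Instance::default();
    let adapter = instance
        .request_adapter(&wgpu::RequestAdapterOptions::default())
        .await
        .expect("no adapter");

    for _ in 0..100 {
        let (device, queue) = adapter
            .request_device(&wgpu::DeviceDescriptor::default(), None)
            .await
            .expect("no device");

        // "Substantial resources": a large 2D texture.
        let texture = device.create_texture(&wgpu::TextureDescriptor {
            label: None,
            size: wgpu::Extent3d {
                width: 4096,
                height: 4096,
                depth_or_array_layers: 1,
            },
            mip_level_count: 1,
            sample_count: 1,
            dimension: wgpu::TextureDimension::D2,
            format: wgpu::TextureFormat::Rgba8UnormSrgb,
            usage: wgpu::TextureUsages::RENDER_ATTACHMENT
                | wgpu::TextureUsages::COPY_SRC,
            view_formats: &[],
        });

        drop(texture);
        device.poll(wgpu::Maintain::Wait);
        drop((device, queue)); // device dropped each iteration...
        // ...yet the per-device allocator blocks apparently survive
        // until `instance` is dropped.
    }
}
```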
Platform

wgpu 0.15 or git master (b534379acbd16d427e917b382d19d2e16000d829).