kpreid opened this issue 1 year ago
Hmm, both the driver and the nature of the issue (memory leak, maybe a refcounting issue?) seem like they might be related to #3485?
I've also seen similar issues on Windows. I haven't had time to investigate them yet, so I don't have any details, but something like that definitely happens.
Disabling dx12 suballocation makes the test pass on my system.
```diff
--- a/wgpu/Cargo.toml
+++ b/wgpu/Cargo.toml
@@ -142,7 +142,7 @@ hal = { workspace = true }
 hal = { workspace = true, features = ["renderdoc"] }

 [target.'cfg(windows)'.dependencies]
-hal = { workspace = true, features = ["dxc_shader_compiler", "renderdoc", "windows_rs"] }
+hal = { workspace = true, features = ["dxc_shader_compiler", "renderdoc"] }
```
If you have a non-test reproduction, gpu-allocator should output warn-level logs if it's leaking memory when we drop it.
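(A minimal sketch of surfacing those logs, assuming a plain Rust reproduction binary and the env_logger crate; this is illustrative, not part of the original report:)

```rust
// Sketch: install a `log`-compatible logger so gpu-allocator's warn-level
// leak reports are actually printed; run with e.g. RUST_LOG=warn.
fn main() {
    env_logger::init();

    // ... the device-creation loop from the reproduction goes here ...
}
```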
I turned your test code into an example and shoved a println! into Device::exit(...) (in wgpu-hal/src/dx12/device.rs), and it looks like Device::exit doesn't get called until all the loop iterations complete (Vulkan seems to have the same issue, but for whatever reason doesn't end up with the same kind of memory bloat).

edit: After more testing, Device::exit() only seems to run during the loop if the instance creation is inside the loop. With the instance outside the loop (as in the OP), Device::exit(...) runs for every created device only after the loop has completed and the instance is dropped.
Some more context:
The texture seems to be freed properly (println debugging shows destroy_texture() being called), and it leaks even without the texture.
The issue is that when you create a new gpu-allocator allocator (which happens when we create a new device), it allocates a block of memory that it uses to suballocate smaller blocks from for buffers and textures, and there's a bug somewhere that's stopping devices from being dropped until the Instance is dropped. (Sidenote: I'd almost consider Vulkan not leaking memory here a bug, as I thought our gpu-alloc implementation worked similarly to gpu-allocator with regards to creating and holding onto large ~256 MB blocks of memory to suballocate from.)
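To illustrate the suballocation pattern described above, here is a conceptual sketch (not gpu-allocator's actual code; the block size and bump-style bookkeeping are simplifications):

```rust
// Conceptual sketch: a device-level suballocator grabs large blocks up
// front and hands out offsets into them; the large blocks are only
// released when the allocator itself (i.e. the device) is dropped.
struct Block {
    size: u64,
    used: u64, // bump-pointer bookkeeping; real allocators track free lists
}

struct SubAllocator {
    block_size: u64, // on the order of 256 MB in the real thing
    blocks: Vec<Block>,
}

impl SubAllocator {
    fn new(block_size: u64) -> Self {
        Self { block_size, blocks: Vec::new() }
    }

    /// Hands out (block index, offset); reserves a whole new large block
    /// from the driver only when the existing ones are full.
    fn allocate(&mut self, size: u64) -> (usize, u64) {
        for (i, block) in self.blocks.iter_mut().enumerate() {
            if block.size - block.used >= size {
                let offset = block.used;
                block.used += size;
                return (i, offset);
            }
        }
        self.blocks.push(Block { size: self.block_size.max(size), used: size });
        (self.blocks.len() - 1, 0)
    }
    // Note: freeing a sub-allocation returns space *within* a block; the
    // block itself is only freed when the SubAllocator is dropped.
}
```

So even if every individual texture is freed, the large backing blocks stick around as long as the device, and, with this bug, as long as the Instance.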
> The issue is that when you create a new gpu-allocator allocator (which happens when we create a new device) it allocates a block of memory that it uses to suballocate smaller blocks from for buffers and textures, and there's a bug somewhere that's stopping devices from being dropped until the Instance is dropped.
Ironically, the reasoning behind my original code creating many Devices was “Well, a Device is the largest stateful-in-the-API unit, so creating a new one per test case is probably the best tradeoff between test isolation and performance”. If that's in fact not a good option (perhaps for reasons beyond this particular delayed-dealloc bug) then it would be nice if there were clues in the API documentation.
(This was discovered inside my application's test suite — I recognize that creating many devices is an irregular situation.)
Firefox needs to be able to create endless numbers of devices without leaking, because device creation is under web content's control. We could decouple WebGPU API Devices from wgpu Devices, but it seems just as easy to fix wgpu, and that'd help a broader audience.
> Ironically, the reasoning behind my original code creating many Devices was “Well, a Device is the largest stateful-in-the-API unit, so creating a new one per test case is probably the best tradeoff between test isolation and performance”. If that's in fact not a good option (perhaps for reasons beyond this particular delayed dealloc bug) then it would be nice if there were clues in the API documentation.
Creating separate devices for isolation seems like something that ought to work.
Tests for memory leaks could potentially use https://crates.io/crates/sysinfo to get such information, but that would not help in the case where we're leaking GPU memory.
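As a rough illustration (assuming the pre-0.30 sysinfo API with its SystemExt/ProcessExt traits; the threshold is an arbitrary placeholder):

```rust
// Sketch: watch this process's memory across the device-creation loop.
// This only sees CPU-side usage; GPU heap leaks are invisible to it.
use sysinfo::{get_current_pid, ProcessExt, System, SystemExt};

fn process_memory(sys: &mut System) -> u64 {
    let pid = get_current_pid().expect("current pid");
    sys.refresh_process(pid);
    // Units depend on the sysinfo version (KB in older releases, bytes later).
    sys.process(pid).expect("own process").memory()
}

fn main() {
    let mut sys = System::new();
    let before = process_memory(&mut sys);

    // ... repeatedly create and drop devices here ...

    let after = process_memory(&mut sys);
    // Arbitrary slack value; catches CPU-side growth only.
    assert!(after.saturating_sub(before) < 100_000, "possible leak");
}
```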
Are the applications calling poll properly? It seems like dx12's Device::exit gets called in a pretty straightforward way if Global::exit_device is ever called at all. That should always happen if the device is genuinely unreferenced and we're calling poll_devices.
> Are the applications calling poll properly?
In the situation which prompted me to file this issue, they were — each device belonged to a headless render test, so it would poll to retrieve the rendered image. Of course, the test doesn't poll after it completes, and won't poll if it somehow panics before getting to that point.
If all that is needed to avoid a leak is to poll after the resources are garbage, maybe wgpu should automatically poll just before creating a device?
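(For concreteness, a sketch of the explicit-poll pattern under discussion; this is illustrative, not code from the issue:)

```rust
// Sketch: poll after the last submission so wgpu can run deferred
// destruction before the Device handle itself is dropped.
fn finish_test_case(device: wgpu::Device, queue: wgpu::Queue) {
    let encoder =
        device.create_command_encoder(&wgpu::CommandEncoderDescriptor::default());
    queue.submit(std::iter::once(encoder.finish()));

    // Block until the GPU is done; dropped resources can now be reclaimed.
    device.poll(wgpu::Maintain::Wait);

    drop(queue);
    drop(device);
}
```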
This seems to be fixed on wgpu 0.19.
I have just learned that the WebGPU specification says that you can only requestDevice() once per Adapter — you're supposed to request a fresh Adapter to request another device. (See #5764 for more.) So, the test case I filed this issue with is incorrect — it should fail when it tries to use the same Adapter a second time. This issue might therefore be entirely moot, or not; I don't know enough about wgpu internals to say.
Right, D3D12 will also give you back the same device from the same adapter (they are singletons per adapter).
@teoxoy I think you misunderstand. I'm saying that, according to the WebGPU specification, calling requestDevice() more than once on a given Adapter value should return an error. An Adapter value isn't in a 1:1 relationship with the hardware; it's a handle to the opportunity to create one Device value. Therefore, the question of what happens when you have multiple Devices created from one Adapter is moot — that's prohibited. But none of that has anything to do with the cardinality of the underlying driver/platform-API adapter and device entities.
So, to be correct, the original test code in principle ought to be rewritten to repeatedly call initialize_adapter, not just initialize_device. This API-usage fix is independent of whether or not there's (still) a memory leak, unless the leak only occurs in this case, in which case the bug can be fixed simply by making wgpu spec-conformant.
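(An illustrative sketch of that corrected pattern, assuming a recent wgpu API rather than the test harness's actual initialize_adapter/initialize_device helpers:)

```rust
// Sketch: one fresh Adapter per Device, matching the spec's rule that
// requestDevice() may be called only once per Adapter.
async fn run_isolated_test_cases(case_count: usize) {
    let instance = wgpu::Instance::default();
    for _ in 0..case_count {
        let adapter = instance
            .request_adapter(&wgpu::RequestAdapterOptions::default())
            .await
            .expect("no suitable adapter");
        let (device, queue) = adapter
            .request_device(&wgpu::DeviceDescriptor::default(), None)
            .await
            .expect("failed to create device");

        // ... run one test case against (device, queue) ...

        device.poll(wgpu::Maintain::Wait);
        // adapter, device, and queue all drop here, every iteration.
    }
}
```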
I agree we should implement the validation, I was trying to provide relevant info as to why we are leaking without having the validation implemented. After all, this validation is in the spec most likely because of D3D12's behavior.
Description

If I repeatedly create devices with substantial resources, then eventually I will get a failure to create a device (or, if one of the existing devices is used further, an allocation failure within it), even if the devices are dropped.
(This was discovered inside my application's test suite — I recognize that creating many devices is an irregular situation.)
Repro steps

Stick this test case in wgpu/tests/ and run it (branch with needed changes @ https://github.com/gfx-rs/wgpu/compare/master...kpreid:wgpu:winleak); a sketch of the pattern it exercises follows below. The leak does not (visibly) occur if the same textures are created and dropped while using a single device.
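(The test code itself lives in the linked branch; what follows is only a sketch of the shape described above, written against a recent wgpu API:)

```rust
// Sketch of the reproduction: one shared Instance/Adapter, and a loop
// that creates a Device, allocates a large texture on it, and drops
// everything, expecting memory usage to stay flat.
async fn repro() {
    let instance = wgpu::Instance::default();
    let adapter = instance
        .request_adapter(&wgpu::RequestAdapterOptions::default())
        .await
        .expect("no adapter");

    for _ in 0..100 {
        let (device, queue) = adapter
            .request_device(&wgpu::DeviceDescriptor::default(), None)
            .await
            .expect("no device");

        // "Substantial resources": a large 2D texture.
        let texture = device.create_texture(&wgpu::TextureDescriptor {
            label: None,
            size: wgpu::Extent3d {
                width: 4096,
                height: 4096,
                depth_or_array_layers: 1,
            },
            mip_level_count: 1,
            sample_count: 1,
            dimension: wgpu::TextureDimension::D2,
            format: wgpu::TextureFormat::Rgba8UnormSrgb,
            usage: wgpu::TextureUsages::RENDER_ATTACHMENT
                | wgpu::TextureUsages::COPY_SRC,
            view_formats: &[],
        });

        drop(texture);
        device.poll(wgpu::Maintain::Wait);
        drop((device, queue)); // device dropped each iteration...
        // ...yet the per-device allocator blocks apparently survive
        // until `instance` is dropped.
    }
}
```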
Platform

wgpu 0.15 or git master (b534379acbd16d427e917b382d19d2e16000d829).