gfx-rs / wgpu-native

Native WebGPU implementation based on wgpu-core
Apache License 2.0
874 stars 105 forks source link

v22.1.0.1 crashes on `wgpuQueueSubmit()` if `wgpuRenderPassEncoderRelease()` wasn't called. #412

Open yig opened 2 months ago

yig commented 2 months ago

I just updated to wgpu-native v22.1.0.1. If I don't call wgpuRenderPassEncoderRelease() before wgpuQueueSubmit(), I get an error:

thread '<unnamed>' panicked at /Users/runner/.cargo/git/checkouts/wgpu-53e70f8674b08dd4/5c5c8b1/wgpu-core/src/command/mod.rs:522:14:
CommandBuffer cannot be destroyed because is still in use

The code looks like this:

wgpuRenderPassEncoderEnd( render_pass );
wgpuRenderPassEncoderRelease( render_pass ); // crashes without this line
WGPUCommandBuffer command = wgpuCommandEncoderFinish( encoder, nullptr );
wgpuQueueSubmit( queue, 1, &command );

The relevant part of the backtrace:

   3: core::option::expect_failed
             at /rustc/3f5fd8dd41153bc5fdca9427e9e05be2c767ba23/library/core/src/option.rs:1995:5
   4: wgpu_core::command::CommandBuffer<A>::from_arc_into_baked
   5: wgpu_core::device::queue::<impl wgpu_core::global::Global>::queue_submit
   6: _wgpuQueueSubmit

Here is a runnable fairly minimal crashing example: https://github.com/yig/LearnWebGPU-Code/blob/step030-vanilla/main.cpp#L233

I hacked the CMakeLists.txt to run on macos-aarch64. (It hard-codes the macos-aarch64 release and adds -framework Metal linker flags. Otherwise, it should be easy to modify to reproduce on other platforms.)

git clone git@github.com:yig/LearnWebGPU-Code.git
cd LearnWebGPU-Code
git switch step030-vanilla
cmake -B build
cmake --build build -j 6
./build/App
rajveermalviya commented 2 months ago

Yeah in previous versions it was incorrectly allowed to release the RenderPass later, but in latest version that bug is fixed. Also RenderPasses are single use only so it doesn't make sense to keep it around after calling wgpuRenderPassEncoderEnd on it.

yig commented 2 months ago

How could that be part of the spec? In JavaScript, I didn't think it was possible to manually release a RenderPass.

rajveermalviya commented 2 months ago

wgpu[Object]Release() and (the Reference() conterpart) are not in javascript spec but part of the webgpu.h.

yig commented 2 months ago

Where are things like this RenderPass lifetime rule specified? My understanding was that the JS API and the C API should be implementable on top of each other. I don't see how the JS API could be implemented on top of the C API with this lifetime rule.

rajveermalviya commented 2 months ago

Seem like this is similar to https://github.com/gfx-rs/wgpu/issues/6145.

yig commented 2 months ago

Yes, this is exactly the same. The extra scope for the render pass shown there serves as a workaround for a ref-counted language (including C++ with RAII). I guess a similar workaround for a garbage collected language could be to add the scope and manually triggering the garbage collector immediately after closing the scope. I hope this can be fixed and these workarounds avoided.

Vipitis commented 2 months ago

I also run into this in pygfx/wgpu-py#547 and just made the private release is part of the public end method.

I am not sure if there is any case where a end call is not followed by release. So it feels somewhat redundant

trbabb commented 3 weeks ago

I have just run into this myself, and I must say the error message and fix was not at all intuitive.

Is there something I can refer to to understand the lifetime expectations and refcount rules for these objects in C/C++?

yig commented 3 weeks ago

My understanding is that this is a deviation from the spec.