gfx-rs / wgpu

A cross-platform, safe, pure-Rust graphics API.
https://wgpu.rs
Apache License 2.0

Vulkan Validation Error: Cannot free VkBuffer that is in use by a command buffer. #1689

Closed Imberflur closed 3 months ago

Imberflur commented 3 years ago

Description

Vulkan validation error:

ERROR gfx_backend_vulkan: 
VALIDATION [VUID-vkDestroyBuffer-buffer-00922 (0xe4549c11)] : Validation Error: [ VUID-vkDestroyBuffer-buffer-00922 ] Object 0: handle = 0x4ce96a00000014e6, type = VK_OBJECT_TYPE_BUFFER; | MessageID = 0xe4549c11 | Cannot free VkBuffer 0x4ce96a00000014e6[] that is in use by a command buffer. The Vulkan spec states: All submitted commands that refer to buffer, either directly or via a VkBufferView, must have completed execution (https://www.khronos.org/registry/vulkan/specs/1.2-extensions/html/vkspec.html#VUID-vkDestroyBuffer-buffer-00922)
object info: (type: BUFFER, hndl: 0x4ce96a00000014e6)

I think this occurs in Veloren when a switch between scenes is initiated and then quickly interrupted, since I get it after being kicked to the character selection screen by an error from the server.

Repro steps

Attached API trace

Expected vs observed behavior

No validation errors

Extra materials

wgpu-trace.zip wgpu-trace.z01.zip wgpu-trace.z02.zip

You should be able to extract these by removing the .zip from the last two and running unzip wgpu-trace.zip

Platform

OS: Manjaro 21.1.0 Pahvo
DE: Xfce4
GPU: AMD Radeon HD 7900 Series (TAHITI, DRM 3.40.0, 5.10.49-1-MANJARO, LLVM 12.0.0)
wgpu commit: a92b8549a8e2cb9dac781bafc5ed32828f3caf46

kvark commented 3 years ago

I'm following your instructions, but unzip complains loudly and fails to properly unpack them. Maybe try https://wormhole.app/ upload?

Imberflur commented 3 years ago

@kvark thanks for the tip, here is the link (it should last 24 hours; let me know if another upload is needed): https://wormhole.app/d9kpY#oEaxV-v-Moh-FppzVmSREA

kvark commented 3 years ago

Strangely, I downloaded this "wgpu-trace-whole.zip", unpacked it, and it's still incomplete. The data indices start at 2500 or so. Not sure what's going on.

Imberflur commented 3 years ago

Hmm, I must not have put it back together correctly. I will try to get the original or re-create it.

Imberflur commented 3 years ago

Hopefully this one works https://wormhole.app/aRbrY#f0nTkJk-1F7iDug3XnTYUA (sorry for the issues)

Imberflur commented 3 years ago

I am getting this issue when running some of the examples as tests (e.g. boids, water). The test failure might be necessary to trigger it, since I get, for example:

thread 'main' panicked at 'Image data mismatch! Outlier count 2359296 over limit 460. Max difference 255', wgpu/examples/water/../../tests/common/image.rs:134:13
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'main' panicked at 'UNEXPECTED TEST FAILURE', wgpu/examples/water/../../tests/common/mod.rs:301:9

However, the Vulkan validation errors appear before this, and I don't know how to make the test pass to see whether it still triggers validation errors. I don't see them when running the example normally, though. It might also be significant that when this occurs the test hangs and doesn't exit.

nickkuk commented 3 years ago

I don't use wgpu, just pure ash, and I get the same validation error in a case where I'm sure that I've waited for the timeline semaphore correctly. So it may just be a noisy, incorrect validation warning.

Imberflur commented 3 years ago

I think I figured out how to get the test to not panic (by deleting the reference image), and it still appears to be producing Vulkan validation errors and hanging.

kvark commented 3 years ago

Perhaps you could branch out the actual repro case for me to try?

TheSpydog commented 2 years ago

I'm getting the same validation error when running the wgpu halmark example on Windows 10. If there's any environment information that would be helpful for debugging this, let me know and I can post it.

kvark commented 2 years ago

@TheSpydog what validation layers version are you using?

Imberflur commented 2 years ago

> Perhaps you could branch out the actual repro case for me to try?

I completely missed this!

It seems like I can no longer reproduce this. I ran cargo test --example water on several branches: v0.10, v0.11, v0.12, and master. None of them produced this validation error. I can only assume a driver update has resolved it.

Strangely, I also didn't get any test failures like this one which I had before:

thread 'main' panicked at 'Image data mismatch! Outlier count 2359296 over limit 460. Max difference 255', wgpu/examples/water/../../tests/common/image.rs:134:13

Either I'm running the test differently or a driver update fixed both things.

Current GPU info:

From glinfo:

AMD Radeon HD 7900 Series (TAHITI, DRM 3.40.0, 5.10.79-1-MANJARO, LLVM 13.0.0)

From vulkaninfo:

VkPhysicalDeviceDriverProperties:
---------------------------------
    driverID           = DRIVER_ID_MESA_RADV
    driverName         = radv
    driverInfo         = Mesa 21.2.5
    conformanceVersion = 1.2.3.0
VkPhysicalDeviceProperties:
---------------------------
    apiVersion        = 4202678 (1.2.182)
    driverVersion     = 88088581 (0x5402005)
VK_LAYER_KHRONOS_validation (Khronos Validation Layer) Vulkan version 1.2.199,

TheSpydog commented 2 years ago

> @TheSpydog what validation layers version are you using?

VK_LAYER_KHRONOS_validation (Khronos Validation Layer) Vulkan version 1.2.198

kvark commented 2 years ago

Thanks! That's quite fresh. It would be useful to know which buffer is being reported. Could you confirm that it is one of the buffers created on your side (as opposed to one we create internally)? If you provide a "label" in the buffer descriptor, the validation layers should pick it up when reporting an error.

TheSpydog commented 2 years ago

The problematic buffer is explicitly created as part of the halmark example. It's called "stage".

Validation Error: [ VUID-vkDestroyBuffer-buffer-00922 ] Object 0: handle = 0x3a6cbb0000000025, name = stage, type = VK_OBJECT_TYPE_BUFFER; | MessageID = 0xe4549c11 | Cannot free VkBuffer 0x3a6cbb0000000025[stage] that is in use by a command buffer. The Vulkan spec states: All submitted commands that refer to buffer, either directly or via a VkBufferView, must have completed execution (https://vulkan.lunarg.com/doc/view/1.2.198.1/windows/1.2-extensions/vkspec.html#VUID-vkDestroyBuffer-buffer-00922)
[2022-01-03T23:23:05Z ERROR wgpu_hal::vulkan::instance]         objects: (type: BUFFER, hndl: 0x3a6cbb0000000025, name: stage)
[2022-01-03T23:23:05Z ERROR wgpu_hal::vulkan::instance] VALIDATION [VUID-vkResetCommandPool-commandPool-00040 (0xb53e2331)]

kvark commented 2 years ago

Hmm. Reviewing the halmark example code, everything seems to be in place:

kvark commented 2 years ago

@TheSpydog could you upload the run log with RUST_LOG=wgpu_hal=debug please?

TheSpydog commented 2 years ago

Sure, here's the log: halmarklog.txt

kvark commented 2 years ago

Thank you! I was mainly interested in whether your platform supports timeline semaphores, to narrow down the problematic path. Now that we know it's timeline semaphores, I looked at our logic again and wasn't able to find any issues. It's very straightforward. Here are some things to try if you have time:

  1. In device.wait(&fence, init_fence_value, !0).unwrap();, check the returned value; it should be Ok(true).
  2. Try passing the last parameter as 10 instead of !0, just in case the driver gets confused by our unusual value (we multiply it by 1M before passing to Vulkan).
  3. Try doing cmd_encoder.reset_all(iter::once(init_cmd)); before device.destroy_buffer(staging_buffer);

None of these experiments should be needed, but perhaps we'll find something interesting.
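
For readers wondering what the "unusual value" in experiment 2 amounts to, here is a minimal sketch of the arithmetic. It assumes the timeout is a 32-bit millisecond count that is widened to 64 bits and multiplied by 1,000,000 to get nanoseconds, as the comment above describes. The helper name timeout_ms_to_ns is hypothetical and is not taken from wgpu-hal.

```rust
/// Hypothetical helper (not a wgpu-hal function): converts a millisecond
/// timeout to nanoseconds, assuming the value is widened to u64 before
/// the "multiply by 1M" step mentioned in the thread.
fn timeout_ms_to_ns(timeout_ms: u32) -> u64 {
    // Widen first so the multiplication is done in 64 bits.
    u64::from(timeout_ms) * 1_000_000
}

fn main() {
    // `!0` on a u32 is u32::MAX, i.e. 4_294_967_295 ms (about 49.7 days).
    let forever_ns = timeout_ms_to_ns(!0);
    // The alternative value suggested in experiment 2.
    let short_ns = timeout_ms_to_ns(10);

    println!("!0 ms -> {} ns", forever_ns);
    println!("10 ms -> {} ns", short_ns);
}
```

Under that assumption the product stays well inside the u64 range, so the large value is not an overflow. It is still a very long wait, though, which is why trying a small timeout such as 10 is a cheap way to rule out driver confusion.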

Imberflur commented 2 years ago

It seems like I can actually still reproduce this for my original case (but not in the examples). It only occurs in a very specific scenario so I had not noticed before. I will need to find some time to see if I can test this with an updated version of wgpu.

teoxoy commented 4 months ago

This sounds related to https://github.com/gfx-rs/wgpu/issues/3193#issuecomment-2231057423.

@Imberflur could you try to reproduce the issue on 61739d95833b8217452a5f77455f2ab03eff649e (https://github.com/gfx-rs/wgpu/pull/5910)?

teoxoy commented 3 months ago

I think this was fixed, please reopen/open a new issue if that's not the case.