Closed Imberflur closed 3 months ago
I'm following your instructions, but `unzip` complains loudly and fails to properly unpack them. Maybe try uploading via https://wormhole.app/?
@kvark thanks for the tip, here is the link (it should last 24hr let me know if another upload is needed) https://wormhole.app/d9kpY#oEaxV-v-Moh-FppzVmSREA
Strangely, I downloaded this "wgpu-trace-whole.zip", unpacked it, and it's still incomplete. The data indices start at 2500 or so. Not sure what's going on.
Hmm, I must have not put it back together correctly. I will try to get the original or re-create it.
Hopefully this one works https://wormhole.app/aRbrY#f0nTkJk-1F7iDug3XnTYUA (sorry for the issues)
I am getting this issue when running some of the examples as tests (e.g. boids, water). The test failing might be necessary to trigger it, since I get e.g.:
thread 'main' panicked at 'Image data mismatch! Outlier count 2359296 over limit 460. Max difference 255', wgpu/examples/water/../../tests/common/image.rs:134:13
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'main' panicked at 'UNEXPECTED TEST FAILURE', wgpu/examples/water/../../tests/common/mod.rs:301:9
However, the vulkan validation errors appear before this and I don't know how to make the test pass to see whether it still triggers validation errors. I don't see them when running the example normally though. Also, it might be significant to note that when this occurs the test hangs and doesn't exit.
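For readers unfamiliar with this kind of panic message, here is a minimal sketch of how an image test could arrive at an "outlier count" and a "max difference". It is not the code from `tests/common/image.rs`; the `compare_bytes` helper, the `Mismatch` struct, and the per-byte `tolerance` parameter are all hypothetical and only illustrate the idea of a byte-wise comparison.

```rust
/// Summary of a byte-wise comparison between two images (hypothetical type).
#[derive(Debug, PartialEq)]
struct Mismatch {
    /// Number of bytes whose difference exceeds the tolerance.
    outliers: usize,
    /// Largest absolute difference seen between any two corresponding bytes.
    max_difference: u8,
}

/// Hypothetical helper: compares two equally sized byte buffers.
fn compare_bytes(expected: &[u8], actual: &[u8], tolerance: u8) -> Mismatch {
    assert_eq!(expected.len(), actual.len(), "images must have the same size");
    let mut outliers = 0;
    let mut max_difference = 0u8;
    for (&e, &a) in expected.iter().zip(actual) {
        let diff = e.abs_diff(a);
        max_difference = max_difference.max(diff);
        if diff > tolerance {
            outliers += 1;
        }
    }
    Mismatch { outliers, max_difference }
}

fn main() {
    let expected = [0u8, 10, 200, 255];
    let actual = [0u8, 12, 0, 0];
    // A test would fail if `outliers` exceeded some configured limit.
    let result = compare_bytes(&expected, &actual, 3);
    println!("{:?}", result);
}
```

Under this model, a max difference of 255 means at least one byte went from fully off to fully on or the reverse, which is more than rounding noise between drivers.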
I don't use `wgpu`, just pure `ash`, and have the same validation error in a case where I'm sure that I've waited for the timeline semaphore in the right way. So it may just be a noisy, incorrect validation warning.
I think I figured out how to get the test to not panic (by deleting the reference image). And it appears to still be producing vulkan validation errors and hanging.
Perhaps you could branch out the actual repro case for me to try?
I'm getting the same validation error when running the wgpu halmark example on Windows 10. If there's any environment information that would be helpful for debugging this, let me know and I can post it.
@TheSpydog what validation layers version are you using?
> Perhaps you could branch out the actual repro case for me to try?
I completely missed this!
It seems like I can no longer reproduce this. I ran `cargo test --example water` on several branches: `v0.10`, `v0.11`, `v0.12`, and `master`. None of them produced this validation error. I can only assume a driver update has resolved it.
Strangely, I also didn't get any test failures like this one which I had before:
thread 'main' panicked at 'Image data mismatch! Outlier count 2359296 over limit 460. Max difference 255', wgpu/examples/water/../../tests/common/image.rs:134:13
Either I'm running the test differently or a driver update fixed both things.
Current gpu info:
From `glinfo`:
AMD Radeon HD 7900 Series (TAHITI, DRM 3.40.0, 5.10.79-1-MANJARO, LLVM 13.0.0)
From `vulkaninfo`:
VkPhysicalDeviceDriverProperties:
---------------------------------
driverID = DRIVER_ID_MESA_RADV
driverName = radv
driverInfo = Mesa 21.2.5
conformanceVersion = 1.2.3.0
VkPhysicalDeviceProperties:
---------------------------
apiVersion = 4202678 (1.2.182)
driverVersion = 88088581 (0x5402005)
VK_LAYER_KHRONOS_validation (Khronos Validation Layer) Vulkan version 1.2.199,
> @TheSpydog what validation layers version are you using?
VK_LAYER_KHRONOS_validation (Khronos Validation Layer) Vulkan version 1.2.198
Thanks! That's quite fresh. It would be useful to know what buffer is being reported. Could you confirm that this is just one of the buffers created on your side (as opposed to us creating it internally)? If you provide "label" to the buffer descriptor, the validation layers should pick it up when reporting an error.
The problematic buffer is explicitly created as part of the halmark example. It's called "stage".
Validation Error: [ VUID-vkDestroyBuffer-buffer-00922 ] Object 0: handle = 0x3a6cbb0000000025, name = stage, type = VK_OBJECT_TYPE_BUFFER; | MessageID = 0xe4549c11 | Cannot free VkBuffer 0x3a6cbb0000000025[stage] that is in use by a command buffer. The Vulkan spec states: All submitted commands that refer to buffer, either directly or via a VkBufferView, must have completed execution (https://vulkan.lunarg.com/doc/view/1.2.198.1/windows/1.2-extensions/vkspec.html#VUID-vkDestroyBuffer-buffer-00922)
[2022-01-03T23:23:05Z ERROR wgpu_hal::vulkan::instance] objects: (type: BUFFER, hndl: 0x3a6cbb0000000025, name: stage)
[2022-01-03T23:23:05Z ERROR wgpu_hal::vulkan::instance] VALIDATION [VUID-vkResetCommandPool-commandPool-00040 (0xb53e2331)]
Hmm. Reviewing the halmark example code, everything seems to be in place:
- `staging_buffer` is only used in `cmd_encoder`
- `cmd_encoder` produces `init_cmd`, which is submitted with fence value of `init_fence_value`
- the example waits for `init_fence_value` on the same fence, indefinitely

@TheSpydog could you upload the run log with `RUST_LOG=wgpu_hal=debug` please?
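The invariant being checked in the review above (a buffer may only be destroyed once the fence has reached the value of the last submission that used it) can be shown with a toy model. This is only a sketch: `ToyFence`, `ToyBuffer`, and `try_destroy` are hypothetical names and do not correspond to wgpu-hal or Vulkan types.

```rust
/// Toy stand-in for a timeline fence: a counter that the "GPU" advances.
struct ToyFence {
    completed: u64,
}

impl ToyFence {
    /// True once all work submitted with `value` has finished.
    fn has_reached(&self, value: u64) -> bool {
        self.completed >= value
    }
}

/// A resource tagged with the fence value of the last submission using it.
struct ToyBuffer {
    label: &'static str,
    last_used: Option<u64>,
}

/// Refuses to destroy a buffer that a pending submission may still reference.
fn try_destroy(buffer: &ToyBuffer, fence: &ToyFence) -> Result<(), String> {
    match buffer.last_used {
        Some(value) if !fence.has_reached(value) => Err(format!(
            "buffer '{}' is still in use (fence at {}, needs {})",
            buffer.label, fence.completed, value
        )),
        _ => Ok(()),
    }
}

fn main() {
    let mut fence = ToyFence { completed: 0 };
    let stage = ToyBuffer { label: "stage", last_used: Some(1) };

    // Destroying before the wait completes is the situation the
    // validation layer complains about.
    assert!(try_destroy(&stage, &fence).is_err());

    // After the fence reaches the submitted value, destruction is fine.
    fence.completed = 1;
    assert!(try_destroy(&stage, &fence).is_ok());
}
```

If the example really does wait for `init_fence_value` before destroying the buffer, it sits in the second case, which is why the reported error is surprising.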
Sure, here's the log: halmarklog.txt
Thank you! I was mainly interested if your platform supports timeline semaphores or not, to narrow down the problematic path. Now that we know it's timeline semaphores, I looked at our logic again and wasn't able to find any issues. It's very straightforward. Here are some things to play with if you have time:
- For the `device.wait(&fence, init_fence_value, !0).unwrap();` call, check the returned value; it should be `Ok(true)`
- Pass `10` instead of `!0`, just in case the driver gets confused by our unusual value (we multiply it by 1M before passing to Vulkan)
- Call `cmd_encoder.reset_all(iter::once(init_cmd));` before `device.destroy_buffer(staging_buffer);`
None of these experiments should be needed, but perhaps we'll find something interesting.
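To make the "unusual value" concrete, here is a small sketch of the arithmetic. It assumes the timeout is a `u32` in milliseconds and that the multiplication by 1M converts it to nanoseconds held in a `u64`; the `timeout_ms_to_ns` function is hypothetical and not the wgpu-hal code.

```rust
/// Hypothetical conversion, assuming a millisecond timeout scaled to nanoseconds.
fn timeout_ms_to_ns(timeout_ms: u32) -> u64 {
    // Widen first so the multiplication cannot overflow a u32.
    timeout_ms as u64 * 1_000_000
}

fn main() {
    // `!0` on a u32 is u32::MAX, i.e. "wait as long as possible".
    let forever = timeout_ms_to_ns(!0);
    let short = timeout_ms_to_ns(10);
    println!("!0 ms -> {} ns", forever);
    println!("10 ms -> {} ns", short);
}
```

Under these assumptions the scaled `!0` still fits in a `u64`, but it is a very large value that is not `u64::MAX`, so it seems plausible to rule out driver confusion by trying a small timeout such as `10`.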
It seems like I can actually still reproduce this for my original case (but not in the examples). It only occurs in a very specific scenario, so I had not noticed before. I will need to find some time to see if I can test this with an updated version of `wgpu`.
This sounds related to https://github.com/gfx-rs/wgpu/issues/3193#issuecomment-2231057423.
@Imberflur could you try to reproduce the issue on 61739d95833b8217452a5f77455f2ab03eff649e (https://github.com/gfx-rs/wgpu/pull/5910)?
I think this was fixed, please reopen/open a new issue if that's not the case.
Description Vulkan validation error:
I think this occurs in veloren when a switch between scenes is initiated that is quickly interrupted. Since I get it after getting kicked to the character selection screen by an error from the server.
Repro steps Attached API trace
Expected vs observed behavior No validation errors
Extra materials wgpu-trace.zip wgpu-trace.z01.zip wgpu-trace.z02.zip (you should be able to extract these by removing the `.zip` from the last two and running `unzip wgpu-trace.zip`)
Platform