KhronosGroup / Vulkan-ValidationLayers

Vulkan Validation Layers (VVL)
https://vulkan.lunarg.com/doc/sdk/latest/linux/khronos_validation_layer.html

[BISECTED] Segmentation fault after b3065ebc190d #8067

Closed haasn closed 4 months ago

haasn commented 5 months ago

Environment:

Describe the Issue

Segmentation fault when dispatching the first compute shader in the test suite, starting with commit b3065ebc190d83462d15761cd26edb559f3c5508. (See terminal output below)

Still present as of ff56cf67d3494eec1243cc4225d1667e9b3f90cd.

Additional context

Terminal output + backtrace

```
...
compute shader source:
[ 1] #version 450
[ 2] #extension GL_ARB_compute_shader : enable
[ 3] #extension GL_KHR_shader_subgroup_basic : enable
[ 4] #extension GL_KHR_shader_subgroup_vote : enable
[ 5] #extension GL_KHR_shader_subgroup_arithmetic : enable
[ 6] #extension GL_KHR_shader_subgroup_ballot : enable
[ 7] #extension GL_KHR_shader_subgroup_shuffle : enable
[ 8] #extension GL_KHR_shader_subgroup_clustered : enable
[ 9] #extension GL_KHR_shader_subgroup_quad : enable
[ 10] #extension GL_ARB_shader_image_load_store : enable
[ 11] #extension GL_ARB_texture_buffer_object : enable
[ 12] layout(std430, push_constant) uniform PushC {
[ 13] layout(offset=0) int _4;
[ 14] layout(offset=4) int _5;
[ 15] layout(offset=8) int _6;
[ 16] layout(offset=12) int _7;
[ 17] };
[ 18] layout(constant_id=0) const int _8 = 1;
[ 19] layout(constant_id=1) const int _9 = 1;
[ 20] layout(constant_id=2) const int _a = 1;
[ 21] layout(binding=0, r8) restrict uniform imageBuffer _2;
[ 22] layout(binding=1, rgba8) readonly restrict uniform image1D _3;
[ 23] layout (local_size_x = 16, local_size_y = 1) in;
[ 24]
[ 25] void _1() {
[ 26] ivec3 pos = ivec3(gl_GlobalInvocationID);
[ 27] ivec3 tex_pos = pos + ivec3(_4, _5, _6);
[ 28] int base = _7 + pos.z * _8 + pos.y * _9 + pos.x * _a;
[ 29] vec4 color = imageLoad(_3, int(tex_pos));
[ 30] imageStore(_2, base + 0, vec4(color[0]));
[ 31] imageStore(_2, base + 1, vec4(color[1]));
[ 32] imageStore(_2, base + 2, vec4(color[2]));
[ 33]
[ 34] }
[ 35]
[ 36] void main() {
[ 37] _1();
[ 38] }
Specialization constant values:
constant_id=0: 48
constant_id=1: 48
constant_id=2: 3
shaderc output:
input:11: warning: '#extension' : extension not supported: GL_ARB_texture_buffer_object
shaderc compile status 'success' (0 errors, 1 warnings)
Spent 2.902 ms translating SPIR-V
Spent 13.143 ms compiling shader
Spent 13.742 ms creating pipeline
Pass statistics: size 0, SPIR-V: vert 0 frag 0 comp 0

Thread 9 "test.vulkan.c" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffd4a006c0 (LWP 2002)]
0x00007fffe59c902a in std::__atomic_base::load (this=0x28, __m=std::memory_order_seq_cst) at /usr/bin/../lib64/gcc/x86_64-suse-linux/14/../../../../include/c++/14/bits/atomic_base.h:501
501 return __atomic_load_n(&_M_i, int(__m));
Missing separate debuginfos, use: zypper install libdovi3-debuginfo-3.2.0-1.3.x86_64 libglslang14-debuginfo-14.2.0-1.1.x86_64 libopenssl-3-devel-debuginfo-3.1.4-8.1.x86_64 libopenssl3-x86-64-v3-debuginfo-3.1.4-8.1.x86_64 libunwind8-debuginfo-1.8.1-2.1.x86_64
(gdb) info threads
  Id Target Id Frame
  1 Thread 0x7ffff69fac80 (LWP 1987) "test.vulkan.c" __futex_abstimed_wait_common (futex_word=futex_word@entry=0x555555f615dc, expected=expected@entry=3, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=, cancel=cancel@entry=false) at futex-internal.c:103
  2 Thread 0x7fffe02006c0 (LWP 1995) "test.vu:disk$0" 0x00007ffff793ffee in __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5555556cce58) at futex-internal.c:57
  3 Thread 0x7ffff68006c0 (LWP 1996) "test.vulkan.c" 0x00007ffff793ffee in __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x55555570a2c8) at futex-internal.c:57
  4 Thread 0x7ffff0e006c0 (LWP 1997) "test.vulkan.c" 0x00007ffff793ffee in __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x555555e15f18) at futex-internal.c:57
  5 Thread 0x7fffdd8006c0 (LWP 1998) "test.vulkan.c" 0x00007ffff793ffee in __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5555556cff38) at futex-internal.c:57
  6 Thread 0x7fffdce006c0 (LWP 1999) "test.vu:disk$1" 0x00007ffff793ffee in __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5555556cce5c) at futex-internal.c:57
  7 Thread 0x7fffd5e006c0 (LWP 2000) "test.vulkan.c" 0x00007ffff793ffee in __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5555556cd9fc) at futex-internal.c:57
  8 Thread 0x7fffd54006c0 (LWP 2001) "test.vulkan.c" 0x00007ffff793ffee in __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x555555e7056c) at futex-internal.c:57
* 9 Thread 0x7fffd4a006c0 (LWP 2002) "test.vulkan.c" 0x00007fffe59c902a in std::__atomic_base::load (this=0x28, __m=std::memory_order_seq_cst) at /usr/bin/../lib64/gcc/x86_64-suse-linux/14/../../../../include/c++/14/bits/atomic_base.h:501
  10 Thread 0x7fffcbe006c0 (LWP 2003) "test.vu:disk$2" 0x00007ffff793ffee in __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5555556cce5c) at futex-internal.c:57
  11 Thread 0x7fffcb4006c0 (LWP 2004) "test.vu:disk$3" 0x00007ffff793ffee in __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5555556cce58) at futex-internal.c:57
(gdb) bt
#0 0x00007fffe59c902a in std::__atomic_base::load (this=0x28, __m=std::memory_order_seq_cst) at /usr/bin/../lib64/gcc/x86_64-suse-linux/14/../../../../include/c++/14/bits/atomic_base.h:501
#1 std::atomic::operator bool (this=0x28) at /usr/bin/../lib64/gcc/x86_64-suse-linux/14/../../../../include/c++/14/atomic:92
#2 0x00007fffe59c8f99 in vvl::StateObject::Destroyed (this=0x0) at /home/nand/dev/Vulkan-ValidationLayers/layers/./state_tracker/state_object.h:68
#3 0x00007fffe5d6f152 in vvl::DescriptorValidator::ValidateDescriptor (this=0x7fffd49ff930, binding_info={...}, index=0, descriptor_type=VK_DESCRIPTOR_TYPE_STORAGE_TEXEL_BUFFER, texel_descriptor=...) at /home/nand/dev/Vulkan-ValidationLayers/layers/drawdispatch/descriptor_validator.cpp:1030
#4 0x00007fffe5d74039 in vvl::DescriptorValidator::ValidateDescriptors > (this=0x7fffd49ff930, binding_info={...}, binding=..., indices=std::vector of length 1, capacity 1 = {...}) at /home/nand/dev/Vulkan-ValidationLayers/layers/drawdispatch/descriptor_validator.cpp:98
#5 0x00007fffe5d629b3 in vvl::DescriptorValidator::ValidateBinding (this=0x7fffd49ff930, binding_info={...}, indices=std::vector of length 1, capacity 1 = {...}) at /home/nand/dev/Vulkan-ValidationLayers/layers/drawdispatch/descriptor_validator.cpp:136
#6 0x00007fffe61959a0 in gpuav::CommandBuffer::PostProcess (this=0x555555f60700, queue=0x5555558a44c0, loc=...) at /home/nand/dev/Vulkan-ValidationLayers/layers/gpu_validation/gpu_subclasses.cpp:514
#7 0x00007fffe618d6d6 in gpu_tracker::Queue::Retire (this=0x5555556cf5b0, submission=...) at /home/nand/dev/Vulkan-ValidationLayers/layers/gpu_validation/gpu_state_tracker.cpp:161
#8 0x00007fffe6290e2a in vvl::Queue::ThreadFunc (this=0x5555556cf5b0) at /home/nand/dev/Vulkan-ValidationLayers/layers/state_tracker/queue_state.cpp:227
#9 0x00007fffe6293e49 in std::__invoke_impl (__f=@0x555555f24fa0: (void (vvl::Queue::*)(vvl::Queue * const)) 0x7fffe6290de0 , __t=@0x555555f24f98: 0x5555556cf5b0) at /usr/bin/../lib64/gcc/x86_64-suse-linux/14/../../../../include/c++/14/bits/invoke.h:74
#10 0x00007fffe6293d8d in std::__invoke (__fn=@0x555555f24fa0: (void (vvl::Queue::*)(vvl::Queue * const)) 0x7fffe6290de0 , __args=@0x555555f24f98: 0x5555556cf5b0) at /usr/bin/../lib64/gcc/x86_64-suse-linux/14/../../../../include/c++/14/bits/invoke.h:96
#11 0x00007fffe6293d62 in std::thread::_Invoker >::_M_invoke<0ul, 1ul> (this=0x555555f24f98) at /usr/bin/../lib64/gcc/x86_64-suse-linux/14/../../../../include/c++/14/bits/std_thread.h:292
#12 0x00007fffe6293d25 in std::thread::_Invoker >::operator() (this=0x555555f24f98) at /usr/bin/../lib64/gcc/x86_64-suse-linux/14/../../../../include/c++/14/bits/std_thread.h:299
#13 0x00007fffe6293c09 in std::thread::_State_impl > >::_M_run (this=0x555555f24f90) at /usr/bin/../lib64/gcc/x86_64-suse-linux/14/../../../../include/c++/14/bits/std_thread.h:244
#14 0x00007ffff7ba23c4 in std::execute_native_thread_routine (__p=0x555555f24f90) at ../../../../../libstdc++-v3/src/c++11/thread.cc:104
#15 0x00007ffff7943ba2 in start_thread (arg=) at pthread_create.c:447
#16 0x00007ffff79c500c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
(gdb) t 1
[Switching to thread 1 (Thread 0x7ffff69fac80 (LWP 1987))]
#0 __futex_abstimed_wait_common (futex_word=futex_word@entry=0x555555f615dc, expected=expected@entry=3, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=, cancel=cancel@entry=false) at futex-internal.c:103
103 switch (err)
(gdb) bt
#0 __futex_abstimed_wait_common (futex_word=futex_word@entry=0x555555f615dc, expected=expected@entry=3, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=, cancel=cancel@entry=false) at futex-internal.c:103
#1 0x00007ffff794006c in __GI___futex_abstimed_wait64 (futex_word=futex_word@entry=0x555555f615dc, expected=expected@entry=3, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=) at futex-internal.c:128
#2 0x00007ffff794a2c0 in __pthread_rwlock_wrlock_full64 (abstime=0x0, clockid=0, rwlock=0x555555f615d0) at /usr/src/debug/glibc-2.39/nptl/pthread_rwlock_common.c:730
#3 ___pthread_rwlock_wrlock (rwlock=0x555555f615d0) at pthread_rwlock_wrlock.c:26
#4 0x00007fffe5995f73 in std::__glibcxx_rwlock_wrlock (__rwlock=0x555555f615d0) at /usr/bin/../lib64/gcc/x86_64-suse-linux/14/../../../../include/c++/14/shared_mutex:85
#5 0x00007fffe59bf9d5 in std::__shared_mutex_pthread::lock (this=0x555555f615d0) at /usr/bin/../lib64/gcc/x86_64-suse-linux/14/../../../../include/c++/14/shared_mutex:198
#6 0x00007fffe59bf9b5 in std::shared_mutex::lock (this=0x555555f615d0) at /usr/bin/../lib64/gcc/x86_64-suse-linux/14/../../../../include/c++/14/shared_mutex:425
#7 0x00007fffe59bf98c in std::unique_lock::lock (this=0x7fffffffabe8) at /usr/bin/../lib64/gcc/x86_64-suse-linux/14/../../../../include/c++/14/bits/unique_lock.h:147
#8 0x00007fffe5997d28 in std::unique_lock::unique_lock (this=0x7fffffffabe8, __m=...) at /usr/bin/../lib64/gcc/x86_64-suse-linux/14/../../../../include/c++/14/bits/unique_lock.h:73
#9 0x00007fffe59d0e27 in vvl::CommandBuffer::WriteLock (this=0x555555f60700) at /home/nand/dev/Vulkan-ValidationLayers/layers/./state_tracker/cmd_buffer_state.h:530
#10 0x00007fffe61e417e in vvl::CommandBuffer::NotifyInvalidate (this=0x555555f60700, invalid_nodes=..., unlink=true) at /home/nand/dev/Vulkan-ValidationLayers/layers/state_tracker/cmd_buffer_state.cpp:367
#11 0x00007fffe628d6cc in vvl::StateObject::NotifyInvalidate (this=0x555555f20f60, invalid_nodes=..., unlink=true) at /home/nand/dev/Vulkan-ValidationLayers/layers/state_tracker/state_object.cpp:96
#12 0x00007fffe62157b0 in vvl::DescriptorSet::NotifyInvalidate (this=0x555555f20f60, invalid_nodes=..., unlink=true) at /home/nand/dev/Vulkan-ValidationLayers/layers/state_tracker/descriptor_sets.cpp:516
#13 0x00007fffe628d6cc in vvl::StateObject::NotifyInvalidate (this=0x555555f83290, invalid_nodes=..., unlink=true) at /home/nand/dev/Vulkan-ValidationLayers/layers/state_tracker/state_object.cpp:96
#14 0x00007fffe628cef6 in vvl::StateObject::Invalidate (this=0x555555f83290, unlink=true) at /home/nand/dev/Vulkan-ValidationLayers/layers/state_tracker/state_object.cpp:82
#15 0x00007fffe628ce8e in vvl::StateObject::Destroy (this=0x555555f83290) at /home/nand/dev/Vulkan-ValidationLayers/layers/state_tracker/state_object.cpp:24
#16 0x00007fffe619669d in vvl::BufferView::Destroy (this=0x555555f83290) at /home/nand/dev/Vulkan-ValidationLayers/layers/./state_tracker/buffer_state.h:114
#17 0x00007fffe61930df in gpuav::BufferView::Destroy (this=0x555555f83290) at /home/nand/dev/Vulkan-ValidationLayers/layers/gpu_validation/gpu_subclasses.cpp:51
#18 0x00007fffe62f1a39 in ValidationStateTracker::Destroy > (this=0x555555bd1bc0, handle=0xbe95e40000000dda) at /home/nand/dev/Vulkan-ValidationLayers/layers/./state_tracker/state_tracker.h:295
#19 0x00007fffe62d3599 in ValidationStateTracker::PreCallRecordDestroyBufferView (this=0x555555bd1bc0, device=0x555555baf110, bufferView=0xbe95e40000000dda, pAllocator=0x0, record_obj=...) at /home/nand/dev/Vulkan-ValidationLayers/layers/state_tracker/state_tracker.cpp:561
#20 0x00007fffe5deb09f in vulkan_layer_chassis::DestroyBufferView (device=0x555555baf110, bufferView=0xbe95e40000000dda, pAllocator=0x0) at /home/nand/dev/Vulkan-ValidationLayers/layers/vulkan/generated/chassis.cpp:2247
#21 0x0000555555622969 in vk_buf_deref (gpu=0x555555ca85f0, buf=0x555555efa440) at ../src/vulkan/gpu_buf.c:91
#22 0x00005555556228bd in vk_buf_deref_cb (priv=0x555555ca85f0, arg=0x555555efa440) at ../src/vulkan/gpu_buf.c:20
#23 0x0000555555617606 in flush_callbacks (vk=0x5555556dfec0) at ../src/vulkan/command.c:38
#24 0x0000555555617400 in vk_cmd_reset (cmd=0x555555deaad0) at ../src/vulkan/command.c:52
#25 0x000055555561658f in vk_poll_commands (vk=0x5555556dfec0, timeout=0) at ../src/vulkan/command.c:538
#26 0x00005555556160eb in vk_cmd_begin (pool=0x555555c00c50, debug_tag=0x5555555754d1 "vk_pass_run") at ../src/vulkan/command.c:367
#27 0x000055555561fec3 in _begin_cmd (gpu=0x555555ca85f0, type=COMPUTE, label=0x5555555754d1 "vk_pass_run", timer=0x555555ed4010) at ../src/vulkan/gpu.c:168
#28 0x000055555562be6d in vk_pass_run (gpu=0x555555ca85f0, params=0x7fffffffb858) at ../src/vulkan/gpu_pass.c:813
#29 0x00005555555e1e09 in pl_pass_run (gpu=0x555555ca85f0, params=0x555555eca930) at ../src/gpu.c:1252
#30 0x00005555555cadb8 in run_pass (dp=0x555555e308e0, sh=0x555555b61a90, pass=0x555555eca8c0) at ../src/dispatch.c:1132
#31 0x00005555555cb6db in pl_dispatch_compute (dp=0x555555e308e0, params=0x7fffffffc308) at ../src/dispatch.c:1430
#32 0x00005555555e5b4f in pl_tex_download_texel (gpu=0x555555ca85f0, params=0x555555fa7960) at ../src/gpu/utils.c:837
#33 0x00005555556283c9 in vk_tex_download (gpu=0x555555ca85f0, params=0x7fffffffc608) at ../src/vulkan/gpu_tex.c:1029
#34 0x00005555555ddb45 in pl_tex_download (gpu=0x555555ca85f0, params=0x7fffffffc6e8) at ../src/gpu.c:527
#35 0x00005555555e4bdd in pl_tex_download_pbo (gpu=0x555555ca85f0, params=0x7fffffffca68) at ../src/gpu/utils.c:669
#36 0x0000555555628150 in vk_tex_download (gpu=0x555555ca85f0, params=0x7fffffffca68) at ../src/vulkan/gpu_tex.c:994
#37 0x00005555555ddb45 in pl_tex_download (gpu=0x555555ca85f0, params=0x7fffffffcb58) at ../src/gpu.c:527
#38 0x000055555569860a in pl_test_roundtrip (gpu=0x555555ca85f0, tex=0x7fffffffce70, src=0x555556008200 "\337\037\025\035αʅ\200\035\223\353\304?\314\367\320\355Q\200\023\312\300U\314\355A_Us\2665\222\314S`}\035\345\376;y\351\377\270\266\366\210Z\02250\356c\364p$\305\345\212\343\341\233HnUϐ\274\017\347\311T\257\271\256\301w*\034\211_Lw\302A\347\347\a\315q\353\256\r3\035c\003\255\037\022\224\351gC\242\025\005\032@!\244\237n\033b\260\003J\267м\242\177\312֜-\332JM\354\3367S\"\332i'\364\251I\230I\267\264\254h\270\367\037\211\264\302\b~\231\245\254s\357\371_\3160\263\361\v\035\031\377\307b\230\020\032M\275\202\005\264\242\217id\227\347\375=\223p-\215\320\373\276\204\354ɡ"..., dst=0x555556028200 "") at ../src/tests/gpu_tests.h:184
#39 0x000055555568dd17 in pl_texture_tests (gpu=0x555555ca85f0) at ../src/tests/gpu_tests.h:245
#40 0x000055555568c10e in gpu_shader_tests (gpu=0x555555ca85f0) at ../src/tests/gpu_tests.h:1722
#41 0x000055555568bda0 in main () at ../src/tests/vulkan.c:199
(gdb)
```

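
A note on reading the crashing frames: frame #2 (vvl::StateObject::Destroyed) has this=0x0, while frames #0 and #1 report this=0x28. That is the pattern a member read through a null object pointer usually produces, since the faulting address is simply the member's offset inside the object. A minimal sketch of that arithmetic follows. FakeStateObject and DestroyedFlagAddress are hypothetical names, and the 0x28 bytes of leading state are chosen only to match the trace; they are an assumption, not the real vvl::StateObject layout.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a tracked state object. The 0x28 bytes of leading
// state are an assumption picked to match the backtrace, NOT the real layout.
struct FakeStateObject {
    unsigned char other_state[0x28];
    std::atomic<bool> destroyed{false};
};

// Address that gets read when `destroyed` is loaded through an object that
// lives at `object_address`.
constexpr std::uintptr_t DestroyedFlagAddress(std::uintptr_t object_address) {
    return object_address + offsetof(FakeStateObject, destroyed);
}

// A null object (address 0) puts the flag at 0x28, the `this` value that gdb
// shows for std::atomic::operator bool in frame #1.
static_assert(DestroyedFlagAddress(0) == 0x28,
              "null object pointer + member offset gives the faulting address");
```

So the useful question is not why the atomic load failed, but why ValidateDescriptor ended up holding a null state pointer for the texel buffer descriptor.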
haasn commented 5 months ago

It's worth pointing out that the buffer view being destroyed in the backtrace (via vulkan_layer_chassis::DestroyBufferView) is the one temporarily being used by the compute shader (layout(binding=0, r8) restrict uniform imageBuffer _2;).

Since the vk_buf_deref callback is being run, the command in question has definitely completed. This is witnessed by a semaphore wait (vkWaitSemaphores) on a semaphore being signaled by this command (at VK_PIPELINE_STAGE_2_ALL_COMMANDS_BIT). There is no VkFence involved, so no vkWaitForFences call to witness the command's completion. It's possible that this is the source of the issue, as the command may not be properly retired in this case.
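
For reference, both backtraces involve the same command buffer (this=0x555555f60700): thread 9 is post-processing it in gpuav::CommandBuffer::PostProcess, while thread 1 is blocked in vvl::CommandBuffer::WriteLock on its way to invalidating it from the buffer view's Destroy. Whatever the root cause, the immediate crash is a member call through a null state pointer. A defensive check could look roughly like the sketch below. Every type and function name here is a hypothetical stand-in, and this is only an illustration of the pattern, not the change that was made in the layers.

```cpp
#include <atomic>
#include <cassert>
#include <memory>

// Hypothetical miniature of a tracked buffer view; not the real vvl::BufferView.
struct TrackedBufferView {
    std::atomic<bool> destroyed{false};
    void Destroy() { destroyed.store(true); }
    bool Destroyed() const { return destroyed.load(); }
};

// Hypothetical descriptor: the state pointer may be empty if the view was
// never bound or has already been unlinked.
struct TexelDescriptor {
    std::shared_ptr<TrackedBufferView> view_state;
};

enum class DescriptorStatus { Valid, DestroyedOrMissing };

// Check the pointer itself before calling any member function through it.
DescriptorStatus ValidateTexelDescriptor(const TexelDescriptor& descriptor) {
    const TrackedBufferView* view = descriptor.view_state.get();
    if (view == nullptr || view->Destroyed()) {
        return DescriptorStatus::DestroyedOrMissing;
    }
    return DescriptorStatus::Valid;
}

// Models the application side: the view is destroyed once the GPU work is done.
void DestroyView(TexelDescriptor& descriptor) {
    assert(descriptor.view_state && "view must exist before it is destroyed");
    descriptor.view_state->Destroy();
}
```

With this shape, validating a descriptor whose view is missing or already destroyed reports DestroyedOrMissing instead of dereferencing a null pointer.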

haasn commented 4 months ago

> There is no VkFence involved, so no vkWaitForFences call to witness the command's completion. It's possible that this is the source of the issue, as the command may not be properly retired in this case.

Unfortunately this is not the case: I tried switching to vkWaitForFences and still get the same segfault.

arno-lunarg commented 4 months ago

Hello @haasn, you mention a test suite in the issue. Could you point me to it, please, and tell me what settings you were using?

haasn commented 4 months ago

> Hello @haasn , you mention a test suite in the issue, could you point me to it please, and tell me what settings you were using?

arno-lunarg commented 4 months ago

Hey @haasn, thank you for the details. Unfortunately those steps do not seem to be enough on Windows. It is my first time using meson, so maybe my setup is broken.

1) I had to remove the lookup of the atomic library: it fails, and I can confirm that atomic.lib is nowhere to be found on my computer.

if not cc.links(atomic_test)
 build_deps += cc.find_library('atomic')
endif

2) meson cannot find the Vulkan loader or glslang, even though I have the Vulkan SDK installed, so I cannot build the Vulkan tests.

Do you happen to have any insight into these issues?

spencer-lunarg commented 4 months ago

@haasn I am on Linux and close to reproducing, but I can't get meson to detect the SDK's (or my personal clone's) glslang/shaderc. Without it, the test fails with: Failed initializing any SPIR-V compiler! Maybe libplacebo was built without support for either libshaderc or glslang?

I tried meson setup build -Dtests=true -Dglslang=enabled, but it complains that the C++ static library 'SPIRV' was not found, even after pointing pkg-config and everything else at my various paths.

(edit: I think I got it, but it required a lot of hard-coding in the meson.build)

spencer-lunarg commented 4 months ago

@haasn I can fully reproduce it!

For clarity, it only occurs with GPU-AV (Best Practices, core validation, etc. don't affect the crash).

spencer-lunarg commented 4 months ago

@haasn we should have this all fixed now

I have also added libplacebo's vulkan.c test (meson test -Cbuild vulkan.c) to our SDK release to make sure GPU-AV (and the rest of validation as well) doesn't regress on it!

Thanks for opening the issue