KhronosGroup / Vulkan-Samples

One stop solution for all Vulkan samples
Apache License 2.0

All samples crash on Linux on resize when VALIDATION_CHECK_ENABLE_SYNCHRONIZATION_VALIDATION_QUEUE_SUBMIT is set #605

Closed mokafolio closed 1 year ago

mokafolio commented 1 year ago

hello_triangle and all other samples I tried segfault with:

[info] Recreated swapchain
[info] Depth format selected: VK_FORMAT_D32_SFLOAT
[1] 12219 segmentation fault (core dumped) ./build/linux/app/bin/Debug/x86_64/vulkan_samples sample instancing

This happens when VALIDATION_CHECK_ENABLE_SYNCHRONIZATION_VALIDATION_QUEUE_SUBMIT is enabled.

I am on Ubuntu 22.04 with SDK version 1.3.236. On an NVIDIA RTX 3070 (nvidia-drivers-525), all examples crash. On an integrated Intel GPU, only hello_triangle crashes.

There is a good chance that this is a validation layer issue but I thought I'd put it here for the time being until that is clarified.

danilw commented 1 year ago

May be related to https://github.com/KhronosGroup/Vulkan-Samples/issues/250

mokafolio commented 1 year ago

@danilw While it could be, I have my own code running without any of those issues, so I assume it's something specific to how hello_triangle and the rest of the samples handle swapchain recreation.

SaschaWillems commented 1 year ago

Indeed, resize handling in the sample and the framework is not optimal. We'll try to fix this.

mokafolio commented 1 year ago

@SaschaWillems Yeah, I am actually surprised, because looking at the hello_triangle code, things seemed pretty much by the book to me. I have a feeling this might either be a validation layer issue or something really subtle.

gpx1000 commented 1 year ago

@mokafolio Thanks for submitting this. Could you verify whether this test was on X11 or Wayland? If it was X11, could you lemme know the use case for not using Wayland? We are considering dropping X11 support as the majority of Linux users are on Wayland.

mokafolio commented 1 year ago

I have been running them on X11, I am pretty sure. The reason is that NVIDIA drivers and Wayland still have a couple of hiccups, especially with multi-display setups, Wayland cannot support custom color profiles, and a lot of screen capturing and sharing software is simply not available on Wayland (e.g. Slack/Signal screen sharing). I sympathize with the decision to support only Wayland, but I think that might be a little too idealistic for where things stand right now. I will run the tests on both Wayland and X11 in a bit to make sure and report back, thank you!

gpx1000 commented 1 year ago

Could I ask a follow-up: when was the last time you checked those observations?

I ask because I use Wayland in a multi-monitor setup (two of those monitors are custom-modeline lightfield displays, alongside two standard desktop monitors; those are running from an NVIDIA driver). I also use Slack and Zoom screen sharing and haven't heard complaints. As for custom color profiles, I'll admit I haven't run into problems because I don't use them, so I have no direct knowledge there.

Anyway, lemme know if running in X11 was the cause of the crash. If it was, then #250 is likely the same bug.

mokafolio commented 1 year ago

I just tested it again, rebuilding from main using SDK 1.3.239. The result is the same for Wayland and X11: hello_triangle now works properly with synchronization validation, including QueueSubmit. All other samples crash on resize no matter what form of synchronization validation is enabled.

Not to diverge too much regarding Wayland: I truly wish it was the only thing needed at this point, but on all my Ubuntu machines, in most apps, only screen sharing for individual windows works; selecting an actual screen is just black (I have not looked into it too much, though).

danilw commented 1 year ago

"NVIDIA driver"

NVIDIA GPUs support Vulkan from the 6XX series onward. The NVIDIA driver on Linux supports Wayland only from 20XX GPUs onward.

And on a multi-GPU setup where, for example, I have an AMD Vega 8 GPU that runs on Wayland and a second NVIDIA GPU that does have Vulkan support but does not have Wayland support, X11 will be used as the surface for launching Vulkan applications on that NVIDIA GPU.

"We are considering dropping X11 support as the majority of Linux users are on wayland."

I mean VK_STRUCTURE_TYPE_WAYLAND_SURFACE_CREATE_INFO_KHR cannot be used on an NVIDIA GPU that supports Vulkan, even if you run the desktop on Wayland on a second GPU, because the NVIDIA proprietary driver supports Wayland only on the newest RTX GPUs.

DXVK, Godot 4, the Chrome Vulkan renderer, and many other "big" Vulkan apps use x11/xcb surfaces in Vulkan. Removing the examples for x11/xcb surfaces from KhronosGroup/Vulkan-Samples could cause a lot of confusion for people.
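The fallback described in this comment can be sketched roughly as follows. This is only an illustration, not how Vulkan-Samples selects its surface: `pick_surface_extension` and `surface_self_check` are hypothetical names, and the list of available extensions is passed in as plain strings (in a real application it would come from vkEnumerateInstanceExtensionProperties) so that the sketch builds without the Vulkan headers. The two extension names are the real instance extension names.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper, not part of Vulkan-Samples: picks a window system
// integration (WSI) instance extension on Linux. `available` stands in for
// the names reported by vkEnumerateInstanceExtensionProperties.
std::string pick_surface_extension(const std::vector<std::string> &available,
                                   bool wayland_session, bool x11_session)
{
    auto has = [&available](const char *name) {
        return std::find(available.begin(), available.end(), name) != available.end();
    };

    // Prefer Wayland only when the session and the extension list both offer it.
    if (wayland_session && has("VK_KHR_wayland_surface"))
        return "VK_KHR_wayland_surface";

    // Otherwise fall back to XCB, which keeps X11-only setups working.
    if (x11_session && has("VK_KHR_xcb_surface"))
        return "VK_KHR_xcb_surface";

    return std::string();        // no usable WSI extension found
}

// Small self-check documenting the expected choices.
void surface_self_check()
{
    const std::vector<std::string> both     = {"VK_KHR_surface", "VK_KHR_wayland_surface", "VK_KHR_xcb_surface"};
    const std::vector<std::string> xcb_only = {"VK_KHR_surface", "VK_KHR_xcb_surface"};

    assert(pick_surface_extension(both, true, true) == "VK_KHR_wayland_surface");
    assert(pick_surface_extension(xcb_only, true, true) == "VK_KHR_xcb_surface");
    assert(pick_surface_extension(both, false, false).empty());
}
```

Instance extensions are not per-GPU, so a real application would also need a per-device check (for example vkGetPhysicalDeviceWaylandPresentationSupportKHR) before committing to Wayland on a particular GPU; that part is left out here.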

marty-johnson59 commented 1 year ago

Hi @mokafolio, just checking on this one. Do you believe there's something we need to address in the samples to resolve this? Or is this a problem with drivers/configurations? Perhaps there's more testing we need to do? Thanks!

mokafolio commented 1 year ago

I won't be able to check on my other computers until next week. On the Intel integrated GPU I have on hand right now, only hello_triangle crashes with synchronization checks enabled, while all the other samples seem to work. I get the same segfault as hello_triangle in my own codebase right now during swapchain recreation, so I am pretty sure that the issue might be Intel/Linux/validation layer related. I will try to report back next week regarding NVIDIA/Linux.

marty-johnson59 commented 1 year ago

Hi @mokafolio, any updates on this, or are we OK closing? Thanks

mokafolio commented 1 year ago

I just tried building with a fresh clone (including all submodules) and I can't get the cmake step to succeed:

CMake Error at third_party/CMakeLists.txt:340 (set_property):
  set_property could not find TARGET spdlog_headers_for_ide.  Perhaps it has
  not yet been created.

gpx1000 commented 1 year ago

Hmm... Let's see if we can figure out the problem. Could you check that line 160 of third_party/spdlog/CMakeLists.txt defines the target spdlog_headers_for_ide with the following?

add_custom_target(spdlog_headers_for_ide SOURCES ${spdlog_include_SRCS})

If the folder is empty, could you try running:

git submodule update --init --recursive

from the command line and post the results?

mokafolio commented 1 year ago

This is line 160 for me:

list(APPEND SPDLOG_SRCS ${CMAKE_CURRENT_BINARY_DIR}/version.rc)

If I grep for spdlog_headers_for_ide, the only place it shows up is inside third_party/CMakeLists.txt at line 340:

set_property(TARGET spdlog_headers_for_ide PROPERTY FOLDER "ThirdParty")

This is a fresh clone including submodules so my guess is that either the submodule version of spdlog was not properly bumped in the main repo, or I am on the tip of main now while the cmake logic relies on something that is out of date.

gpx1000 commented 1 year ago

I just ran the following locally from the command line in Ubuntu:

git clone https://github.com/KhronosGroup/Vulkan-Samples.git
cd Vulkan-Samples/
git submodule update --init --recursive
tail third_party/spdlog/CMakeLists.txt

I am seeing add_custom_target(spdlog_headers_for_ide SOURCES ${spdlog_include_SRCS}) as the final line of that file, i.e. line 160. Could you try a fresh clone? It admittedly isn't great that we're tracking the submodule by the head of the main branch of spdlog instead of a specific commit; however, it looks like that works. It also looks like CI is doing a fresh clone, so that should fail too if it gets out of sync.

mokafolio commented 1 year ago

I just did it again and it works now. I went back through my history to figure out what happened the first time: I accidentally cloned the submodules with --remote initially, which caused the breakage, my bad :*( . Will check the samples in a minute and report back!

mokafolio commented 1 year ago

I am still seeing the following validation error for hello triangle when synchronization validation is enabled (including QueueSubmit Synchronization Validation):

VUID-vkAcquireNextImageKHR-semaphore-01286(ERROR / SPEC): msgNum: -370888023 - Validation Error: [ VUID-vkAcquireNextImageKHR-semaphore-01286 ] Object 0: handle = 0xf34a48000000013c, type = VK_OBJECT_TYPE_SEMAPHORE; | MessageID = 0xe9e4b2a9 | vkAcquireNextImageKHR: Semaphore must not be currently signaled. The Vulkan spec states: If semaphore is not VK_NULL_HANDLE it must be unsignaled (https://vulkan.lunarg.com/doc/view/1.3.250.1/linux/1.3-extensions/vkspec.html#VUID-vkAcquireNextImageKHR-semaphore-01286)
    Objects: 1
        [0] 0xf34a48000000013c, type: 5, name: NULL

All the other samples appear to run with no issue and certainly are not crashing anymore, so I will close this issue. Thanks!
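For anyone landing here with the same VUID-vkAcquireNextImageKHR-semaphore-01286 message: one common way to hit it (not confirmed as the cause in hello_triangle) is reusing an acquire semaphore that was signaled but never waited on, for example when vkAcquireNextImageKHR returns VK_SUBOPTIMAL_KHR, the app recreates the swapchain, and the frame's submit is skipped. Below is a toy model of that rule. `SemState`, `AcquireSemaphorePool`, and their methods are hypothetical stand-ins, not Vulkan or Vulkan-Samples types, so the sketch builds without the Vulkan headers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for the state of a binary VkSemaphore; not a real Vulkan type.
enum class SemState { Unsignaled, Signaled };

// Toy model (hypothetical) of a pool of semaphores used for image acquisition.
class AcquireSemaphorePool
{
  public:
    // Models an acquire call: picks a semaphore that is currently unsignaled,
    // creating a new one if none is free, and marks it signaled.
    // Returns the index of the semaphore that was used.
    int acquire()
    {
        for (std::size_t i = 0; i < states_.size(); ++i)
        {
            if (states_[i] == SemState::Unsignaled)
            {
                states_[i] = SemState::Signaled;
                return static_cast<int>(i);
            }
        }
        states_.push_back(SemState::Signaled);
        return static_cast<int>(states_.size() - 1);
    }

    // Models a queue submission that waited on the semaphore: the wait
    // consumes the signal, so the semaphore may be reused afterwards.
    void wait_consumed(int index)
    {
        assert(states_[index] == SemState::Signaled);
        states_[index] = SemState::Unsignaled;
    }

    std::size_t size() const { return states_.size(); }

  private:
    std::vector<SemState> states_;
};

// Walks through the "skipped frame" scenario described above.
void semaphore_self_check()
{
    AcquireSemaphorePool pool;
    const int a = pool.acquire();   // frame N acquires, semaphore 0 is signaled
    // The swapchain is recreated and frame N's submit is skipped,
    // so semaphore 0 is never waited on.
    const int b = pool.acquire();   // must not hand out semaphore 0 again
    assert(a == 0 && b == 1);

    pool.wait_consumed(a);          // a later submit waits on semaphore 0
    assert(pool.acquire() == 0);    // now it is safe to reuse
    assert(pool.size() == 2);
}
```

In real code the "wait consumed" step corresponds to a queue submission that lists the semaphore as a wait semaphore; the model only tracks the state that the validation message is complaining about.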