KhronosGroup / Vulkan-Samples

One stop solution for all Vulkan samples
Apache License 2.0
4.33k stars 647 forks source link

Fix support for --headless. #1090

Closed iagoCL closed 3 days ago

iagoCL commented 4 months ago

Description

The main objective of this PR is to fix the --headless flag. Currently it does not work in the framework, due to not requesting a surface and swap chain.

This PR reworks HeadlessWindow to make it more similar to the other window classes. This will allow us to enable VK_EXT_headless_surface in HeadlessWindow::get_required_surface_extensions like the other windows types. Now we can remove the headlessparamater from Instance::Instance and VulkanSample<>::create_instance simplifying the framework.

To keep the samples hello_world and hpp_hello_world as simple as possible, I did not add support for headless. Instead, an assert is run to ensure the are not enabling with headless in those samples.

General Checklist:

Please ensure the following points are checked:

Fixes #830 Fixes #314

CLAassistant commented 4 months ago

CLA assistant check
All committers have signed the CLA.

nv-ahamelin commented 3 months ago

Change tested on:

I have tested 3 different setups, 1. Display connected+GDM3 started, 2. Display connected+GDM3 shutdown and 3. Display disconnected.

  1. Display connected+GDM3 started The change appears to be functional on my side, I am able to run multiple samples with the --headless option without encountering errors.

  2. Display connected+GDM3 shutdown The samples will fail with the following error message:

    [error] Error Message: vk::Device::acquireNextImageKHR: ErrorInitializationFailed
    [error] Failed when running application Swapchain Images

    A bit of investigation led me to a failure in the wsi-layer where the usage of vkImportSemaphoreFdKHR when found available fails, leading to the error we see above. I made a local modification to the layer to bypass that usage and instead use the layer fallback path instead in the following file [...]/vulkan-wsi-layer/wsi/swapchain_base.cpp:L434 . With that modification, I am able to run successfully the samples in this configuration.

  3. Display disconnected I have observed the same behavior as the one in 2. and required the same local modification to the layer to make it function.

iagoCL commented 3 months ago

My current setup is using Swiftshader https://github.com/google/swiftshader "a high-performance CPU-based implementation of the Vulka 1.3 graphics AP" the headless flag seems to work correctly in all tested examples

I can reproduce SaschaWillems crash. Needs more investigation, but seems to be valid use of the API, and only reproducible if you enable ICDs for Nvidia and SwiftShader. As a workaround you can avoid the crash explicitily selecting the valid GPU using the --gpu flag I also agreed with SaschaWillems to also add some small improvements to the code and extra documentation to better explain VK_EXT_headless_surface, its intention and how it is diferent from rendering without swapchain. I have been a bit busy this last month, but hopefully I will submit a new PR with some improvements soon.

iagoCL commented 2 months ago

Change the name of the flag and add some comments to better explain it. I also added an explanation in the example in the readme. Based on some feedback I also added a warning when running headless with multiple GPU candidates. Tested on Windows and Linux using Swiftshader.

SaschaWillems commented 1 month ago

Looks like this triggers a CI failure due to copyright headers not being up-to-date. Can you fix that?

iagoCL commented 1 month ago

I did not modified those files, so unsure what trigered the copyright failure.

I just rebased against main, hopefully that fixes the issue

gary-sweet commented 1 month ago

I think I said I was going to test this at the last meeting, but it turns out that we don't actually support VK_EXT_headless_surface, so I 'm afraid I can't.

SaschaWillems commented 1 month ago

It seems to work when explicitly selecting swiftshader. I don't see any visual output, which I guess is expected.

But I'm still getting a crash when not explicitly selecting swiftshader. I do get the warning first that tells me to explicitly select a gpu, but if wont' do that, I get the same error as above. Is there no way to avoid running with that flag on GPUs that don't have support for it? Or at least catch that error and exit gracefully. For me it's stuck in an endless loop of exceptions.

SaschaWillems commented 1 month ago

And does this really fix #314?

iagoCL commented 1 month ago

But I'm still getting a crash when not explicitly selecting swiftshader. I do get the warning first that tells me to explicitly select a gpu, but if wont' do that, I get the same error as above. Is there no way to avoid running with that flag on GPUs that don't have support for it? Or at least catch that error and exit gracefully. For me it's stuck in an endless loop of exceptions.

If we have multiple GPUs, the correct way to check which one supports headless is by calling vkGetPhysicalDeviceSurfaceSupportKHR as the current code id doing. This is causing a crash on Nvidia.

I tried to catch and handle this crash more gratefully, but the crash is in the driver, so exceptions did not work, and I did not found a correct way to detect it from application side.

The alternative is to exit after printing the warning, and forcing user to manually select GPU.

And does this really fix #314?

Not sure.

It mentions instancing and render_passes. When I mentioned that issue I just tested those issues, and both work correctly with headless after this change.

Now that I read the problem more carefully, I am confused. At first glance the current Vulkan Samples will only work on devices that have a present queuee. We check this on get_suitable_gpu (with or without headless), so it should be imposible for get_queue_by_present to fail

So the crash is fixed, but we still have the get_device().get_queue_by_present(0).wait_idle() on submit_frame. We should remove the wait idle, but that is independent of headless, so maybe it will be better to update #314 or create a new issue for that.

marty-johnson59 commented 3 days ago

Per discussion, let's merge this with 2 (w/ Steve's approval)