KhronosGroup / Vulkan-ValidationLayers

Vulkan Validation Layers (VVL)
https://vulkan.lunarg.com/doc/sdk/latest/linux/khronos_validation_layer.html
Other
751 stars 403 forks source link

minImageCount validation without VK_EXT_swapchain_maintenance1 #7921

Closed cnorthrop closed 5 months ago

cnorthrop commented 5 months ago

Environment:

Describe the Issue

We're seeing the following VVL error on platforms that don't support VK_EXT_swapchain_maintenance1:

vk_renderer.cpp:824 (DebugUtilsMessenger): [ VUID-VkSwapchainCreateInfoKHR-presentMode-02839 ] Validation Error: [ VUID-VkSwapchainCreateInfoKHR-presentMode-02839 ] | MessageID = 0xfb70ccdb | vkCreateSwapchainKHR(): pCreateInfo->minImageCount 3, which is outside the bounds returned by vkGetPhysicalDeviceSurfaceCapabilitiesKHR() (i.e. minImageCount = 5, maxImageCount = 0). The Vulkan spec states: If presentMode is not VK_PRESENT_MODE_SHARED_DEMAND_REFRESH_KHR nor VK_PRESENT_MODE_SHARED_CONTINUOUS_REFRESH_KHR, then minImageCount must be greater than or equal to the value returned in the minImageCount member of the VkSurfaceCapabilitiesKHR structure returned by vkGetPhysicalDeviceSurfaceCapabilitiesKHR for the surface (https://www.khronos.org/registry/vulkan/specs/1.3-extensions/html/vkspec.html#VUID-VkSwapchainCreateInfoKHR-presentMode-02839)

Expected behavior

We don't think that VUID should be checked without support for VK_EXT_swapchain_maintenance1

Repro

If you have a matching system, you can repro with any of ANGLE's trace tests.

Additionally we've been able to repro on AMD based systems by disabling the extension using the following:

ANGLE_FEATURE_OVERRIDES_DISABLED=supportsSwapchainMaintenance1 out/Debug/angle_trace_tests --gtest_filter="*minetest*" --verbose
spencer-lunarg commented 5 months ago

We don't think that VUID should be checked without support for VK_EXT_swapchain_maintenance1

I guess from a spec perspective why is this the case? The VU was written and checked before VK_EXT_swapchain_maintenance1 was added, how would not having the extension supported effect the results of a vkGetPhysicalDeviceSurfaceCapabilitiesKHR query?

artem-lunarg commented 5 months ago

We had an internal discussion recently about this minImageCount = 5, and the final conclusion from Google engineer was "Yes, this does look like ANGLE bug".

Also there might be a question to the driver, since minImageCount=5 does not look like a practical limit. Recently I encountered this comment in the Granite vulkan renderer: https://github.com/Themaister/Granite/blob/495d8d2b78e39d8c8283447d2f68f0ad9ee297ee/vulkan/wsi.cpp#L1317

y-novikov commented 5 months ago

Per https://ci.chromium.org/ui/p/angle/builders/try/linux-exp-test/5/infra you should also be able to reproduce this on Linux Intel UHD 770 GPU with Mesa 23.2.1 and angle_perftests. E.g. [ RUN ] BlitFramebufferPerf.Run/vulkan_color_2_samples ../../src/tests/perf_tests/ANGLEPerfTest.h:131: Failure Failed Failing test because of unexpected error: vk_renderer.cpp:824 (DebugUtilsMessenger): [ VUID-VkSwapchainCreateInfoKHR-presentMode-02839 ] Validation Error: [ VUID-VkSwapchainCreateInfoKHR-presentMode-02839 ] | MessageID = 0xfb70ccdb | vkCreateSwapchainKHR(): pCreateInfo->minImageCount 3, which is outside the bounds returned by vkGetPhysicalDeviceSurfaceCapabilitiesKHR() (i.e. minImageCount = 5, maxImageCount = 0). The Vulkan spec states: If presentMode is not VK_PRESENT_MODE_SHARED_DEMAND_REFRESH_KHR nor VK_PRESENT_MODE_SHARED_CONTINUOUS_REFRESH_KHR, then minImageCount must be greater than or equal to the value returned in the minImageCount member of the VkSurfaceCapabilitiesKHR structure returned by vkGetPhysicalDeviceSurfaceCapabilitiesKHR for the surface (https://www.khronos.org/registry/vulkan/specs/1.3-extensions/html/vkspec.html#VUID-VkSwapchainCreateInfoKHR-presentMode-02839)

Command is ./src/tests/run_perf_tests.py --isolated-script-test-output=output.json --isolated-script-test-perf-output=perftest-output.json --log=debug --smoke-test-mode --show-test-stdout

ShabbyX commented 5 months ago

We had an internal discussion recently about this minImageCount = 5, and the final conclusion from Google engineer was "Yes, this does look like ANGLE bug".

There are two separate issues:

We fixed the former issue in ANGLE. The latter is what this issue is about.

spencer-lunarg commented 5 months ago

@cnorthrop @ShabbyX I created https://gitlab.khronos.org/vulkan/vulkan/-/merge_requests/6640

Talked to Artem, and I am still not sure as of right now where in the spec it says this VU matters for VK_EXT_swapchain_maintenance1

spencer-lunarg commented 5 months ago

update, there is a chance I just really don't understand WSI at all and there is no need for a spec issue, @artem-lunarg is going to try and reproduce this and take it over for me

ShabbyX commented 5 months ago

The spec has a note that hopefully makes things clearer here (see "Note"): https://docs.vulkan.org/spec/latest/chapters/VK_KHR_surface/wsi.html#VkSurfacePresentModeEXT

ShabbyX commented 5 months ago

Just noticed something in our internal chat. The device where this is happening supports VK_EXT_surface_maintenance1, but not VK_EXT_swapchain_maintenance1. That's rather odd given how closely these two are related. So I just checked and ANGLE does enable surface-maint1 in this case... I have to think about whether we could conform to the per-present minImageCount or not (and whether the VU change you are making Spencer should be based on surface- or swapchain-maint1)

If the minImageCount part is independent of swapchain-maint1, then this was probably still an ANGLE bug after all :disappointed:

artem-lunarg commented 5 months ago

@ShabbyX The way I understood the original error message (with minCount = 5), is that large min count suggests that conservative computation took place, and that can happen when VVL works with compatibility group. If the application did not ask for VK_EXT_swapchain_maintenance1, then I guess the expectation that code path should not be taken by VVL?

We discussed yesterday with Spencer in which cases VVL can run that check. We found one VVL code path that uses pNext from VK_EXT_swapchain_maintenance1 without directly checking if that extension is enabled (in AcquireNextImage). But in that case (when pNext contains structure from VK_EXT_swapchain_maintenance1 and this extension is not enabled), the VVL stateless validation should complain earlier, so in the end this excluded the only suspicious case.

ShabbyX commented 5 months ago

The way I understood the original error message (with minCount = 5), is that large min count suggests that conservative computation took place, and that can happen when VVL works with compatibility group.

I haven't seen the code, so I don't know. My guess was that VVL made a per-present-mode query (a surface-maint1 feature) to get the more exact minCount, while ANGLE made the generic query and got the inexact minCount. The Note under https://docs.vulkan.org/spec/latest/chapters/VK_KHR_surface/wsi.html#VkSurfacePresentModeEXT explains how the generic one may produce a smaller count than the per-present-mode one.

I haven't got a chance to dig into it yet, but my growing suspicion is that VVL was right to do that because ANGLE did enable surface_maint1, but didn't actually use it because swapchain_maint1 was missing.

ShabbyX commented 5 months ago

Yes I can confirm this was an ANGLE bug, and adjusting the minImageCount if surface_maint1 is exposed (even if swapchain_maint1 isn't) is both doable and satisfies VVL.

Sorry we didn't notice this earlier.

artem-lunarg commented 5 months ago

@ShabbyX thanks for checking this. I'm going to cancel my plans for debugging session.

artem-lunarg commented 5 months ago

My guess was that VVL made a per-present-mode query (a surface-maint1 feature) to get the more exact minCount

Yes, this part:

    VkSurfacePresentModeEXT present_mode_info = vku::InitStructHelper();
    if (IsExtEnabled(device_extensions.vk_ext_surface_maintenance1)) {
        present_mode_info.presentMode = create_info.presentMode;
        present_mode_info.pNext = surface_info_pnext;
        surface_info_pnext = &present_mode_info;
    }
    const auto surface_caps = surface_state->GetSurfaceCapabilities(physical_device_state->VkHandle(), surface_info_pnext);

I was confused about that large minCount value of 5 (and it's retrieved by the VVL, not app). I thought VVL does some generic query (because how can you get 5 from swapchain instead of 2 or 3). But the above code shows we do specific query due to enabled vk_ext_surface_maintenance1. It has to be it's just a "lucky" present mode which requires a lot of images and the exact query probably does not improve much on the conservative generic case (conservative case is a maximum of specific cases, so there should be a specific case that matches a conservative one).