KhronosGroup / Vulkan-Loader

Vulkan Loader
https://vulkan.lunarg.com/doc/sdk/latest/linux/LoaderInterfaceArchitecture.html

`vkEnumeratePhysicalDevices` returns `VK_INCOMPLETE` with a too-small `pPhysicalDeviceCount` #1165

Closed: 06393993 closed this issue 1 year ago

06393993 commented 1 year ago

Description

On a Windows machine with dual GPUs, vulkaninfo.exe can report vkEnumeratePhysicalDevices failed with INCOMPLETE in the console. The same issue happens in our own Vulkan application when it calls vkEnumeratePhysicalDevices: VK_INCOMPLETE is returned repeatedly even when we set pPhysicalDeviceCount to the value the API returned. So this is not a bug in vulkaninfo.exe, but vulkaninfo.exe is an easy way to reproduce it.
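For context, vkEnumeratePhysicalDevices follows Vulkan's two-call idiom: call once with a NULL array to get the count, allocate, then call again; VK_INCOMPLETE means the array was too small and the caller should re-query. The sketch below models that loop in plain C so it builds without the Vulkan SDK. `VkResult` is a local stand-in whose values mirror vulkan_core.h, and `Device`, `mock_enumerate`, and `enumerate_all` are hypothetical names; the retry bound of 4 is an arbitrary assumption. Against a well-behaved implementation the loop finishes in one round, which is why repeated VK_INCOMPLETE with the returned count points at something between the application and the driver.

```c
#include <stdint.h>
#include <stdlib.h>

/* Local stand-ins (a subset of VkResult) so the sketch builds without the
   Vulkan SDK; the numeric values match vulkan_core.h. */
typedef enum {
    VK_ERROR_OUT_OF_HOST_MEMORY = -1,
    VK_SUCCESS = 0,
    VK_INCOMPLETE = 5
} VkResult;
typedef int Device; /* hypothetical placeholder for VkPhysicalDevice */

/* Hypothetical well-behaved implementation with two GPUs that follows the
   count/array contract of vkEnumeratePhysicalDevices. */
static VkResult mock_enumerate(uint32_t *count, Device *devices) {
    const uint32_t available = 2;
    if (devices == NULL) {            /* query call: report the total */
        *count = available;
        return VK_SUCCESS;
    }
    uint32_t n = *count < available ? *count : available;
    for (uint32_t i = 0; i < n; ++i)
        devices[i] = (Device)i;
    *count = n;                       /* how many entries were written */
    return n < available ? VK_INCOMPLETE : VK_SUCCESS;
}

/* Two-call idiom with a bounded retry: query, allocate, fetch, and start
   over if VK_INCOMPLETE says the list grew in between. */
static VkResult enumerate_all(Device **out, uint32_t *out_count) {
    Device *devices = NULL;
    uint32_t count = 0;
    VkResult res = VK_INCOMPLETE;
    *out = NULL;
    *out_count = 0;
    for (int attempt = 0; attempt < 4 && res == VK_INCOMPLETE; ++attempt) {
        res = mock_enumerate(&count, NULL);
        if (res != VK_SUCCESS)
            break;
        Device *grown = realloc(devices, (count ? count : 1) * sizeof *devices);
        if (grown == NULL) {
            free(devices);
            return VK_ERROR_OUT_OF_HOST_MEMORY;
        }
        devices = grown;
        res = mock_enumerate(&count, devices);
    }
    if (res != VK_SUCCESS) {
        free(devices);
        return res;
    }
    *out = devices;
    *out_count = count;
    return VK_SUCCESS;
}
```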

This issue happens with the 231.1 loader but not with the 224.1 loader. With the problematic loader, the Vulkan Configurator reports an error on startup:

Cannot find any Vulkan Physical Devices

It then lists VK_LAYER_AMD_switchable_graphics - 1.1.106 - Implicit layer and VK_LAYER_NV_optimus - 1.3.224 - Implicit layer. If we disable the old VK_LAYER_AMD_switchable_graphics layer through the Vulkan Configurator, vulkaninfo.exe works normally again. We also built a loader with https://github.com/KhronosGroup/Vulkan-Loader/commit/2f87e2b3a5578fba1563d8ac48df06dec3d9e183 reverted, and vulkaninfo.exe works with that build as well. Therefore, we conclude that https://github.com/KhronosGroup/Vulkan-Loader/commit/2f87e2b3a5578fba1563d8ac48df06dec3d9e18 is the culprit on this machine. This problem may be related to #552.

Environment (please complete the following information):

To Reproduce

Steps to reproduce the behavior:

  1. Download the version of the Vulkan loader to test, and put it in the same directory as vulkaninfo.exe.
  2. Run vulkaninfo.exe in a terminal.
  3. If the loader is affected, the output shows ERROR at c:\j\msdk\build\khronos-tools\repo\vulkaninfo\vulkaninfo.h:237:vkEnumeratePhysicalDevices failed with INCOMPLETE. For example, the problem reproduces with the latest release, 1.3.243.0.

VK_LOADER_DEBUG output

Attach output when running with the environment variable VK_LOADER_DEBUG=all

The full stdout log can be found here. The lines I think are important are:

WARNING: [Loader Message] Code 0 : Layer VK_LAYER_AMD_switchable_graphics uses API version 1.1 which is older than the application specified API version of 1.3. May cause issues.
WARNING | LAYER:  Layer VK_LAYER_AMD_switchable_graphics uses API version 1.1 which is older than the application specified API version of 1.3. May cause issues.
...
INFO:             terminator_EnumeratePhysicalDevices : Trimming device count from 2 to 1.
...
ERROR at c:\j\msdk\build\khronos-tools\repo\vulkaninfo\vulkaninfo.h:237:vkEnumeratePhysicalDevices failed with INCOMPLETE
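The first warning compares packed API versions. As a rough sketch, assuming only major.minor matters for this kind of check (the loader's exact comparison is not reproduced here), the snippet below packs versions the way vulkan_core.h's VK_MAKE_API_VERSION does, using local copies of the macros so it builds without the SDK; `layer_older_than_app` is a hypothetical helper.

```c
#include <stdint.h>

/* Local copies of the packing macros from vulkan_core.h
   (VK_MAKE_API_VERSION, VK_API_VERSION_MAJOR/MINOR/PATCH). */
#define MAKE_API_VERSION(variant, major, minor, patch)          \
    ((((uint32_t)(variant)) << 29U) | (((uint32_t)(major)) << 22U) | \
     (((uint32_t)(minor)) << 12U) | ((uint32_t)(patch)))
#define API_VERSION_MAJOR(v) (((uint32_t)(v) >> 22U) & 0x7FU)
#define API_VERSION_MINOR(v) (((uint32_t)(v) >> 12U) & 0x3FFU)
#define API_VERSION_PATCH(v) ((uint32_t)(v) & 0xFFFU)

/* Hypothetical check modeled on the warning text: a layer counts as
   "older" if its major.minor is below the application's; the patch
   number (e.g. the 106 in 1.1.106) is ignored. */
static int layer_older_than_app(uint32_t layer_api, uint32_t app_api) {
    uint32_t lmaj = API_VERSION_MAJOR(layer_api);
    uint32_t amaj = API_VERSION_MAJOR(app_api);
    if (lmaj != amaj)
        return lmaj < amaj;
    return API_VERSION_MINOR(layer_api) < API_VERSION_MINOR(app_api);
}
```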

stderr:

UNASSIGNED-khronos-validation-createinstance-status-message(INFO / SPEC): msgNum: -671457468 - Validation Information: [ UNASSIGNED-khronos-validation-createinstance-status-message ] Object 0: handle = 0x21ca7a112b0, type = VK_OBJECT_TYPE_INSTANCE; | MessageID = 0xd7fa5f44 | Khronos Validation Layer Active:
    Settings File: Found at C:\Users\idanr\AppData\Local\LunarG\vkconfig\override\vk_layer_settings.txt specified by VkConfig application override.
    Current Enables: None.
    Current Disables: VK_VALIDATION_FEATURE_DISABLE_THREAD_SAFETY_EXT.

    Objects: 1
        [0] 0x21ca7a112b0, type: 1, name: NULL
VUID_Undefined(WARN / SPEC): msgNum: 2044605652 - Validation Warning: [ VUID_Undefined ] Object 0: handle = 0x21ca7a112b0, type = VK_OBJECT_TYPE_INSTANCE; | MessageID = 0x79de34d4 | Instance Extension VK_LUNARG_direct_driver_loading is not supported by this layer.  Using this extension may adversely affect validation results and/or produce undefined behavior.
    Objects: 1
        [0] 0x21ca7a112b0, type: 1, name: NULL
charles-lunarg commented 1 year ago

Everything you've mentioned here indicates that the problem is not with the loader but with the VK_LAYER_AMD_switchable_graphics layer being horribly out of date. Version 1.1.106 is from 2019, after all. The listed AMD graphics drivers are from 2019 or 2020 (hard to pin down, since 26.20 appears here but the trailing numbers do not).

The log line you provided, "terminator_EnumeratePhysicalDevices : Trimming device count from 2 to 1.", shows that the layer is calling down into the loader with an array size that is too small.
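One way to picture how a layer can produce this trimming message, offered as a hypothetical model consistent with the log rather than the layer's actual code: the layer advertises one GPU to the application but forwards the application's one-element array to the loader's terminator, which knows about two GPUs, trims to 1, and returns VK_INCOMPLETE. Re-querying the count returns 1 again, so the application never gets out of the loop. `terminator_enumerate`, `faulty_layer_enumerate`, and the `Device` placeholder are illustrative names; `VkResult` is a local stand-in with vulkan_core.h values.

```c
#include <stdint.h>
#include <stdio.h>

/* Local stand-ins (values match vulkan_core.h) so this builds without the SDK. */
typedef enum { VK_SUCCESS = 0, VK_INCOMPLETE = 5 } VkResult;
typedef int Device; /* hypothetical placeholder for VkPhysicalDevice */

/* Mock of the loader's terminator on a two-GPU machine: it fills at most
   *count entries and reports VK_INCOMPLETE when it had to trim. */
static VkResult terminator_enumerate(uint32_t *count, Device *devices) {
    const uint32_t available = 2;
    if (devices == NULL) {
        *count = available;
        return VK_SUCCESS;
    }
    uint32_t n = *count < available ? *count : available;
    if (n < available) /* mirrors the INFO line quoted in the report */
        printf("Trimming device count from %u to %u.\n",
               (unsigned)available, (unsigned)n);
    for (uint32_t i = 0; i < n; ++i)
        devices[i] = (Device)i;
    *count = n;
    return n < available ? VK_INCOMPLETE : VK_SUCCESS;
}

/* Hypothetical faulty layer: advertises only the one GPU it would pick,
   but passes the application's array and count straight down, so the
   terminator sees a buffer that is one entry too small. */
static VkResult faulty_layer_enumerate(uint32_t *count, Device *devices) {
    if (devices == NULL) {
        *count = 1;
        return VK_SUCCESS;
    }
    return terminator_enumerate(count, devices);
}
```

In this model, an application that queries the count (1) and calls again with a one-element array gets VK_INCOMPLETE on every retry, which matches the behavior described in the report.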

Commit https://github.com/KhronosGroup/Vulkan-Loader/commit/2f87e2b3a5578fba1563d8ac48df06dec3d9e18 changes the problematic layer from disabled by default to enabled by default. That commit changed behavior that was undesirable for many reasons I won't get into here, and it will not be reverted.

Vulkaninfo could certainly try to 'work around' the issue by allocating more space than necessary, but that would be a hack to paper over a faulty layer.
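The workaround described here amounts to treating the reported count as a first guess rather than an exact size: on VK_INCOMPLETE, grow the buffer and retry. Below is a minimal C sketch, assuming a hypothetical enumerator that under-reports its count by one (`underreporting_enumerate`, standing in for the faulty layer); `enumerate_with_growth`, `EnumerateFn`, and the limit of 8 attempts are illustrative choices, and `VkResult` is a local stand-in with vulkan_core.h values.

```c
#include <stdint.h>
#include <stdlib.h>

/* Local stand-ins (values match vulkan_core.h) so this builds without the SDK. */
typedef enum {
    VK_ERROR_OUT_OF_HOST_MEMORY = -1,
    VK_SUCCESS = 0,
    VK_INCOMPLETE = 5
} VkResult;
typedef int Device; /* hypothetical placeholder for VkPhysicalDevice */
typedef VkResult (*EnumerateFn)(uint32_t *count, Device *devices);

/* Hypothetical faulty enumerator: the query reports one device fewer than
   the fill call needs room for. */
static VkResult underreporting_enumerate(uint32_t *count, Device *devices) {
    const uint32_t available = 2;
    if (devices == NULL) {
        *count = available - 1;
        return VK_SUCCESS;
    }
    uint32_t n = *count < available ? *count : available;
    for (uint32_t i = 0; i < n; ++i)
        devices[i] = (Device)i;
    *count = n;
    return n < available ? VK_INCOMPLETE : VK_SUCCESS;
}

/* Workaround: start from the reported count and double the buffer on
   VK_INCOMPLETE, giving up after a fixed number of attempts. */
static VkResult enumerate_with_growth(EnumerateFn fn, Device **out,
                                      uint32_t *out_count) {
    *out = NULL;
    *out_count = 0;
    uint32_t capacity = 0;
    VkResult res = fn(&capacity, NULL);
    if (res != VK_SUCCESS)
        return res;
    if (capacity == 0)
        capacity = 1;
    Device *devices = NULL;
    for (int attempt = 0; attempt < 8; ++attempt) {
        Device *grown = realloc(devices, capacity * sizeof *devices);
        if (grown == NULL) {
            free(devices);
            return VK_ERROR_OUT_OF_HOST_MEMORY;
        }
        devices = grown;
        uint32_t count = capacity;
        res = fn(&count, devices);
        if (res == VK_SUCCESS) {
            *out = devices;
            *out_count = count;
            return VK_SUCCESS;
        }
        if (res != VK_INCOMPLETE)
            break; /* a real error: stop retrying */
        capacity *= 2;
    }
    free(devices);
    return res;
}
```

This hides the symptom without fixing the layer, which is why it is described as a hack.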

Worse, any fix I could make in the loader would not help anyone on a system with that faulty layer, because they would have to update the loader on their system. On Windows, the loader is updated by updating the graphics drivers (which install the loader), and that update should bring a newer AMD switchable graphics layer as well, making the problem go away, assuming the newer version of the layer fixes this.

Digging into the layer's source code, I can see that there have definitely been changes to the EnumeratePhysicalDevices logic, which I assume means the issue has been fixed in the four years since 1.1.106 was released.

06393993 commented 1 year ago

Thanks for the detailed reply. So the suggestion for the application is to:

Am I right?

charles-lunarg commented 1 year ago

Yes.

The crux of the problem is that this is an old version of the layer. Any fix I make in the loader here and now will not automatically get installed on systems with out-of-date components.