Open brianpaul opened 3 years ago
/build/vulkan-tools-1.1.130.0~rc1/vulkaninfo/vulkaninfo.h:368: failed with ERROR_OUT_OF_HOST_MEMORY
That implies the vulkaninfo version used is 1.1.130, which is missing some of the changes I've made to vulkaninfo since then. Running a newer loader might also solve the problem.
Is the AMD switchable graphics layer present at all? Maybe that's what is originally returning the error.
Vulkaninfo generally throws its hands up in the air if any vulkan function fails, because if it did continue, it might hard crash later or report incorrect information.
Sorry, I don't know what the "AMD switchable graphics layer" is.
I'm using the latest vulkan loader and tools trees. The error message is an example. It's the same with the latest code.
I understand vulkaninfo throwing up its hands if some things fail, but I've hacked the code so that the VK_ERROR_OUT_OF_HOST_MEMORY I described above is not special-cased by the loader, and then it works as I'd expect.
I guess I should probably have filed this issue with the loader and not tools.
Ah now that I see you were referring to loader code, rather than vulkaninfo code, the changes you made make sense.
Looking at the loader logic there, I think the 'bail on OUT_OF_HOST_MEMORY' (OOHM) behavior is intended, as that error is used to signal that malloc has failed, and if the driver can't do what it needs to and returns OOHM, then neither can the loader. A driver returning INITIALIZATION_FAILED (INIT_FAILED) and then being skipped over is consistent, since it means that specific driver didn't succeed (and we should remove it from the list of enabled drivers) and the loader should then try to load the other drivers on the system.
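The policy described above can be sketched in a few lines of C. This is only a model of the behavior discussed in this thread, not the loader's actual code: `driver_t`, `result_t` and `create_instance_all` are hypothetical names, the enum values are local stand-ins that mirror the corresponding VkResult codes, and returning an incompatible-driver error when no driver is left is an assumption.

```c
#include <stddef.h>

/* Local stand-ins for the VkResult values discussed in this thread.
 * The numbers mirror the Vulkan headers, but nothing from Vulkan is included. */
typedef enum {
    RESULT_SUCCESS = 0,
    RESULT_ERROR_OUT_OF_HOST_MEMORY = -1,
    RESULT_ERROR_INITIALIZATION_FAILED = -3,
    RESULT_ERROR_INCOMPATIBLE_DRIVER = -9
} result_t;

/* Hypothetical driver record. create_result models what the driver's
 * vkCreateInstance would return; enabled is cleared when the driver is skipped. */
typedef struct {
    const char *name;
    result_t create_result;
    int enabled;
} driver_t;

/* Try every driver in order.
 * OUT_OF_HOST_MEMORY aborts the whole call (assumed to mean malloc failed).
 * Any other failure only disables that driver, and the next one is tried.
 * Succeeds if at least one driver remains enabled. */
result_t create_instance_all(driver_t *drivers, size_t count)
{
    size_t usable = 0;
    for (size_t i = 0; i < count; i++) {
        result_t r = drivers[i].create_result;
        if (r == RESULT_ERROR_OUT_OF_HOST_MEMORY) {
            return r; /* fatal: later drivers are never tried */
        }
        if (r != RESULT_SUCCESS) {
            drivers[i].enabled = 0; /* skip just this driver */
            continue;
        }
        drivers[i].enabled = 1;
        usable++;
    }
    /* Assumption: with no usable driver, report an incompatible-driver error. */
    return usable > 0 ? RESULT_SUCCESS : RESULT_ERROR_INCOMPATIBLE_DRIVER;
}
```

With this model, a first driver that reports OUT_OF_HOST_MEMORY hides a working second driver, which matches the symptom in this issue, while a first driver that reports INITIALIZATION_FAILED is simply skipped.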
If AMDVLK is indeed returning OOHM when it should be returning INIT_FAILED, then vulkaninfo shouldn't be affected. Though, if AMDVLK returns INIT_FAILED but the intel drivers aren't reported, then something else is amiss.
Yeah, I think the root bug may be in the AMDVLK driver and I've reported it to them. But I have a hunch they're going to say that it's a loader bug.
IMHO, it seems very unlikely that the driver would really run out of host memory during vkCreateInstance. "host memory" here means ordinary heap memory in the process, right?
Yes, host memory should refer to regular malloc'd memory. Also yes, the driver really shouldn't be returning OOHM, as it's pretty darn rare in practice, especially with virtual memory in the mix. BUT this wouldn't be the first time drivers or the loader returned the wrong error code, so it wouldn't surprise me if that did happen.
I have a similar issue, but with a different error.
$ lspci|grep VGA
00:02.0 VGA compatible controller: Intel Corporation UHD Graphics 630 (Desktop)
01:00.0 VGA compatible controller: NVIDIA Corporation GK208B [GeForce GT 730] (rev a1)
$ vulkaninfo
MESA-INTEL: warning: Performance support disabled, consider sysctl dev.i915.perf_stream_paranoid=0
ERROR at /build/vulkan-tools/src/Vulkan-Tools-1.2.172/vulkaninfo/vulkaninfo.h:248:vkGetPhysicalDeviceSurfaceFormats2KHR failed with ERROR_INITIALIZATION_FAILED
$ VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/nvidia_icd.json vulkaninfo
ERROR at /build/vulkan-tools/src/Vulkan-Tools-1.2.172/vulkaninfo/vulkaninfo.h:248:vkGetPhysicalDeviceSurfaceFormats2KHR failed with ERROR_INITIALIZATION_FAILED
Arch Linux vulkan-icd-loader 1.2.172-1 vulkan-intel 20.3.4-3 nvidia 460.67-2 nvidia-utils 460.67-1 vulkan-tools 1.2.172-1 linux 5.11.8.arch1-1
@H5117 Seems I was looking at the wrong GPU: the 208 indeed does support vulkan, I was looking at the 108 which doesn't. (My earlier claim was that the [GF 108] GeForce GT 730 does not support vulkan.)
vulkaninfo requires at least one valid GPU to run. Except, the vulkan-loader is responsible for finding 'valid vulkan drivers' on the system. It seems that it considers the nvidia driver to be valid, and this driver then returns a valid VkPhysicalDevice that vulkaninfo can use. vkGetPhysicalDeviceSurfaceFormats2KHR is crashing when using this physical device.
Can you set the env-var VK_LOADER_DEBUG=all, run vulkaninfo again, and return the output generated?
@charles-lunarg Here is the output: vulkaninfo.txt.
vkcube also works only with explicit selection of the Intel GPU, and segfaults by default:
$ vkcube
MESA-INTEL: warning: Performance support disabled, consider sysctl dev.i915.perf_stream_paranoid=0
Selected GPU 1: NVIDIA GeForce GT 730, type: 2
Can't find our preferred formats... Falling back to first exposed format. Rendering may be incorrect.
Segmentation fault (core dumped)
This looks more and more like an issue with the driver. The Nvidia driver should either not be found because it doesn't support vulkan, or not report support for any physical devices. However, I cannot rule out the possibility that a loader bug is causing this issue. Generally speaking, only SDK versions of the loader and tooling are validated. Using individual header updates means you are liable to pick up bugs that were introduced and then fixed before the next SDK release. Can you update to 1.2.176 and rerun the code?
The "NVIDIA Corporation GK208B [GeForce GT 730]" device should support Vulkan. I would be interested in seeing the callstack for the crash.
The same behavior with vulkan-icd-loader 1.2.176-1 and vulkan-tools 1.2.176-1.
Maybe it is worth noting that I don't have a monitor attached to the Nvidia card; it is used as an OpenCL device only. But IMHO vulkaninfo should work in this case, and vkcube should not crash.
In this case, vkcube is crashing because a call to vkGetPhysicalDeviceSurfaceFormatsKHR is returning a non-success value, which indicates that something related to the surface isn't working. So it's less crashing and more just failing an assert.
https://github.com/KhronosGroup/Vulkan-Tools/blob/6149e30699b36901715d46a5cef8959625ef399b/cube/cube.c#L3685
I do agree that the error reporting could be better, but I assert (heh) that vkcube did what it could to verify that the system can support surfaces (by verifying that VK_KHR_surface and the platform-specific surface extension are present and enabled), and then attempted to query the surface info (formats, support, capabilities, etc.), and that's when it failed.
I am not the vkcube maintainer, so my experience with that codebase is limited, as such it is very feasible that vkcube could be doing more to ensure that it works.
As for vulkaninfo, that definitely is an issue; vulkaninfo should be more resilient to faults. Though, if there is an issue where the vulkan-loader reports support for surface extensions but crashes in calls to them (i.e. what vkcube could be suffering from), then vulkaninfo has the same limitation of only being able to check for those extensions to determine support.
The spec declares "surface must be supported by physicalDevice, as reported by vkGetPhysicalDeviceSurfaceSupportKHR or an equivalent platform-specific mechanism", and at this point vulkaninfo has in fact not called vkGetPhysicalDeviceSurfaceSupportKHR.
If we do attempt to call vkGetPhysicalDeviceSurfaceSupportKHR for every queue when using nvidia's drivers, we will find that nvidia is happy to report that present is not supported on any queue for some surface types. These are the surfaces for which errors are reported where vulkaninfo doesn't expect them.
I would imagine that this makes vkcube successfully presenting frames to this surface on a queue out of spec, but I don't pretend to know the infinite wisdom of the spec authors and nvidia engineers. It seems vkcube uses a surface type chosen at compile time which nvidia does support present for, and it gives up, complaining it couldn't find appropriate queues, if you change to the troublesome surface type. If we instead give up on querying PhysicalDeviceSurface information in AppSurface when vkGetPhysicalDeviceSurfaceSupportKHR returns false for queue 0 (or maybe all of them), everything else completes successfully.
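A minimal sketch of that guard, written in plain C without the Vulkan headers so it stays self-contained. The names `present_support_fn`, `surface_is_presentable` and `mask_query` are hypothetical and not taken from vulkaninfo; in real code the callback would wrap vkGetPhysicalDeviceSurfaceSupportKHR for one physical device and surface. The sketch takes the "all of them" variant: the surface is skipped only if no queue family reports present support.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical callback standing in for vkGetPhysicalDeviceSurfaceSupportKHR.
 * It writes whether the given queue family can present to the surface and
 * returns 0 on success, non-zero on failure. */
typedef int (*present_support_fn)(uint32_t queue_family, void *user, bool *supported);

/* Returns true if at least one queue family reports present support.
 * A failed query counts as "not supported" for that family, so the caller
 * can skip the surface format and capability queries instead of erroring out. */
bool surface_is_presentable(present_support_fn query, void *user,
                            uint32_t queue_family_count)
{
    for (uint32_t i = 0; i < queue_family_count; i++) {
        bool supported = false;
        if (query(i, user, &supported) == 0 && supported) {
            return true;
        }
    }
    return false;
}

/* Demonstration callback: user points to a bitmask in which bit i is set
 * when queue family i supports present. Only meaningful for families 0..31. */
int mask_query(uint32_t queue_family, void *user, bool *supported)
{
    const uint32_t mask = *(const uint32_t *)user;
    *supported = ((mask >> queue_family) & 1u) != 0;
    return 0;
}
```

A caller would run this check once per surface and only query formats and capabilities when it returns true, which is the behavior the comment above reports as completing successfully.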
I have a multi-GPU setup (built-in Intel GPU, external GPU enclosure with AMD). If both the Intel and AMD GPUs are available (powered on, kernel modules loaded, etc), vulkaninfo works as expected, printing details of both GPUs.
However, if the external GPU is not available, vulkaninfo exits early with an error:
/build/vulkan-tools-1.1.130.0~rc1/vulkaninfo/vulkaninfo.h:368: failed with ERROR_OUT_OF_HOST_MEMORY
No info about the available Intel GPU is printed.
The problem is caused by two issues:
I looked at commit 7fc1edea087f77c165fdfad060bc07481526b39e but it's not clear to me why VK_ERROR_OUT_OF_HOST_MEMORY is handled specially. My issue is fixed if I simply don't check for VK_ERROR_OUT_OF_HOST_MEMORY.