Closed wolfpld closed 2 months ago
Obligatory question... do you have the nvidia_drm module loaded with parameter "modeset=1"? You can check by reading /sys/module/nvidia_drm/parameters/modeset.
No, the value I get is N. When I enable it (the value becomes Y), the vkCreateSwapchainKHR call succeeds.
I'd say this is highly unintuitive, as I would assume that in my case the mode setting is done by the AMD driver, not Nvidia's.
> I'd say this is highly unintuitive
Yes, it is. Originally that parameter was just for modesetting functionality, but over the years we've added other things that require it. Eventually it will become the default, but currently it can cause problems for some workstation SLI configurations.
Ok, so the driver uses DRM both for mode setting and for transferring images with PRIME, but naming things is hard and changing already established conventions will break things. That's understandable.
The problem is that the vkGetPhysicalDeviceSurfaceSupportKHR call reports that the driver is able to present on the surface even when it isn't. Applications will typically implement some kind of GPU ranking system to select the best available GPU, and Nvidia will often win in this ranking.
The end result is that applications fail with a cryptic error that the documentation says shouldn't happen. The application could have used the other GPU instead if the Nvidia driver had told it correctly that it could not render on the surface provided.
That's a fair point. It should be possible for us to detect whether modeset is enabled during device initialization and only advertise support for Wayland surfaces if so. I've filed an internal bug to implement that.
Any update here?
Additionally, is there any way for an application developer to detect that this will occur, so we can skip over the nVidia Vulkan device and pick a different one? We can't simply read /sys/module/nvidia_drm/parameters/modeset, as that requires root privileges. I could check lsmod output to see if nvidia_drm is in the list, but that doesn't tell us anything about whether modesetting is enabled.
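A minimal sketch of what a best-effort check could look like, assuming the parameter file holds "Y" or "N" as reported earlier in this thread. The names read_modeset and modeset_state are hypothetical. Because the file may be unreadable without root, the function returns a third "unknown" state instead of guessing, so it cannot be the only safeguard.

```c
#include <stdio.h>

/* Hypothetical tri-state result: the file may be missing (module not
 * loaded) or unreadable (no root), so "unknown" is a valid answer. */
enum modeset_state { MODESET_UNKNOWN = -1, MODESET_OFF = 0, MODESET_ON = 1 };

/* Reads a boolean module parameter file whose first character is 'Y' or 'N'.
 * For the real check, pass "/sys/module/nvidia_drm/parameters/modeset". */
static enum modeset_state read_modeset(const char *path)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return MODESET_UNKNOWN;   /* not loaded, or permission denied */

    int c = fgetc(f);
    fclose(f);

    if (c == 'Y')
        return MODESET_ON;
    if (c == 'N')
        return MODESET_OFF;
    return MODESET_UNKNOWN;       /* unexpected content */
}
```

An application would only skip the Nvidia device when the result is MODESET_OFF; MODESET_UNKNOWN says nothing either way.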
In the next major release, 555, we will not advertise support for Wayland surfaces when nvidia-drm is not loaded with modeset=1.
Fantastic; glad to hear it!
Until that goes live and is adopted by distributions (I'm expecting it could take quite a while to come to Ubuntu 20.04, for example), how can we detect and/or work around this? I'd rather not blanket skip over any nVidia device when using Wayland.
Could you just have your application try a different device if swapchain creation fails?
Yeah, fair - the library we're using on top of Vulkan doesn't make this possible at the moment, but I can address the issue at that level. Thanks!
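For anyone else landing here, a rough sketch of that fallback, combined with the usual ranking step. Everything named here (gpu_candidate, try_swapchain_fn, pick_working_gpu, stub_try_create) is hypothetical and not part of Vulkan. In a real program the callback would create the VkDevice and call vkCreateSwapchainKHR, returning 0 only on VK_SUCCESS; the stub below just simulates one device failing.

```c
#include <stddef.h>

#define MAX_GPUS 16   /* arbitrary cap for this sketch */

/* Hypothetical candidate record; score comes from the app's own ranking. */
struct gpu_candidate {
    int   score;
    void *handle;     /* would hold a VkPhysicalDevice in a real program */
};

/* Hypothetical callback: returns 0 if device and swapchain creation succeed. */
typedef int (*try_swapchain_fn)(const struct gpu_candidate *gpu, void *user);

/* Tries candidates from highest score to lowest.
 * Returns the index of the first one that works, or -1 if none does. */
static int pick_working_gpu(const struct gpu_candidate *gpus, size_t count,
                            try_swapchain_fn try_create, void *user)
{
    int tried[MAX_GPUS] = {0};
    if (count > MAX_GPUS)
        count = MAX_GPUS;         /* sketch: ignore anything beyond the cap */

    for (size_t attempt = 0; attempt < count; attempt++) {
        /* Find the best-scoring candidate that has not been tried yet. */
        size_t best = count;
        for (size_t i = 0; i < count; i++) {
            if (!tried[i] && (best == count || gpus[i].score > gpus[best].score))
                best = i;
        }
        tried[best] = 1;
        if (try_create(&gpus[best], user) == 0)
            return (int)best;
    }
    return -1;
}

/* Stand-in for the real Vulkan calls: fails for the handle passed as user. */
static int stub_try_create(const struct gpu_candidate *gpu, void *user)
{
    return gpu->handle == user ? -1 : 0;
}
```

The point of the design is that a failed swapchain creation demotes the device instead of aborting, so the integrated GPU still gets a chance even when the dedicated one ranks higher.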
For interest's sake, internally we use a vendor-specific DRM ioctl to determine whether modeset=1 is set. Specifically, the supports_alloc field of DRM_IOCTL_NVIDIA_GET_DEV_INFO, whose implementation you can find here: https://github.com/NVIDIA/open-gpu-kernel-modules/blob/476bd34534a9389eedff73464d3f2fa5912f09ae/kernel-open/nvidia-drm/nvidia-drm-drv.c#L744
I would strongly discourage external applications or libraries from using that, though, since the interface is not guaranteed to be stable between driver versions. That's not an issue for us only because our user-space components and our kernel modules are version-locked.
Nvidia driver 545.29.02 fails with VK_ERROR_INITIALIZATION_FAILED when trying to create a swapchain. I do not believe there are any conditions listed in the documentation that would allow such a return value.
Please see the minimal (sigh) example below to reproduce the issue. The example follows the minimal path required to print the vkCreateSwapchainKHR return value and then exits. Physical devices are listed when the example is run, and you have to select one of them as the first parameter of the executable.
The example does the following:
1. Obtain the wl_compositor and use it to create a wl_surface.
2. Create a VkInstance with the VK_KHR_surface and VK_KHR_wayland_surface instance extensions enabled.
3. Create a VkSurfaceKHR with vkCreateWaylandSurfaceKHR, using the wl_surface obtained earlier.
4. A VkPhysicalDevice selection is made.
5. The device must support the VK_KHR_swapchain device extension, which is checked.
6. The device must be able to present to the VkSurfaceKHR, which is checked with vkGetPhysicalDeviceSurfaceSupportKHR.
7. A VkDevice is created.
8. A VkSwapchainKHR is created.
My machine has two GPUs: an integrated AMD GPU that drives the display and a dedicated Nvidia GPU.
This is the result of running with the Nvidia GPU:
And this is with the two remaining devices:
The example program follows: