KhronosGroup / Vulkan-Loader

Vulkan Loader
https://vulkan.lunarg.com/doc/sdk/latest/linux/LoaderInterfaceArchitecture.html
Other
505 stars 273 forks source link

terminator_CreateDisplayPlaneSurfaceKHR() with multiple ICDs #1541

Open jjulianoatnv opened 3 weeks ago

jjulianoatnv commented 3 weeks ago

By code inspection of terminator_CreateDisplayPlaneSurfaceKHR(), it appears that this function may not function as intended when multiple ICDs that implement icd_term->dispatch.CreateDisplayPlaneSurfaceKHR.

Around here: https://github.com/KhronosGroup/Vulkan-Loader/blob/438875314dc838d130098127c6e908798500fecc/loader/wsi.c#L2031

is code that loops over all ICDs. It dispatches vkCreateDisplayPlaneSurfaceKHR to every ICD that implements icd_term->dispatch.CreateDisplayPlaneSurfaceKHR. If any of the ICDs returns a VkResult other than VK_SUCCESS, then the function early-exits with that VkResult (with that error).

The displayMode member of the VkDisplaySurfaceCreateInfoKHR structure is a VkDisplayModeKHR. A VkDisplayModeKHR is a non-dispatchable handle with parent type VkDisplayKHR. A VkDisplayKHR is a non-dispatchable handle with parent type VkPhysicalDevice. VkPhysicalDevice identifies one ICD.

I had expected the loader to dispatch vkCreateDisplayPlaneSurfaceKHR to the one ICD that exposed the indirectly-referenced VkPhysicalDevice , and not to dispatch it to any other ICDs. That is not how the loader behaves. As written, when the loader vkCreateDisplayPlaneSurfaceKHR to an ICD that did not expose the indirectly-referenced VkPhysicalDevice, then I expect that other ICD to return a VkResult that is not VK_SUCCESS. This will cause terminator_CreateDisplayPlaneSurfaceKHR() to early exit and to return that same error VkResult.

If the loader does not know the relationship VkDisplayModeKHR -> VkDisplayKHR -> VkPhysicalDevice due to the loader not tracking the objects VkDisplayModeKHR and VkDisplayKHR, then perhaps a reasonable fix is the following:

jjulianoatnv commented 3 weeks ago

@cubanismo , I'd like your opinion on this.

jjulianoatnv commented 3 weeks ago

After thinking about this more, I've come to the conclusion that vkCreateDisplayPlaneSurfaceKHR is probably simply broken. All other command that operate on VkDisplayKHR and on VkDisplayModeKHR also have a vkPhysicalDeviceKHR parameter. This one does not. I think it's a bug that this one does not have a physical device parameter.

cubanismo commented 3 weeks ago

After thinking about this more, I've come to the conclusion that vkCreateDisplayPlaneSurfaceKHR is probably simply broken. All other command that operate on VkDisplayKHR and on VkDisplayModeKHR also have a vkPhysicalDeviceKHR parameter. This one does not. I think it's a bug that this one does not have a physical device parameter.

It's not the function that's broken, it's the general implementation of surfaces in the loader. Surface creation isn't supposed to dispatch to any drivers. It's just supposed to create a container struct for the parameters that is then passed to drivers in subsequent commands that do always dispatch properly based on other objects. Surfaces are driver-agnostic. My opinion is that any driver that's relying on this behavior is broken, and the loader dispatching of surface creation/destruction to drivers should be ripped out.

jjulianoatnv commented 3 weeks ago

OK, yes, that seems like another viable solution path.

It looks like every command that consumes a surface takes a physical device parameter. These commands dispatch to one ICD.

The various surface factories each have their own surface createInfo. I do not see any createInfo that contains a structure member which needs to be unpacked by the loader. If any of these createInfo do have a structure member that needs to be unpacked by the loader, then the system is broken and your suggested fix will not succeed for that surface type.

jjulianoatnv commented 3 weeks ago

Wait...on second thought, if the surface factories stop dispatching into ICDs, then the loader/ICD interface will have to be changed. Every command that does dispatch to an ICD and that takes a surface parameter will have to be modified to pass into the ICD the createInfo struct that was used to create that surface.

This does seem viable. But it's not an easy change as it breaks backwards compatibility of the current loader/ICD interface.

pdaniell-nv commented 3 weeks ago

A long time ago the loader handled the VkSurfaceKHR objects itself, but that proved to be weird for ICDs to deal with as it was unusual and it wasn't possible to include driver internals with those objects. ICD interface version 3 (https://github.com/KhronosGroup/Vulkan-Loader/blob/main/docs/LoaderDriverInterface.md#loader-and-driver-interface-version-3-requirements) added support to allow ICD to implement their own VkSurfaceKHR handling. I believe the loader still supports doing its own VkSurfaceKHR if the ICD doesn't export vkCreateWin32SurfaceKHR and the others.

cubanismo commented 3 weeks ago

and it wasn't possible to include driver internals with those objects.

Right, that's the point. They were never supposed to.

If any of these createInfo do have a structure member that needs to be unpacked by the loader, then the system is broken and your suggested fix will not succeed for that surface type.

I don't see why that follows. VkSurface is an instance-level object. The loader should parse the createInfo and its pNext chain and populate some internal representation of a VkSurface object using that data. Then when dispatching commands to ICDs that include a VkSurface, it's supposed to pass that internal structure as part of the ABI. That was the intended design for VkSurface when it was introduced as a "typesafe" version of the original APIs that just cast a surface-type-dependent non-opaque struct to void* or simklar as a parameter to vkCreateSwapchainKHR().

jjulianoatnv commented 3 weeks ago

The loader should parse the createInfo and its pNext chain and populate some internal representation of a VkSurface object using that data. Then when dispatching commands to ICDs that include a VkSurface, it's supposed to pass that internal structure as part of the ABI.

OK, I think I agree that this suggestion is viable. I also agree that what you describe was the original loader design, and that implementation and interface changed in a later revision. I was not involved in discussion that lead to these changes, and I do not know why they were made or if these changes were the "best" option at the time.

Scanning other terminators in the loader, I see that terminator_CreateDevice() behaves similar to what we are describing. Below this comment:

// Before we continue, If KHX_device_group is the list of enabled and viable extensions, then we then need to look for the
// corresponding VkDeviceGroupDeviceCreateInfo struct in the device list and replace all the physical device values (which

// are really loader physical device terminator values) with the ICD versions.

is a loop that scans the VkDeviceCreateInfo pNext chain and replaces loader physical device terminator values with ICD physical device handles. The code is here: https://github.com/KhronosGroup/Vulkan-Loader/blob/438875314dc838d130098127c6e908798500fecc/loader/loader.c#L5813

If surfaces are modified to behave as you suggest, then during any command that takes a surface as input parameter, the loader will have to do similar substitution to replace all loader-wrapped terminator values in the *SurfaceCreateInfo structs and their pNext chains with corresponding ICD handles.

What we have instead is that the loader dispatches surface factory commands to ICDs, and the ICDs create ICD-internal surface objects. Later, the loader skips doing handle replacement during execution of commands that take a surface as input parameter. But the loader's current implementation is broken because it fails when more than one ICD implements a function such as vkCreateDisplayPlaneSurfaceKHR.