KhronosGroup / Vulkan-Docs

The Vulkan API Specification and related tools

Guarantees for `VkPhysicalDevice` validity? #2141

Open Rua opened 1 year ago

Rua commented 1 year ago

The spec currently says this:

> VkPhysicalDevice objects cannot be explicitly destroyed. Instead, they are implicitly destroyed when the VkInstance object they are retrieved from is destroyed.

This could be interpreted as providing certain guarantees for the validity of VkPhysicalDevice handles, but I'm not sure. Could the truth of these statements be clarified?

HansKristian-Work commented 1 year ago

> Once a handle has been returned by an enumeration, it's always valid as long as the instance exists.

Yes. It is destroyed only on VkInstance destruction, after all. (Theoretically, GPU hotplug could make this weird, but there is no mechanism to account for that.) The only way the current API can work is if VkPhysicalDevice handles remain valid until the instance is destroyed.

> A lost device that is re-established is given back its old handle the next time physical devices are enumerated.

Not really. A VkDevice logical device is just a pointer. Freeing and allocating memory may end up with the same pointer.

> A new device never re-uses the handle of a previously lost device, but is always given a fresh handle.

Same as above. VkDevice pointers may be recycled; a VkDevice is just a pointer.

Rua commented 1 year ago

Sorry, I was referring to the physical device that is behind the logical device. If the device is lost, I presume it's no longer returned in the next physical device enumeration. Is its VkPhysicalDevice handle recycled for other physical devices, or does it always refer to that physical device as long as the instance exists, even if the underlying physical device is gone?

HansKristian-Work commented 1 year ago

> If the device is lost, I presume it's no longer returned in the next physical device enumeration.

Device lost can mean many things. A page fault or GPU hang means device lost, and you can certainly just create a new VkDevice unless the kernel driver fails to recover for whatever reason.

If it's literally "GPU-was-yoinked-out-of-PCIe-slot" lost, there is no mechanism to detect that, and any further VkDevice creation will likely just fail.

Rua commented 1 year ago

Well, in that yoinked-out case, let's say I do these steps:

  1. Enumerate physical devices, which returns a handle for PCIe graphics card X.
  2. Pull X out of the slot.
  3. Enumerate physical devices again. Now no handle for X is returned, as one would expect?
  4. Put X back into the slot.
  5. Enumerate physical devices yet again. Now, a handle for X is returned again. Will that handle always be the same one that was returned in step 1?

HansKristian-Work commented 1 year ago

> Enumerate physical devices again. Now no handle for X is returned, as one would expect?

That would break if you enumerate, the GPU is yoinked, and you then create a VkDevice.

That is more of a loader question, but I think the VkPhysicalDevice is created at VkInstance creation time and remains invariant. I doubt there is any consideration for hotplug here.
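
If the set of handles really is fixed at instance creation, then every handle from one enumeration should show up again in any later enumeration on the same instance. A minimal sketch of that check follows. It assumes a stand-in type, `MockPhysicalDevice`, shaped like a Vulkan dispatchable handle (an opaque pointer) so that it compiles without the Vulkan headers; the type and the function name `handles_persist` are hypothetical and not part of the API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in with the same shape as a Vulkan dispatchable handle
 * (an opaque pointer). The name is hypothetical. */
typedef struct MockPhysicalDevice_T *MockPhysicalDevice;

/* Returns true if every handle in `first` also appears in `second`.
 * Order is ignored, and `second` may contain extra handles. */
static bool handles_persist(const MockPhysicalDevice *first, size_t first_count,
                            const MockPhysicalDevice *second, size_t second_count) {
    assert(first != NULL || first_count == 0);
    assert(second != NULL || second_count == 0);
    for (size_t i = 0; i < first_count; i++) {
        bool found = false;
        for (size_t j = 0; j < second_count; j++) {
            if (first[i] == second[j]) {
                found = true;
                break;
            }
        }
        if (!found) {
            return false;
        }
    }
    return true;
}
```

In a real application the two arrays would come from two calls to vkEnumeratePhysicalDevices() on the same VkInstance. A `false` result would suggest the implementation does not behave the way this comment expects.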

krOoze commented 1 year ago

There is a clause that says all outputs from Vulkan are constant unless stated otherwise. That would seem to apply here as well.

gfxstrand commented 1 year ago

We really haven't even tried to solve hot-swapping in Vulkan, so ruminating on "what happens if I pull out a GPU and plug it back in" isn't really useful. Something happens. :woman_shrugging: What's important is that, if you call vkEnumeratePhysicalDevices(), the VkPhysicalDevice handle is valid until the VkInstance is destroyed. If the device has gone missing, vkCreateDevice() may fail but you're welcome to try.

It's also not something that's currently worth an app developer's time to worry about. If the user physically unplugs their GPU while your app is running on it and your app crashes, that's kinda their fault. Unless you're the OS compositor responsible for running their entire desktop, you don't really need to worry about these things. If you are the OS compositor, then you're on Linux, which doesn't support GPU hot-swap in the kernel anyway, so there are bigger problems.
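
The "handle stays valid, creation may fail" behaviour described above can be modelled without any Vulkan calls. The sketch below is only a mock: `MockGpu`, `MockPhysicalDevice`, `mock_create_device`, and `first_usable_device` are hypothetical names, and the `present` flag stands in for whatever the driver knows about the hardware. It shows an application walking the handles it cached from one enumeration and using the first one on which device creation succeeds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for whatever a physical-device handle points at. */
typedef struct {
    int  id;       /* which GPU this handle was enumerated for */
    bool present;  /* false once the hardware has gone missing */
} MockGpu;
typedef MockGpu *MockPhysicalDevice;

/* Mock of device creation: returns 0 on success, -1 if the GPU is gone.
 * The handle itself is still safe to pass in either case. */
static int mock_create_device(MockPhysicalDevice phys) {
    assert(phys != NULL);
    return phys->present ? 0 : -1;
}

/* Walks the handles cached from a single enumeration and returns the index
 * of the first one on which device creation succeeds, or -1 if none does. */
static int first_usable_device(const MockPhysicalDevice *cached, size_t count) {
    for (size_t i = 0; i < count; i++) {
        if (mock_create_device(cached[i]) == 0) {
            return (int)i;
        }
    }
    return -1;
}
```

With real Vulkan the equivalent step would be checking the VkResult of vkCreateDevice() for each cached VkPhysicalDevice and moving on when it is not VK_SUCCESS.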

krOoze commented 1 year ago

@gfxstrand Arguably, no yank is needed. The same would apply to a driver update/reinstall. #1319

cubanismo commented 1 year ago

Agreed, and driver reinstall (Windows style) is the best case I've seen for supporting hotplug in the Vulkan API.

Nothing much else to add on this exact issue, but some prior art discussion: There's some precedent, for better or for worse, of how to handle hotplug of objects using "static" handles like this in VK_KHR_display with its extensions. There's a special fence (I regret using fences for this, but it was early days) that signals when the list of displays changes. At that point, you re-query them. This is racy, but the idea is it settles in a steady state if you always re-arm the fence before querying:

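  // device, physicalDevice, propertyCount and propertyArray are assumed to exist in the caller
  VkFence fence = SetupDisplayHotplugFence();
  vkWaitForFences(device, 1, &fence, VK_TRUE, BIG_TIMEOUT);
  do {
     vkResetFences(device, 1, &fence); // Might be an explicit re-arm needed here, I forget
     vkGetPhysicalDeviceDisplayPropertiesKHR(physicalDevice, &propertyCount, propertyArray);
  } while (vkWaitForFences(device, 1, &fence, VK_TRUE, 0 /* Immediate timeout */) != VK_TIMEOUT);
  // You've reached a temporary steady state.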

I think the handles remain valid indefinitely, but the object backing them at the OS or HW level can become invalid, and in that case they won't be returned again when you re-enumerate displays. I forget whether it's well-defined what happens when you try to use them again at that point. VkDisplayModeKHR objects have similar lifetime issues, and it's been proposed that they should be destroyable, since they are (optionally) creatable.
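
The re-arm-then-query loop above can be exercised as a small simulation, which makes it easier to see why it settles. This is a sketch under stated assumptions, not Vulkan code: `MockDisplaySource`, `mock_hotplug`, `mock_query_displays`, and `settle_display_list` are hypothetical names, the "fence" is a plain flag, and hotplug events are scripted to land during queries so that the race actually occurs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a driver-side display list plus its hotplug fence. */
typedef struct {
    int  display_count;   /* current number of connected displays */
    bool fence_signaled;  /* set whenever display_count changes */
    int  pending_events;  /* scripted changes that land during upcoming queries */
} MockDisplaySource;

/* Simulates a hotplug event: the list changes and the fence signals. */
static void mock_hotplug(MockDisplaySource *src, int delta) {
    src->display_count += delta;
    src->fence_signaled = true;
}

/* Simulates a query that races with hotplug: the snapshot is taken first,
 * then one pending event (if any) fires, which makes the snapshot stale. */
static int mock_query_displays(MockDisplaySource *src) {
    int snapshot = src->display_count;
    if (src->pending_events > 0) {
        src->pending_events--;
        mock_hotplug(src, +1);
    }
    return snapshot;
}

/* Re-arm, query, then poll: repeat until no event arrived during the query.
 * Returns the settled display count; *queries reports how many passes ran. */
static int settle_display_list(MockDisplaySource *src, int *queries) {
    assert(src != NULL && queries != NULL);
    int snapshot;
    *queries = 0;
    do {
        src->fence_signaled = false;          /* re-arm before querying */
        snapshot = mock_query_displays(src);  /* may be stale on return */
        (*queries)++;
    } while (src->fence_signaled);            /* immediate poll, no blocking */
    return snapshot;
}
```

Because the flag is cleared before each query, any event that lands during or after the query leaves it set and forces another pass. The loop therefore exits only with a snapshot that no later event has invalidated, which is the "temporary steady state" described above.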

oddhack commented 8 months ago

(edit: @versalinyaa) according to the meeting notes you had signed up to propose a PR, so assigning you (unless I misunderstood something at the time).

@gfxstrand please ignore, made a typo in initial assignment.