KhronosGroup / Vulkan-Docs

The Vulkan API Specification and related tools

Separate queue creation from logical device creation #1797

Open haasn opened 2 years ago

haasn commented 2 years ago

In some cases (e.g. when sharing a single VkDevice between multiple libraries, some of which have internal threading), it can be needlessly difficult to negotiate the creation of a single shared set of VkQueues, especially because these queues are externally synchronized.

In this use case I think it would be very useful to allow creating VkQueues after device creation is finished, e.g. by initializing the device with 0 queues and then using a separate dedicated function like vkCreateQueueSet which essentially takes the same parameters as the queue selection options in vkCreateDevice. That way, each library can create its own VkQueueSet which would functionally act like queues belonging to separate logical devices (but still constitute a single logical device, with shared state including allocated memory and GPU resources).

Right now, the only way to accomplish something like this is to create two logical devices instead - one per library - but then there's no (cross-platform) way to share memory or pass images between these devices.

Alternatively, there could be a device feature to make VkQueue implicitly synchronized. That would also get around this mess. I really don't understand why they're marked as externally synchronized when you only get one set per device, and most other things at the per-device level are implicitly synchronized...

haasn commented 2 years ago

Noted similarities to #1320, which may end up solving this issue en passant.

cyanreg commented 2 years ago

We're running into this issue with FFmpeg when trying to use the video decode extension. In FFmpeg, and most libavcodec API users, decoding happens in separate thread(s), whilst downloading images (or presenting them) happens in a different thread. Queues are something we don't expose to the API, as each user can just call vkGetDeviceQueue on their own. But VkQueues are externally synchronized objects, so we need less than pretty ways of synchronizing access to them between different API users.

Would be a lot simpler if the implementation could be asked to take care of synchronizing them instead.

TomOlson commented 2 years ago

@haasn @cyanreg thanks for this input - your timing is good, as someone in the Vulkan WG has just been arguing for something like this. It's just an idea at this point and we haven't committed to doing it, but knowing that there are people outside Khronos who could use it affects the priority considerably. Implementing this in drivers, conformance, et cetera is likely to be a lot of work, so it won't happen very soon. We'll discuss further internally and may be back with questions about your use cases. Thanks!

krOoze commented 2 years ago

Isn't the right way™ to just create a private logical device and use external memory to transfer resources?

> Alternatively, there could be a device feature to make VkQueue implicitly synchronized.

If you want an implicitly synchronized queue, you can implement it yourself without creeping stuff inside the driver.

Even if you have no reasonable way to distribute data in the library, you could always make use of VK_EXT_private_data. Put a mutex in the private data, and voilà: we have a synchronized queue.
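A minimal sketch of the "put a mutex next to the queue" idea, for illustration only. It does not call Vulkan: `FakeQueue`, `LockedQueue`, `fake_queue_submit` and `run_demo` are hypothetical names, and `fake_queue_submit` stands in for an externally synchronized call such as vkQueueSubmit. It assumes POSIX threads are available. With VK_EXT_private_data, a pointer to something like `LockedQueue` (or just its mutex) could be stored on the VkQueue handle so that every library sharing the device finds the same lock; that part is not shown here.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for a VkQueue. A real build would hold the VkQueue
 * handle here and call vkQueueSubmit instead of fake_queue_submit. */
typedef struct FakeQueue {
    uint64_t submit_count; /* number of submissions seen so far */
} FakeQueue;

/* A queue paired with the mutex every user must take before touching it. */
typedef struct LockedQueue {
    FakeQueue *queue;
    pthread_mutex_t lock;
} LockedQueue;

static int locked_queue_init(LockedQueue *lq, FakeQueue *queue) {
    lq->queue = queue;
    return pthread_mutex_init(&lq->lock, NULL);
}

static void locked_queue_destroy(LockedQueue *lq) {
    pthread_mutex_destroy(&lq->lock);
}

/* Unsynchronized on purpose, like a real queue submission. */
static void fake_queue_submit(FakeQueue *queue) {
    queue->submit_count++;
}

/* The only entry point callers use: serializes access to the queue. */
static void locked_queue_submit(LockedQueue *lq) {
    pthread_mutex_lock(&lq->lock);
    fake_queue_submit(lq->queue);
    pthread_mutex_unlock(&lq->lock);
}

typedef struct WorkerArgs {
    LockedQueue *lq;
    int submissions;
} WorkerArgs;

static void *worker(void *arg) {
    WorkerArgs *args = arg;
    for (int i = 0; i < args->submissions; i++) {
        locked_queue_submit(args->lq);
    }
    return NULL;
}

#define MAX_THREADS 16

/* Runs thread_count workers against one shared queue and returns the number
 * of submissions recorded. Returns 0 if the arguments are out of range or
 * the mutex cannot be initialized. */
static uint64_t run_demo(int thread_count, int submissions_per_thread) {
    if (thread_count < 1 || thread_count > MAX_THREADS) {
        return 0;
    }
    FakeQueue queue = {0};
    LockedQueue lq;
    if (locked_queue_init(&lq, &queue) != 0) {
        return 0;
    }
    pthread_t threads[MAX_THREADS];
    WorkerArgs args = { &lq, submissions_per_thread };
    int started = 0;
    for (; started < thread_count; started++) {
        if (pthread_create(&threads[started], NULL, worker, &args) != 0) {
            break;
        }
    }
    for (int i = 0; i < started; i++) {
        pthread_join(threads[i], NULL);
    }
    locked_queue_destroy(&lq);
    return queue.submit_count;
}
```

Calling `run_demo(4, 1000)` starts four threads that each submit 1000 times through the lock. The catch raised elsewhere in this thread still applies: every user of the queue has to agree on where the lock lives and has to go through it.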

haasn commented 2 years ago

> Isn't the right way™ to just create a private logical device and use external memory to transfer resources?

This is not cross-platform, unfortunately, so it is a route we are not willing to go down except as an absolute last resort. (And for the time being, sharing mutexes is easier to do cross-platform than sharing Vulkan memory directly.)

> VK_EXT_private_data

This is an interesting possibility. I'll look into it.

haasn commented 2 years ago

> This is not cross-platform, unfortunately. So this is a route we are not willing to go except as an absolute last resort.

Incidentally, a very easy fix for this could be an extension introducing two new handle types, VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_POINTER_BIT and VK_EXTERNAL_SEMAPHORE_HANDLE_TYPE_OPAQUE_POINTER_BIT, each backed by a void * payload.

This would have the advantage of being cross-platform (and avoiding the many pitfalls associated with dealing with abstract file descriptors) but the disadvantage of having to remain within a single address space. That restriction is no obstacle to the use case of shared (or statically linked) libraries wanting to share resources directly. I would actually very much prefer this solution, coupled with letting each library have its own logical device.

It would also simplify a lot of other things we currently have to make sure to manually synchronize between the devices, such as device features and enabled extensions.