KhronosGroup / MoltenVK

MoltenVK is a Vulkan Portability implementation. It layers a subset of the high-performance, industry-standard Vulkan graphics and compute API over Apple's Metal graphics framework, enabling Vulkan applications to run on macOS, iOS and tvOS.
Apache License 2.0
4.81k stars 425 forks source link

Support for VK_EXT_descriptor_buffer #1776

Open spnda opened 1 year ago

spnda commented 1 year ago

VK_EXT_descriptor_buffer is a new extension that released today. It essentially allows the device to manage the memory backing descriptor sets. This is more or less exactly what MoltenVK is already doing, so implementing this extension should not be too hard.

In short, this extension allows to query the offset and size of a descriptor within a descriptor layout. We then have to write the BDAs and handles into a buffer (ideally DEVICE_LOCAL) with these offsets and sizes. This buffer is then bound using a BDA in a command buffer, with byte offsets specifiable later. This is effectively the same as argument buffers with Metal 3, where we also just manually write handles and pointers into memory. The only difference is that handles in Metal are always sized the same unlike in Vulkan, where one has to query the size from the driver. The SPIR-V descriptor interface remains the same.

I'd like to implement this extension in the coming days. @billhollings Could you please assign me?

billhollings commented 1 year ago

I agree that VK_EXT_descriptor_buffer sounds like is a good mapping for Metal3 argument buffers.

I'd be happy to have you work on this. Thanks very much for volunteering! And as requested, I've assigned it to you.

The changes to argument buffers in Metal3 should also allow us to upgrade support for argument buffers in general, and in particular for descriptor indexing. With that in mind, it would definitely be preferable to integrate the two uses of Metal argument buffers, and not fragment their use into two paths.

For instance, if Metal3 is available on the platform, maybe always enable the use of argument buffers, and in a manner that allows it to support general descriptor sets (eg. one Metal argument buffer per descriptor set), descriptor indexing, and descriptor buffers, ideally all using a single implementation of Metal argument buffers in MoltenVK.

I haven't had time to look into this in any detail yet, so I may be asking for too much. But I'd like to see it as a design intention, as you look into VK_EXT_descriptor_buffer.

We can use this thread for contributors to discuss design ideas for doing all this.

PS: @spnda I've granted you write access to the repo now. That should allow you to assign to tasks, and ask for reviewers on PR's. Please leave PR merging with me, though.

spnda commented 1 year ago

So, I've come across a few issues trying to implement this extension. It's been fairly simple because the extension effectively provides a MTLArgumentEncoder interface, but the following two problems are making it quite complicated.

First of all, Metal does not have a function to bind a buffer by its address. Therefore I would have to track which buffers have been created with VK_BUFFER_USAGE_RESOURCE_DESCRIPTOR_BUFFER_BIT_EXT and store those in a map or array. I would then in the encoding of vkCmdBindDescriptorBuffersEXT search that map/array by the buffer's address (which might have an offset) and then bind that buffer. Not too sure how I would implement this nicely, but this should be doable.

The major issue I've just stumbled upon is that by giving the application control of writing the argument buffers we have no idea which resources are bound. As far as I know, Metal requires resources to be made resident with useResource or useHeap, meaning I would need to traverse the bound descriptor buffer every time the offsets change with vkCmdSetDesctiptorBufferOffsetsEXT and read all the handles and pointers from it, then lookup the corresponding MTLTexture, MTLBuffer, MTLSamplerState and then call useResource for those handles. This would be a very complicated and error-prone thing to implement, would likely also be quite slow and most importantly defeat the purpose of this extension at the same time. I don't feel like we should support any extensions when we have to do something like this. Though in theory this would still be possible, so how would you feel about this?

I've pushed my changes until now onto a branch: https://github.com/spnda/MoltenVK/tree/EXT_descriptor_buffer

zmarlon commented 1 year ago

In my opinion, it would make sense to make a feature request for Metal at Apple. Because I don't think it makes sense to implement the extension in such a way.

psionic12 commented 1 year ago

I also don't understand why vkCmdBindDescriptorBuffersEXT takes addresses instead of buffer handles, do you guys know why? I asked a question, but nobody answers for now.

rcaridade145 commented 1 year ago

https://github.com/KhronosGroup/Vulkan-Docs/issues/2028#issuecomment-1375711855

@psionic12 I believe this is the code https://github.com/HansKristian-Work/vkd3d-proton/blob/2dc906ee3b78c5aedd863812f7739bf1c2e1f54e/libs/vkd3d/va_map.c

corporateshark commented 1 year ago

First of all, Metal does not have a function to bind a buffer by its address. Therefore I would have to track which buffers have been created with VK_BUFFER_USAGE_RESOURCE_DESCRIPTOR_BUFFER_BIT_EXT and store those in a map or array. I would then in the encoding of vkCmdBindDescriptorBuffersEXT search that map/array by the buffer's address (which might have an offset) and then bind that buffer. Not too sure how I would implement this nicely, but this should be doable.

Can gpuResourceID from Metal 3 help with this? https://developer.apple.com/documentation/metal/mtlrenderpipelinestate/3974100-gpuresourceid

rcaridade145 commented 1 year ago

Wonder if this helps - https://developer.apple.com/documentation/metal/buffers/managing_groups_of_resources_with_argument_buffers

SwampyApple commented 1 year ago

In my opinion, it would make sense to make a feature request for Metal at Apple. Because I don't think it makes sense to implement the extension in such a way.

As I've mentioned in the general Metal 3 features discussion I also feel like this is a good step to take. The people over at CodeWeavers got in touch with Apple over some Rosetta 2 bugs which got fixed. They've also added a raw image mode for the Asahi Linux development, instead of needing Mach-O kernel files.

rcaridade145 commented 1 year ago

@billhollings @spnda

Was looking for info that could help here. Came across this when useHeap /useResource.

https://forums.kodeco.com/t/insights-from-wwdc/142554

https://forums.kodeco.com/t/when-to-call-useresource/134089

K0bin commented 1 year ago

The way to make this work would be:

Same thing should probably be done for VK_EXT_descriptor_indexing too, so it can properly support PARTIALLY_BOUND and huge descriptor sets.

QuantumDeveloper commented 1 year ago

Is there any chance that this extension will be implemented in the nearest future?

billhollings commented 1 year ago

Is there any chance that this extension will be implemented in the nearest future?

Soon. It's in the upcoming development roadmap.

spnda commented 1 year ago

@billhollings Are you going to take over on this PR? I'm still technically free to work on this. There are solutions to make the feature work fully, but they would be quite broad changes that would take time.

Specifically, we would most likely need to refactor VkDeviceMemory to not map to a MTLHeap and essentially always allocate, say, 1MB MTLHeap blocks and then sub-allocate VkDeviceMemory and VkBuffer from those. We then always have a useHeap call so that all resources are available, which would fully fix the residency issue and allow the the GPU to write descriptors.

The issue that the Vulkan API binds by an address plus offset can probably be easily solved. So for this extension there's a lot of things having to be changed, and I also don't know if you're fine with the MTLHeap change before I actually start implementing that.

billhollings commented 1 year ago

Are you going to take over on this PR? I'm still technically free to work on this. There are solutions to make the feature work fully, but they would be quite broad changes that would take time.

I am intending to work on that as part of the roadmap (#1975). But as you can see, it's further down on the priority list. I am definitely happy to have you continue to work on it, instead.

I recommend you/we wait until I complete the first item on the roadmap, the Metal 3 argument buffers. As part of that, I am trying to improve how the resources are bound and made resident. I'm sure there will be further improvements when work proceeds on VK_EXT_descriptor_buffer here too, but that will give us a start.

spnda commented 9 months ago

I recommend you/we wait until I complete the first item on the roadmap, the Metal 3 argument buffers. As part of that, I am trying to improve how the resources are bound and made resident. I'm sure there will be further improvements when work proceeds on VK_EXT_descriptor_buffer here too, but that will give us a start.

@billhollings Has there been any updates regarding this? I can imagine the descriptor capabilities to be drastically improved when defaulting to the Metal 3 argument buffers and only falling back to using Metal 2 or Metal 1 functionality if absolutely necessary. I've recently seen a lot of people confused (on Discord) about the environment variable to enable the use of argument buffers. I'm also still interested in supporting descriptor buffer, and I saw that the KHR_acceleration_structure PR already addressed one of my concerns, and I think handling residency of resources could also be somewhat addressed with a revamped descriptor structure.

billhollings commented 9 months ago

Has there been any updates regarding this?

Still on my ToDo list. Hoping to get to it within a month.