KhronosGroup / MoltenVK

MoltenVK is a Vulkan Portability implementation. It layers a subset of the high-performance, industry-standard Vulkan graphics and compute API over Apple's Metal graphics framework, enabling Vulkan applications to run on macOS, iOS and tvOS.
Apache License 2.0

VK_EXT_metal_objects2 for command buffers? #2149

Open MennoVink opened 7 months ago

MennoVink commented 7 months ago

Would it be possible to get an extension similar to VK_EXT_metal_objects that also supports accessing the underlying Metal command buffers?

I want to call a library function that injects commands into Metal command buffers. I have a frame-graph-based setup where eventually everything synchronizes into an output command buffer. I'd like to access this command buffer's Metal backing object so that I can pass it along to this library. Currently I'm fencing the empty output command buffer, and on fence signal I'm trying to manually allocate a Metal command buffer by accessing the Metal queue from the Vulkan graphics queue. I'm getting crashes when these custom-allocated command buffers are autoreleased, so I'm not sure yet if this can even work.

I'm using MVK_CONFIG_PREFILL_METAL_COMMAND_BUFFERS = 1. I need parallel command transcribing as well as no memory leaks. Maybe we need another option, or, similar to VK_EXT_metal_objects, when a command buffer is created with the Metal export flag, it could bypass the lazy command buffer creation?

billhollings commented 7 months ago

Interesting idea. Upon first review, this sounds fairly dangerous, and I'm not sure how we'd even sync this.

MTLCommandBuffers are transient objects, and are pooled in a fixed and relatively small pool by Metal. If we passed them out and the app retained them beyond their use in MoltenVK, it could jam all activity if the pool ran out of command buffers.

In addition, the MTLCommandBuffer is normally populated within MoltenVK during a vkQueueSubmit() command, so it's hard to see where the app would hook into this at the right place in order to add its content.

We do have the concept of prefilling MTLCommandBuffers during Vulkan command entry, so it's possible that would be a place to do this. In general, prefilling slows things down in Metal, so there might be a performance penalty to this. But at least it might be an option.

But even with that, there is the problem of MoltenVK and Metal recovering from the arbitrary changes in Metal encoder state that the external operations could apply (for instance if the external operations were run in the middle of a render pass).

And, is this level of inline intrusion even necessary? The function you posted seems to be just copying one texture to another, using either BLIT or rendering. Could you not retrieve the MTLTexture from Vulkan using VK_EXT_metal_objects, use your function to BLIT/render to it using your own MTLCommandBuffer, and then, depending on what you need to do, issue a Vulkan command to use that texture for further processing within MoltenVK?

Or of course, just replicate the function operation in Vulkan?

MennoVink commented 7 months ago

> Interesting idea. Upon first review, this sounds fairly dangerous, and I'm not sure how we'd even sync this.
>
> MTLCommandBuffers are transient objects, and are pooled in a fixed and relatively small pool by Metal. If we passed them out and the app retained them beyond their use in MoltenVK, it could jam all activity if the pool ran out of command buffers.

I would consider that to be an application error and not something that MVK needs to worry about. It's the app's responsibility to configure MVK to request enough command buffers using the layer properties extension. In fact, I think this risk is already present with exposing queues, because applications can already access the queue and drain the command buffer pool by creating new command buffers from it and retaining those.

> In addition, the MTLCommandBuffer is normally populated within MoltenVK during a vkQueueSubmit() command, so it's hard to see where the app would hook into this at the right place in order to add its content.
>
> We do have the concept of prefilling MTLCommandBuffers during Vulkan command entry, so it's possible that would be a place to do this. In general, prefilling slows things down in Metal, so there might be a performance penalty to this. But at least it might be an option.

I'm imagining a new member of the VkExportMetalObjectTypeFlagBitsEXT enumeration. Then, when creating a command buffer, the application requests export support. When this support is requested, MVK can override the MVK_CONFIG_PREFILL_METAL_COMMAND_BUFFERS setting to value 2 or 3 for that specific command buffer. This should only reduce performance for the specific command buffers that support API interop, and not for the ones used only from Vulkan.
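To make the proposal concrete, here is a minimal C++ sketch of how a per-command-buffer export bit could raise the prefill setting. It is only an illustration: `METAL_COMMAND_BUFFER_BIT`, `kAssumedImmediateStyle` and `effectivePrefillStyle()` are hypothetical names that exist in no released extension or in MoltenVK, and the existing bit values are mirrored locally (as I understand them from the Vulkan headers) so the sketch compiles without the Vulkan SDK.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

// Local mirror of the VK_EXT_metal_objects export bits (assumed values),
// plus one HYPOTHETICAL new bit for command buffers.
enum ExportMetalObjectTypeFlagBits : uint32_t {
    METAL_DEVICE_BIT         = 0x01,
    METAL_COMMAND_QUEUE_BIT  = 0x02,
    METAL_BUFFER_BIT         = 0x04,
    METAL_TEXTURE_BIT        = 0x08,
    METAL_IOSURFACE_BIT      = 0x10,
    METAL_SHARED_EVENT_BIT   = 0x20,
    METAL_COMMAND_BUFFER_BIT = 0x40,  // HYPOTHETICAL: the member proposed in this thread
};

// ASSUMPTION: the thread only says the override would be "2 or 3";
// this sketch picks 2 as the minimum style that makes export possible.
constexpr int kAssumedImmediateStyle = 2;

// Returns the prefill style to use for one command buffer, given the global
// MVK_CONFIG_PREFILL_METAL_COMMAND_BUFFERS value and the export flags the
// application requested when creating that command buffer.
inline int effectivePrefillStyle(int globalStyle, uint32_t exportFlags) {
    const bool wantsExport = (exportFlags & METAL_COMMAND_BUFFER_BIT) != 0;
    if (wantsExport && globalStyle < kAssumedImmediateStyle) {
        return kAssumedImmediateStyle;  // only this command buffer pays the cost
    }
    return globalStyle;  // Vulkan-only command buffers keep the global setting
}

}  // namespace sketch
```

With a global setting of 1, a command buffer created with the hypothetical bit would be bumped to 2, while one created with only `METAL_TEXTURE_BIT` would stay at 1.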

> But even with that, there is the problem of MoltenVK and Metal recovering from the arbitrary changes in Metal encoder state that the external operations could apply (for instance if the external operations were run in the middle of a render pass).

I'm not sure I understand what you mean here (might be my lack of Metal experience). How would the application make changes to MVK's encoder? If MVK exposes the command buffer, that would require the app to create its own encoder, correct? Are you referring to application writers being able to confuse themselves by doing a vkCmdBeginRenderPass first and then a MTLBlitCommandEncoder::endEncoding, resulting in the blits happening before the render pass?

> And, is this level of inline intrusion even necessary? The function you posted seems to be just copying one texture to another, using either BLIT or rendering. Could you not retrieve the MTLTexture from Vulkan using VK_EXT_metal_objects, use your function to BLIT/render to it using your own MTLCommandBuffer, and then, depending on what you need to do, issue a Vulkan command to use that texture for further processing within MoltenVK?

I am currently doing what you're suggesting, but I have several concerns with it. Here's the code for reference:

@autoreleasepool
{
    // Fetch the Metal objects backing the Vulkan image and graphics queue (via VK_EXT_metal_objects).
    id< MTLTexture > mtlTextureRef = VulkanMetalUtils::GetMetalTexture( image );
    id< MTLCommandQueue > graphicsQueue = VulkanMetalUtils::GetMetalQueue( *image.GetDevice().GetGraphicsQueueFamily()->GetQueue( 0 ) );
    // Allocate a separate command buffer on the same Metal queue that MoltenVK uses.
    id< MTLCommandBuffer > commandBufferRef = [graphicsQueue commandBuffer];

    // Let the library encode its blit/render of the texture into that command buffer.
    [syphonServer publishFrameTexture:mtlTextureRef
                    onCommandBuffer:commandBufferRef
                           imageRegion:NSMakeRect( 0.0f, 0.0f, (float)image.GetFormat().GetWidth(), (float)image.GetFormat().GetHeight() )
                               flipped:true];
    [commandBufferRef commit];
}
  1. First of all, this creates a new command buffer. I don't know if creating a command buffer is safe, or if this could throw off some internal MVK system that tracks the live command buffer count.
  2. Even if creating a new command buffer is safe, it's a new resource being taken from the pool. I already have a command buffer intended for outputs, so if I could inject these commands into that one it would save me a resource.
  3. Currently I have a fence on the Vulkan-based output command buffer. When that fence is signaled, I execute this code. This is CPU-side sync, which I want to be GPU-side; there should be no additional delay before the frame is published once the GPU is ready.
  4. Not really an unsolvable problem, but synchronizing this external command buffer back into my graphics engine is a pain at the moment. I have a circular buffer of frames fencing on that final output command buffer. When I'm recycling a frame, I wait on that fence before I start reusing its resources. Theoretically, this external command buffer is not being waited on, and thus rendering to its source mtlTextureRef could start before this blit/render is done reading from it.
  5. I'm using MVK_CONFIG_SPECIALIZED_QUEUE_FAMILIES=1. I'd need to dig into MVK internals to figure out which queue I need to commit the command buffer to, and hope that the "queue submission order = execution order" guarantee that Metal provides will save me.

> Or of course, just replicate the function operation in Vulkan?

Well yeah... but I prefer doing less work and not more ;) On top of all the engine plumbing I'd need to do to render to IOSurfaces instead of Vulkan images, I'd also end up with a fork of that library which I'd need to maintain, as they aren't active / accepting merge requests. Definitely a lot more work than uncommenting the 3 lines I have in the output command buffer recording right now, ready to go if I had the MTLCommandBuffer.

billhollings commented 7 months ago

> this risk is already present with exposing queues

Yes...you're correct here.

> I'm imagining a new member of the VkExportMetalObjectTypeFlagBitsEXT enumeration...

Yes. This would be a good way to do it.

> I'm not sure I understand what you mean here (might be my lack of Metal experience). How would the application make changes to MVK's encoder?

MoltenVK tracks render encoder state, to avoid calling MTLRenderCommandEncoder functions more often than needed. If the app grabs the MTLCommandBuffer in the middle of a render pass and starts submitting changes to the MTLRenderCommandEncoder, MoltenVK would have no way of knowing what the current state was. The app may have even ended the render pass by submitting a compute or BLIT command to the MTLCommandBuffer.

We'd likely need some kind of indication that the command buffer is in a dirty state, and on the next draw call, MoltenVK would have to re-establish the entire MTLRenderCommandEncoder state. I assume this would need to be done in a Vulkan command...maybe something like vkCmdMarkCommandBufferDirtyEXT() that the app would issue when it was done messing with the command buffer.
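The dirty-marking idea can be illustrated with a small C++ sketch of a state cache that normally skips redundant encoder calls, but re-applies state after it has been told that someone else touched the command buffer. Everything here is hypothetical: `FakeEncoder` and `PipelineStateCache` are stand-ins invented for illustration and do not reflect MoltenVK's actual classes, and `markDirty()` only models the effect that a command such as the suggested vkCmdMarkCommandBufferDirtyEXT() might have.

```cpp
#include <cassert>

// HYPOTHETICAL stand-in for a Metal render encoder: it only counts state calls.
struct FakeEncoder {
    int setPipelineCalls = 0;
    void setPipeline(int /*pipelineId*/) { ++setPipelineCalls; }
};

// Remembers the last pipeline bound so that redundant binds can be skipped.
class PipelineStateCache {
public:
    // Binds the pipeline only if the cached state is unknown or different.
    void bind(FakeEncoder& encoder, int pipelineId) {
        if (!_valid || _boundPipeline != pipelineId) {
            encoder.setPipeline(pipelineId);
            _boundPipeline = pipelineId;
            _valid = true;
        }
    }

    // Models the proposed "mark dirty" command: external code may have changed
    // the encoder state, so the cached value can no longer be trusted.
    void markDirty() { _valid = false; }

private:
    int  _boundPipeline = 0;
    bool _valid = false;  // false until the first bind, and again after markDirty()
};
```

Binding the same pipeline twice reaches the encoder once; after `markDirty()`, the next bind is issued again even though the pipeline is unchanged. A real implementation would have to do this for all tracked render state, not just one value.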

> 1. First of all, this creates a new command buffer. I don't know if creating a command buffer is safe, or if this could throw off some internal MVK system that tracks the live command buffer count.

This is actually pretty safe, because MTLCommandBuffers are separate and submitted to the GPU in order of calls to [MTLCommandBuffer enqueue], which for MoltenVK happens at the beginning of vkQueueSubmit(). The only challenge the app would have is to call [MTLCommandBuffer enqueue] on the MTLCommandBuffer it creates either before or after vkQueueSubmit(), depending on what it's trying to do relative to the vkQueueSubmit().

See, for example, MVKQueue::waitIdle() which creates an arbitrary MTLCommandBuffer and commits it as a mechanism for knowing when everything submitted before has finished.

> 2. Even if creating a new command buffer is safe, it's a new resource being taken from the pool. I already have a command buffer intended for outputs, so if I could inject these commands into that one it would save me a resource.

But, as you indicated above, by using pre-filled MTLCommandBuffers, that resource is going to be needed anyway. If you issue a vkQueueSubmit() with, say, 7 Vulkan command buffers, and one in the middle is the one you're messing with, then that vkQueueSubmit() will require 3 MTLCommandBuffers instead of the normal 1.
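The arithmetic behind "3 instead of 1" can be sketched as follows. This is a simplified model of the statement above, not MoltenVK's code: it assumes that consecutive ordinary Vulkan command buffers in one submission share a single MTLCommandBuffer, while a command buffer that owns its own (prefilled or exported) MTLCommandBuffer splits that run. The function name `metalCommandBuffersNeeded()` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Each entry describes one Vulkan command buffer in submission order:
// true  = it carries its own MTLCommandBuffer (prefilled / exported),
// false = it is encoded into a shared MTLCommandBuffer at submit time.
// ASSUMPTION: adjacent "false" entries can share one MTLCommandBuffer.
inline std::size_t metalCommandBuffersNeeded(const std::vector<bool>& ownsMetalBuffer) {
    std::size_t count = 0;
    bool inSharedRun = false;
    for (bool owns : ownsMetalBuffer) {
        if (owns) {
            ++count;              // its own MTLCommandBuffer
            inSharedRun = false;  // and it ends the current shared run
        } else if (!inSharedRun) {
            ++count;              // first buffer of a new shared run
            inSharedRun = true;
        }
    }
    return count;
}
```

For seven command buffers with only the fourth one owning its MTLCommandBuffer, the model yields one buffer for the first three, one for the special one, and one for the last three.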

> 3. Currently I have a fence on the Vulkan-based output command buffer. When that fence is signaled, I execute this code. This is CPU-side sync, which I want to be GPU-side; there should be no additional delay before the frame is published once the GPU is ready.

You don't need to wait for GPU activity to finish to submit your [MTLCommandBuffer enqueue] call. For post-processing, you can do this immediately after the call to vkQueueSubmit() returns. Or if you're doing preprocessing, before calling vkQueueSubmit().

> 5. I'm using MVK_CONFIG_SPECIALIZED_QUEUE_FAMILIES=1. I'd need to dig into MVK internals to figure out which queue I need to commit the command buffer to, and hope that the "queue submission order = execution order" guarantee that Metal provides will save me.

I assume you'd want to do this on the VkQueue on which you run your vkQueueSubmit(), wouldn't you? As I mentioned above, Metal guarantees that MTLCommandBuffers are executed on the queue in the order in which [MTLCommandBuffer enqueue] is called across all the MTLCommandBuffers created on that queue, regardless of when each MTLCommandBuffer is created or committed.
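The ordering guarantee described above can be modeled with a toy queue in C++. This is only a simulation of the rule "execution order follows enqueue order, not commit order"; `QueueModel` and its methods are hypothetical and are not Metal or MoltenVK API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// HYPOTHETICAL model of a single command queue: enqueue() reserves the next
// execution slot, commit() marks a slot ready, and slots run strictly in
// enqueue order, each one only after it has been committed.
class QueueModel {
    struct Slot {
        std::string label;
        bool committed;
    };

public:
    // Reserves a place in the execution order and returns a handle to it.
    std::size_t enqueue(const std::string& label) {
        _slots.push_back(Slot{label, false});
        return _slots.size() - 1;
    }

    // Marks a slot ready, then runs every ready slot at the head of the queue.
    void commit(std::size_t handle) {
        assert(handle < _slots.size());
        _slots[handle].committed = true;
        while (_next < _slots.size() && _slots[_next].committed) {
            _executed.push_back(_slots[_next].label);
            ++_next;
        }
    }

    const std::vector<std::string>& executed() const { return _executed; }

private:
    std::vector<Slot> _slots;
    std::vector<std::string> _executed;
    std::size_t _next = 0;  // index of the next slot allowed to run
};
```

In this model, if the Vulkan submission is enqueued first and the app's post-processing buffer second, committing the post-processing buffer early runs nothing; both run, in enqueue order, once the Vulkan submission is committed. That mirrors the advice to enqueue the app's MTLCommandBuffer immediately after vkQueueSubmit() returns rather than waiting on a fence.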