KhronosGroup / MoltenVK

MoltenVK is a Vulkan Portability implementation. It layers a subset of the high-performance, industry-standard Vulkan graphics and compute API over Apple's Metal graphics framework, enabling Vulkan applications to run on macOS, iOS and tvOS.
Apache License 2.0

Request: support for VK_KHR_draw_indirect_count #168

Open HindrikStegenga opened 6 years ago

HindrikStegenga commented 6 years ago

As of Vulkan 1.1.76 the VK_KHR_draw_indirect_count extension was added to the spec. The extension allows the GPU to set the draw count. Could this be added to MoltenVK?

mellinoe commented 6 years ago

Metal doesn't really support this (afaik), so it would have to be emulated in a way that most likely does not match up with the intent of the extension. By that, I mean it would need to read the draw count back to the CPU and then emit individual indirect draw calls -- nothing you couldn't already do without the extension.
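For reference, the CPU-side emulation described above could look roughly like the sketch below. It is not MoltenVK code: `FakeEncoder` and `emulateDrawIndirectCount` are hypothetical names, the count buffer is modelled as host memory that has already been read back from the GPU, and the clamp to `maxDrawCount` follows the Vulkan semantics of vkCmdDrawIndirectCount().

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a render encoder: it only records the byte
// offset passed to each single indirect draw call.
struct FakeEncoder {
    std::vector<std::size_t> indirectOffsets;
    void drawIndirect(std::size_t indirectBufferOffset) {
        indirectOffsets.push_back(indirectBufferOffset);
    }
};

// Assumption: countBuffer holds bytes already read back to the CPU.
// Reads the 32-bit draw count, clamps it to maxDrawCount, and issues one
// indirect draw per argument record at argsOffset + i * stride.
inline std::uint32_t emulateDrawIndirectCount(FakeEncoder& enc,
                                              const std::vector<std::uint8_t>& countBuffer,
                                              std::size_t countOffset,
                                              std::size_t argsOffset,
                                              std::uint32_t maxDrawCount,
                                              std::uint32_t stride) {
    assert(countOffset + sizeof(std::uint32_t) <= countBuffer.size());
    std::uint32_t gpuCount = 0;
    std::memcpy(&gpuCount, countBuffer.data() + countOffset, sizeof(gpuCount));

    const std::uint32_t drawCount = std::min(gpuCount, maxDrawCount);
    for (std::uint32_t i = 0; i < drawCount; ++i) {
        enc.drawIndirect(argsOffset + static_cast<std::size_t>(i) * stride);
    }
    return drawCount;
}
```

The readback is the expensive part: the CPU cannot encode the draws until the GPU work that produces the count has finished, which is why this adds nothing over what an application can already do without the extension.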

HindrikStegenga commented 6 years ago

In that case we will have to wait for Apple to add support for it in Metal.

oscarbg commented 6 years ago

Hi @Hindrik1997 @mellinoe, please reopen. It seems Metal 2.1 in iOS 12 and macOS 10.14 adds indirect command buffers, which are similar to VK_NVX_device_generated_commands in Vulkan, i.e. the ability to generate render command buffers on the GPU (for details see the video or slides: https://developer.apple.com/videos/play/wwdc2018/607/). That functionality should also enable VK_KHR_draw_indirect_count, since it is more general than that extension, right?

oscarbg commented 6 years ago

On Twitter I said:

Metal 2.1 adds indirect command buffers; I guess this is similar to VK_NVX_device_generated_commands in Vulkan. Slightly related: I don't know if having the GPU set draw counts in GPU buffers, similar to VK_KHR_draw_indirect_count, was already supported in Metal 2.0 or needs the new 2.1 ICBs.

and got this response from @zeuxcg on Twitter:

This was there before 2.0 - unsure what version, https://developer.apple.com/documentation/metal/mtlrendercommandencoder/1515467-drawprimitives …. That’s only a single draw call (instanced) - MDI is only since 2.1 AFAIK.

BeastLe9enD commented 3 years ago

is there any news regarding the support of VK_KHR_draw_indirect_count ?

MarloCraft commented 3 years ago

Has anybody here tried implementing this? Or does anybody plan to implement it in the future? :)

alecazam commented 3 years ago

Metal's draw indirect API is missing a drawCount and only provides an indirectBufferOffset. So you can't accumulate multi-material or multiple draws into a buffer and then use the drawIndexedPrimitives calls that take an indirectBuffer. An offset is not a range, unlike the DX12 and Vulkan implementations, which are actually functional.

fabiopolimeni commented 1 year ago

Metal's draw indirect API is missing a drawCount and only provides an indirectBufferOffset. So you can't accumulate multi-material or multiple draws into a buffer and then use the drawIndexedPrimitives calls that take an indirectBuffer. An offset is not a range, unlike the DX12 and Vulkan implementations, which are actually functional.

@alecazam Isn't Indirect Arguments + Indirect Range Buffer sufficient for implementing VK_KHR_draw_indirect_count? They look to me like they are made for the same purpose.

alecazam commented 1 year ago

IndirectArguments are nearly a copy of the MDI API on Metal with several big limiters. vertexOffset should be int32_t, but Apple got that wrong. There is also no stride support for storing a gl_drawID in the same MDI buffer.

The problem happens when you try to call it. You must have the CPU submit one draw call for each item in the buffer. There is no indirect count, but the GPU can overwrite the instance count in this buffer. So for the unused items you would have to submit zero-instance draw calls from the CPU.

You are right that Indirect Arguments is no more limiting than some Android implementations, which don't have count or even firstInstance support. The A9, where IA was introduced, has both baseVertex and firstInstance support.

Apple would prefer that ICBs (indirect command buffers) be used, but these can't be written by the GPU until A11. With ICBs the CPU can specify a range, and on A11 the GPU can specify the range. That "Indirect Range Buffer" is for ICB, not MDI.

billhollings commented 1 year ago

Isn't Indirect Arguments + Indirect Range Buffer sufficient for implementing VK_KHR_draw_indirect_count? They look to me like they are made for the same purpose.

I'm not sure what you are suggesting here. Can you expand on this? ICBs contain individual commands that are otherwise available in the CPU API. Are you suggesting we could use the indirect count to unroll the one draw command into many repeated draw commands inserted into an ICB?

alecazam commented 1 year ago

You can implement MDI from Vulkan but without stride, indirect count, or count. The cpu would have to draw N elements out of the buffer, but the gpu can modify the buffer before it's referenced. The A9 which has the IndirectArguments support also has baseVertex and baseInstance (firstInstance) support. Those values then get passed onto the shader. This MDI struct is element for element the same as Vulkan, except Apple made vertexOffset a uint32_t when it should have been a signed int32_t.

So I generate a buffer holding 10 instanced indirect arguments. Then in compute, or through an SSBO, I modify the instanceCount on each of the MDI elements in that buffer. Then I call draw referencing the offset to each of the 10 arguments in the buffer. So some hold 0 instanceCount, and some hold the actual instanceCount.
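A minimal sketch of that zero-instance approach, for illustration only. `DrawArgs` is a local mirror of the fields of MTLDrawPrimitivesIndirectArguments, while `cullUnusedSlots` and `submitAllSlots` are hypothetical stand-ins for the GPU compute pass and the CPU encoding loop; nothing here calls Metal.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Local mirror of the fields of Metal's MTLDrawPrimitivesIndirectArguments.
struct DrawArgs {
    std::uint32_t vertexCount;
    std::uint32_t instanceCount;
    std::uint32_t vertexStart;
    std::uint32_t baseInstance;
};

// Stand-in for the compute pass: every slot at or beyond liveCount gets
// instanceCount = 0, so the draw the CPU issues for it renders nothing.
inline void cullUnusedSlots(std::vector<DrawArgs>& args, std::uint32_t liveCount) {
    assert(liveCount <= args.size());
    for (std::size_t i = liveCount; i < args.size(); ++i) {
        args[i].instanceCount = 0;
    }
}

struct SubmitResult {
    std::size_t drawCalls;         // draws the CPU had to encode
    std::uint64_t instancesDrawn;  // instances that actually render
};

// Stand-in for CPU encoding: the CPU cannot see the GPU-side count, so it
// always issues one indirect draw per slot, whatever the slot contains.
inline SubmitResult submitAllSlots(const std::vector<DrawArgs>& args) {
    SubmitResult result{0, 0};
    for (const DrawArgs& a : args) {
        ++result.drawCalls;
        result.instancesDrawn += a.instanceCount;
    }
    return result;
}
```

With 10 slots and only 4 live, the CPU still encodes 10 draws; the saving is that no readback is needed, at the cost of the empty draws.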

Alternatively, the ICBs can draw a range, and that seems to be what Apple wants devs to use. They even added GPU support on A11. These can write out instanced draw commands from the CPU, and then draw a range from that buffer if it hasn't been compacted/optimized. So this would be for implementing something more like MultidrawIndirectCount. But fabio above was mixing these constructs together, which doesn't work.

billhollings commented 1 year ago

Then I call draw referencing the offset to each of the 10 arguments in the buffer.

Is it the CPU or GPU making these multiple draw calls? If CPU, for vkCmdDrawIndirect() and vkCmdDrawIndexedIndirect(), MoltenVK already uses MTLDrawPrimitivesIndirectArguments and MTLDrawIndexedPrimitivesIndirectArguments and multiple draw calls from the CPU. But for vkCmdDrawIndirectCount() and vkCmdDrawIndexedIndirectCount(), the GPU would need to make these calls.

These can write out instanced draw commands from the cpu, and then draw a range from that buffer

Okay. That sounds like the command unrolling I mentioned above, and I agree, represents a mechanism that might work for vkCmdDrawIndirectCount() and vkCmdDrawIndexedIndirectCount(), at least for reasonable values of draw count.
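To make the unrolling idea concrete, here is a CPU-only simulation, offered as a sketch rather than a design. `FakeICB`, `unrollIntoICB` and `executeRange` are hypothetical names; `ExecutionRange` mirrors the location/length pair of Metal's MTLIndirectCommandBufferExecutionRange; and the assumption is that the ICB is created with a capacity equal to maxDrawCount and that a compute pass fills it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Local mirror of one non-indexed indirect draw record.
struct IcbDrawArgs {
    std::uint32_t vertexCount;
    std::uint32_t instanceCount;
    std::uint32_t vertexStart;
    std::uint32_t baseInstance;
};

// Mirrors the two fields of MTLIndirectCommandBufferExecutionRange.
struct ExecutionRange {
    std::uint32_t location;
    std::uint32_t length;
};

// Hypothetical stand-in for an ICB whose capacity is fixed at creation.
struct FakeICB {
    std::vector<IcbDrawArgs> commands;
    explicit FakeICB(std::uint32_t capacity) : commands(capacity) {}
};

// Stand-in for the compute pass: copies up to gpuDrawCount records into the
// ICB (never more than its capacity) and writes the range to execute.
inline void unrollIntoICB(const std::vector<IcbDrawArgs>& indirectArgs,
                          std::uint32_t gpuDrawCount,
                          FakeICB& icb,
                          ExecutionRange& range) {
    const std::uint32_t capacity = static_cast<std::uint32_t>(icb.commands.size());
    const std::uint32_t available = static_cast<std::uint32_t>(indirectArgs.size());
    const std::uint32_t n = std::min(gpuDrawCount, std::min(capacity, available));
    for (std::uint32_t i = 0; i < n; ++i) {
        icb.commands[i] = indirectArgs[i];
    }
    range = ExecutionRange{0, n};
}

// Stand-in for executing the ICB with an indirect range buffer: only the
// commands inside the range run.
inline std::vector<IcbDrawArgs> executeRange(const FakeICB& icb, const ExecutionRange& range) {
    assert(static_cast<std::size_t>(range.location) + range.length <= icb.commands.size());
    return std::vector<IcbDrawArgs>(icb.commands.begin() + range.location,
                                    icb.commands.begin() + range.location + range.length);
}
```

The ICB capacity is what bounds "reasonable values of draw count": the buffer has to be sized for maxDrawCount up front, even if the GPU ends up drawing far fewer.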

Apple made vertexOffset a uint32_t when it should have been a signed int32_t.

Are you referring to Metal's MTLDrawIndexedPrimitivesIndirectArguments::baseVertex? If so, it's an int32_t, as in Vulkan's VkDrawIndexedIndirectCommand::vertexOffset. Fortunately, the structs match across APIs.
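The layout match can be checked at compile time. The sketch below uses local mirror structs (the names `VkIndexedCmdMirror` and `MtlIndexedArgsMirror` are made up for illustration) with the field names and types from the two APIs, rather than including the real headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Local mirror of Vulkan's VkDrawIndexedIndirectCommand.
struct VkIndexedCmdMirror {
    std::uint32_t indexCount;
    std::uint32_t instanceCount;
    std::uint32_t firstIndex;
    std::int32_t  vertexOffset;
    std::uint32_t firstInstance;
};

// Local mirror of Metal's MTLDrawIndexedPrimitivesIndirectArguments.
struct MtlIndexedArgsMirror {
    std::uint32_t indexCount;
    std::uint32_t instanceCount;
    std::uint32_t indexStart;
    std::int32_t  baseVertex;
    std::uint32_t baseInstance;
};

// The signed vertex offset has the same type on both sides.
static_assert(std::is_same<decltype(VkIndexedCmdMirror::vertexOffset),
                           decltype(MtlIndexedArgsMirror::baseVertex)>::value,
              "vertexOffset and baseVertex must have the same type");
static_assert(std::is_signed<decltype(MtlIndexedArgsMirror::baseVertex)>::value,
              "baseVertex must be signed");

// True when every field sits at the same byte offset in both structs, so one
// buffer can be consumed by either API without repacking.
constexpr bool indexedLayoutsMatch() {
    return sizeof(VkIndexedCmdMirror) == sizeof(MtlIndexedArgsMirror)
        && offsetof(VkIndexedCmdMirror, indexCount) == offsetof(MtlIndexedArgsMirror, indexCount)
        && offsetof(VkIndexedCmdMirror, instanceCount) == offsetof(MtlIndexedArgsMirror, instanceCount)
        && offsetof(VkIndexedCmdMirror, firstIndex) == offsetof(MtlIndexedArgsMirror, indexStart)
        && offsetof(VkIndexedCmdMirror, vertexOffset) == offsetof(MtlIndexedArgsMirror, baseVertex)
        && offsetof(VkIndexedCmdMirror, firstInstance) == offsetof(MtlIndexedArgsMirror, baseInstance);
}

static_assert(indexedLayoutsMatch(), "indexed indirect layouts differ");
```

In a real build the same assertions would be written against the structs from the Vulkan and Metal headers instead of these mirrors.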

alecazam commented 1 year ago

Yes, it does seem like both vertexOffset and baseVertex are int32_t. When I last looked at that, I thought the docs said that baseVertex was a uint32_t. It's silly that Apple doesn't add stride or indirect count, since they have AMD and Intel parts that have supported MDI since GCN. But if Apple Silicon doesn't have it, then they don't update the API.

I would avoid ICB for now, but start with the much more limited IndirectArguments that lines up with MDI on Vulkan. I'm using that path for now in my own iOS dev, but just wanted to point out the gotchas. I asked Apple to update it, and they just wanted me to use the ICB path since it's gotten gpu support. ICB can actually change argument buffers per draw for materials even on A9. But now on A13, there is argument buffer indexing that works with IndirectArguments or ICB.