Add support for VK_EXT_graphics_pipeline_library

billhollings commented 1 year ago

From user request:

I guess the first question is what state dependencies there are for doing the SPIR-V -> MSL conversion and whether VK_EXT_graphics_pipeline_library would give you everything you need. I guess on macOS to generate the AIR you still need to do this offline or it is done at PSO build time? So I guess this would only save the SPIR-V -> MSL conversion step but still that would be a pretty big win. Ideally we’d get all the way to AIR but AFAIK there is still not an API interface to do that on Metal and has to be done offline?

billhollings commented 1 year ago

I've done an initial review of the feasibility of implementing VK_EXT_graphics_pipeline_library in MoltenVK's use of SPIRV-Cross and Metal.

This will require a large amount of refactoring of MVKGraphicsPipeline. This will be a significant amount of work, and since MVKGraphicsPipeline is complicated, there is a significant potential for introducing subtle regression errors.
Each rasterizing Metal graphics pipeline needs both pre-rasterization and fragment shaders. Because of this, final Metal pipelines will always need to be compiled when linked, and so graphicsPipelineLibraryFastLinking will return false.

One of the primary goals of VK_EXT_graphics_pipeline_library is to convert and compile pre-rasterization shaders and fragment shaders independently of each other, and only once, without reference to vertex input state, and then link them together into various runnable Metal pipelines without reconverting and recompiling the shaders. Currently, the conversion of SPIR-V shaders to compiled Metal shaders is tied to other pipeline library definitions as follows:

For pre-rasterization shaders, Metal requires vertex buffer values to be declared with the correct sign and vector size. We use the vertex attributes from the vertex input stage to tell the shader the format, which indicates both of these. We could instead check these during final Metal pipeline linking, and trigger an automatic shader reconversion and recompile if there is a mismatch. These additional compiled Metal shaders could be cached in the pipeline library for future use.
For fragment shaders, we take inputs from the outputs of the pre-rasterization shaders to match inter-stage output/input locations. We could attempt to compile the fragment shader blindly at first, check for matching locations during final Metal pipeline linking, and if they don't match, reconvert and recompile the fragment shader. These additional compiled Metal shaders could be cached in the pipeline library for future use.
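The "convert once, check at link time, reconvert and cache on mismatch" idea from the two points above can be sketched roughly as follows. This is a minimal illustration, not MoltenVK code: `AttributeSignature`, `CompiledShader`, `PreRasterLibrary`, and the `Converter` callback are all hypothetical names standing in for the real SPIR-V to MSL conversion and Metal compilation steps.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical summary of what the converter needs to know per vertex attribute.
enum class ComponentKind : uint8_t { Float, SInt, UInt };

struct AttributeSignature {
    uint32_t location;
    ComponentKind kind;
    uint32_t componentCount;
    bool operator<(const AttributeSignature& o) const {
        return std::tie(location, kind, componentCount) <
               std::tie(o.location, o.kind, o.componentCount);
    }
};

// Assumed to be sorted by location so that equal layouts compare equal.
using VertexInputSignature = std::vector<AttributeSignature>;

// Stand-in for a converted and compiled Metal shader (hypothetical).
struct CompiledShader { std::string msl; };

class PreRasterLibrary {
public:
    using Converter = std::function<CompiledShader(const VertexInputSignature&)>;

    // Convert once up front, using a best guess at the vertex input layout.
    PreRasterLibrary(const VertexInputSignature& assumed, Converter convert)
        : _convert(std::move(convert)) {
        _variants.emplace(assumed, _convert(assumed));
        _conversions = 1;
    }

    // Called at final pipeline link time, when the real layout is known.
    // A cache hit is the fast path; a miss reconverts and caches the variant.
    const CompiledShader& shaderFor(const VertexInputSignature& actual) {
        auto it = _variants.find(actual);
        if (it == _variants.end()) {
            it = _variants.emplace(actual, _convert(actual)).first;
            ++_conversions;
        }
        return it->second;
    }

    std::size_t conversionCount() const { return _conversions; }

private:
    Converter _convert;
    std::map<VertexInputSignature, CompiledShader> _variants;
    std::size_t _conversions = 0;
};
```

Keying the cache on the full signature means a second pipeline with the same mismatching layout reuses the reconverted shader instead of paying for conversion again.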

Except for constant specialization, there is a 1:1 correspondence requirement between converting SPIR-V to MSL and the final Metal compiled shader. This has two implications:

The above design would result in the pipeline libraries holding final compiled Metal shaders, meaning clean pipeline linking would not require further Metal shader compiling (but would require Metal pipeline compiling as mentioned above).
Inline forced recompiles during final pipeline linking will also require reconverting SPIR-V to MSL, in addition to recompiling the Metal shader.

@danginsburg @cdavis5e I would appreciate your commentary and discussion.

cdavis5e commented 1 year ago

This will require a large amount of refactoring of MVKGraphicsPipeline. This will be a significant amount of work, and since MVKGraphicsPipeline is complicated, there is a significant potential for introducing subtle regression errors.

Agreed. I really hate touching that code anyway, so we need to get this out of the way. I guess I'm one to talk, since most of that is my fault to begin with ;).

Each rasterizing Metal graphics pipeline needs both pre-rasterization and fragment shaders. Because of this, final Metal pipelines will always need to be compiled when linked, and so graphicsPipelineLibraryFastLinking will return false.

Or, we could compile the shaders into dylibs; then we can support fast linking. DXVK is a heavy user of fast linking, so this is important to support.

danginsburg commented 1 year ago

In order for the implementation to be worthwhile, we'd need to be able to minimize the cases where you have to regenerate the MSL for the pre-rasterization and fragment shaders at link time.

For pre-rasterization shaders, Metal requires vertex buffer values to be declared with the correct sign and vector size. We use the vertex attributes from the vertex input stage to tell the shader the format, which indicates both of these. We could instead check these during final Metal pipeline linking, and trigger an automatic shader reconversion and recompile if there is a mismatch. These additional compiled Metal shaders could be cached in the pipeline library for future use.

Can you be specific about exactly what needs to be known for MSL conversion? The sign and vector size are generally known from the high-level language, so I wonder if we could extend VK_EXT_graphics_pipeline_library such that pre-rasterization shaders included just that information (or just get it from SPIR-V reflection). For example, in HLSL we get a declaration of float/2/3/4, int/2/3/4 or uint/2/3/4. If that was all that was needed, I think we actually could know that at pre-rasterization time (as opposed to, for example, the format, which we do not know).

For fragment shaders, we take inputs from the outputs of the pre-rasterization shaders to match inter-stage output/input locations. We could attempt to compile the fragment shader blindly at first, check for matching locations during final Metal pipeline linking, and if they don't match, reconvert and recompile the fragment shader. These additional compiled Metal shaders could be cached in the pipeline library for future use.

I am curious about more specifics here too. In order to use GPL you have to have matching inputs/outputs in your VS/FS (i.e. it's like separate shader objects in OpenGL). So what exactly do you need to know here? I wonder again if this is something that could be passed as additional metadata or inferred from the SPIR-V.

billhollings commented 1 year ago

@cdavis5e

Or, we could compile the shaders into dylibs; then we can support fast linking. DXVK is a heavy user of fast linking, so this is important to support.

That might be an option. Metal dylibs can't be used for building an executable MTLFunction; they only define linkable functions that can be called from actual executable functions. One option might be to create non-entry-point functions that have the same call signature as the Metal entry-point functions (minus argument attributes, I expect), and then create identical entry-point functions (with argument attributes) during the pipeline-building stage that simply forward the call to these implementation functions. It remains to be seen what compiling that final entry-point function will cost in terms of time. It might be fast.
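To make the forwarding idea concrete, here is a rough sketch of a helper that emits such a thin entry-point function as MSL source text. Everything in it is an assumption for illustration: `EntryPointInfo`, `makeForwardingEntryPoint`, and the `_impl` naming are hypothetical, the generated MSL has not been compiled against Metal, and a real converted shader would have many more arguments (buffers, textures, samplers) than the single `[[stage_in]]` parameter shown here.

```cpp
#include <string>

// Hypothetical description of a converted shader entry point.
struct EntryPointInfo {
    std::string stageQualifier;  // e.g. "vertex" or "fragment"
    std::string name;            // entry-point name, e.g. "main0"
    std::string returnType;      // e.g. "main0_out"
    std::string inputType;       // e.g. "main0_in"
};

// Name of the non-entry-point implementation assumed to live in the dylib.
std::string implName(const EntryPointInfo& ep) {
    return ep.name + "_impl";
}

// Emits MSL text for a prototype of the implementation function plus an
// entry point that carries the argument attributes and forwards the call.
std::string makeForwardingEntryPoint(const EntryPointInfo& ep) {
    std::string src;
    // Prototype only: no argument attributes on the implementation function.
    src += ep.returnType + " " + implName(ep) + "(" + ep.inputType + " in);\n";
    // Entry point: same signature, with the attribute, forwarding the call.
    src += ep.stageQualifier + " " + ep.returnType + " " + ep.name + "(" +
           ep.inputType + " in [[stage_in]]) {\n";
    src += "    return " + implName(ep) + "(in);\n";
    src += "}\n";
    return src;
}
```

Because the stub contains no shader logic, the hope expressed above is that compiling it at pipeline-build time would be cheap; that cost would still need to be measured.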

Another option might be to use pipeline binary archives, and create binary archives of partial pipelines that can be used to create a complete pipeline. One gotcha here is that a Metal pipeline can't have only a fragment shader, and we might not be able to build a binary archive from a Metal pipeline descriptor that only contains a fragment shader. We might have to stub in a dummy vertex function that outputs dummy stage inputs used by the fragment shader.

Thoughts?

Once we come up with a couple of reasonable options, I'm going to see if I can bounce these options off the Apple engineers.

billhollings commented 1 year ago

@danginsburg

Can you be specific about exactly what needs to be known for MSL conversion?

Here's the area in code that covers this. It tells the shader code if the vertex attribute is unsigned, so the shader can convert it to signed if that's what the shader expects. I guess Vulkan permits this kind of mismatch, but Metal does not. This will only affect apps that have been imprecise in the VA vs shader definitions. And perhaps we could investigate flipping this around, and changing the vertex attribute definition, after shader conversion, based on reflecting from the shader conversion.

https://github.com/KhronosGroup/MoltenVK/blob/77b3cc03f41d1f4bcf9e7de0a8dc0a26315be3e9/MoltenVK/MoltenVK/GPUObjects/MVKPipeline.mm#L1759-L1761
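The signedness case described above can be reduced to a small per-attribute check. The sketch below is only an illustration of that decision, with hypothetical names (`NumericKind`, `AttributeFixup`, `fixupFor`); it is not the code at the linked lines, and it deliberately leaves every other kind of mismatch unclassified because the discussion only covers the unsigned-to-signed case.

```cpp
#include <cstdint>

// Numeric class of a value: as supplied by the vertex attribute format,
// or as declared by the shader input.
enum class NumericKind : uint8_t { Float, SInt, UInt };

// Hypothetical result telling the converter what, if anything, to do.
enum class AttributeFixup : uint8_t {
    None,              // buffer and shader agree: the ahead-of-time MSL is usable
    UnsignedToSigned,  // unsigned data read by a shader expecting signed
    NotCoveredHere     // any other mismatch; not addressed in this sketch
};

AttributeFixup fixupFor(NumericKind bufferKind, NumericKind shaderKind) {
    if (bufferKind == shaderKind) {
        return AttributeFixup::None;
    }
    if (bufferKind == NumericKind::UInt && shaderKind == NumericKind::SInt) {
        return AttributeFixup::UnsignedToSigned;
    }
    return AttributeFixup::NotCoveredHere;
}
```

If every attribute yields `None`, MSL converted before the vertex input state was known could be used unchanged at link time.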

I am curious on more specifics here too. In order to use GPL you have to have matching input/outputs in your VS/FS.

(BTW...I'm not familiar with the acronym GPL. Can you give me some context?)

Here's the area in code that covers this. After having another look at this, I may have misspoken about the locations. It's not about tracking the location, but about matching input/output formats. Again, this is probably covering edge cases in SPIR-V shader combos that aren't playing nice. So it might not be a significant issue, and we can explore alternatives.

https://github.com/KhronosGroup/MoltenVK/blob/77b3cc03f41d1f4bcf9e7de0a8dc0a26315be3e9/MoltenVK/MoltenVK/GPUObjects/MVKPipeline.mm#L1841-L1842
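A link-time check for the input/output format matching mentioned above might look roughly like this. It is a deliberately strict sketch under stated assumptions: `InterfaceVar` and `mismatchedLocations` are hypothetical names, the comparison requires an exact match of numeric kind and component count, and the actual Vulkan interface matching rules are more nuanced than this.

```cpp
#include <cstdint>
#include <map>
#include <vector>

enum class ScalarKind : uint8_t { Float, SInt, UInt };

// Hypothetical reflection record for one inter-stage variable.
struct InterfaceVar {
    uint32_t location;
    ScalarKind kind;
    uint32_t componentCount;
};

// Returns the locations of fragment inputs that have no pre-rasterization
// output with exactly the same kind and component count. An empty result
// means the blindly compiled fragment shader can be used as-is.
std::vector<uint32_t> mismatchedLocations(
        const std::vector<InterfaceVar>& preRasterOutputs,
        const std::vector<InterfaceVar>& fragmentInputs) {
    std::map<uint32_t, InterfaceVar> outputsByLocation;
    for (const auto& out : preRasterOutputs) {
        outputsByLocation.emplace(out.location, out);
    }

    std::vector<uint32_t> mismatches;
    for (const auto& in : fragmentInputs) {
        auto it = outputsByLocation.find(in.location);
        bool matches = it != outputsByLocation.end() &&
                       it->second.kind == in.kind &&
                       it->second.componentCount == in.componentCount;
        if (!matches) {
            mismatches.push_back(in.location);
        }
    }
    return mismatches;
}
```

Outputs that the fragment shader never reads are ignored here, so only the fragment side can trigger a reconversion in this sketch.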

danginsburg commented 1 year ago

I guess Vulkan permits this kind of mismatch, but Metal does not. This will only affect apps that have been imprecise in the VA vs shader definitions. And perhaps we could investigate flipping this around, and changing the vertex attribute definition, after shader conversion, based on reflecting from the shader conversion.

OK, so given it's just a corner case when signedness mismatches it seems like most of the time we would be able to hit the fast path of converting to MSL ahead of time and only have to re-generate it at draw time in rare cases. I could imagine also even patching the MSL in this case rather than doing a full regeneration if making this case fast was important.

(BTW...I'm not familiar with the acronym GPL. Can you give me some context?)

Sorry, I was using that as shorthand for Graphics Pipeline Libraries. I probably shouldn't since it would easily be confused with the other GPL :)

Here's the area in code that covers this. After having another look at this, I may have misspoken about the locations. It's not about tracking the location, but about matching input/output formats.

Yeah, formats and locations will always have to match for the shaders we use with VK_EXT_graphics_pipeline_library, so this shouldn't be an issue for us. I'm not sure precisely what case you are handling there; I thought the interface matching rules require the same OpType across stages.

KhronosGroup / MoltenVK

Add support for VK_EXT_graphics_pipeline_library #1711