Pipeline cache serialization/deserialization investigation

expenses commented 3 years ago

Goal

It would be neat and useful to have an implementation of get_pipeline_cache_data on all modern platforms (Vulkan, DX12, Metal). Along with the corresponding code in create_pipeline_cache, this would allow pipelines to be cached to disk on all backends, giving a good performance boost when a lot of pipelines are used.
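To make the goal concrete, here is a minimal sketch of the disk round trip. The PipelineCacheDevice trait is a hypothetical, simplified stand-in for the two device methods named above (the real gfx-hal methods are unsafe and return Results), and MockDevice exists only so the sketch can be exercised without a GPU.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical, simplified stand-in for the two device methods discussed above.
trait PipelineCacheDevice {
    type Cache;
    fn create_pipeline_cache(&self, initial_data: Option<&[u8]>) -> Self::Cache;
    fn get_pipeline_cache_data(&self, cache: &Self::Cache) -> Vec<u8>;
}

/// Create a pipeline cache, seeded from `path` when a saved blob exists.
fn load_cache<D: PipelineCacheDevice>(device: &D, path: &Path) -> D::Cache {
    // A missing or unreadable file simply means a cold start.
    let bytes = fs::read(path).ok();
    device.create_pipeline_cache(bytes.as_deref())
}

/// Serialize the cache and write it to `path` for the next run.
fn save_cache<D: PipelineCacheDevice>(
    device: &D,
    cache: &D::Cache,
    path: &Path,
) -> io::Result<()> {
    fs::write(path, device.get_pipeline_cache_data(cache))
}

/// In-memory mock (assumption, for illustration only): the "cache" is its own bytes.
struct MockDevice;

impl PipelineCacheDevice for MockDevice {
    type Cache = Vec<u8>;

    fn create_pipeline_cache(&self, initial_data: Option<&[u8]>) -> Self::Cache {
        initial_data.map(|d| d.to_vec()).unwrap_or_default()
    }

    fn get_pipeline_cache_data(&self, cache: &Self::Cache) -> Vec<u8> {
        cache.clone()
    }
}
```

An application would call load_cache at startup, create its pipelines against the returned cache, and call save_cache before exiting.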

Status

Vulkan

Vulkan has this built in: get_pipeline_cache_data maps directly onto vkGetPipelineCacheData.
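Since the Vulkan blob is opaque apart from its header, a cache read back from disk is usually checked against the current device before it is reused. The sketch below parses the 32-byte version-one header described in the Vulkan specification (header size, header version, vendor ID, device ID, pipeline cache UUID, with integers stored least significant byte first). The CacheHeader struct and both function names are made up for illustration and are not part of gfx.

```rust
/// Device-identifying fields of the version-one Vulkan pipeline cache header.
#[derive(Debug, PartialEq)]
struct CacheHeader {
    vendor_id: u32,
    device_id: u32,
    cache_uuid: [u8; 16],
}

/// Parse the header from the start of a cache blob, or return None if the
/// blob is too short or is not a version-one header.
fn parse_cache_header(data: &[u8]) -> Option<CacheHeader> {
    const HEADER_SIZE: usize = 32;
    const HEADER_VERSION_ONE: u32 = 1;

    if data.len() < HEADER_SIZE {
        return None;
    }

    // Read a little-endian u32 at a byte offset inside the header.
    let u32_at = |offset: usize| -> u32 {
        let mut bytes = [0u8; 4];
        bytes.copy_from_slice(&data[offset..offset + 4]);
        u32::from_le_bytes(bytes)
    };

    if u32_at(0) as usize != HEADER_SIZE || u32_at(4) != HEADER_VERSION_ONE {
        return None;
    }

    let mut cache_uuid = [0u8; 16];
    cache_uuid.copy_from_slice(&data[16..32]);

    Some(CacheHeader {
        vendor_id: u32_at(8),
        device_id: u32_at(12),
        cache_uuid,
    })
}

/// True if the blob was produced by a device matching `expected`.
fn is_compatible(data: &[u8], expected: &CacheHeader) -> bool {
    parse_cache_header(data).as_ref() == Some(expected)
}
```

If the check fails, the safe fallback is to discard the file and create an empty cache.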

Metal

Edit: disregard this whole section, see https://github.com/gfx-rs/gfx/issues/3716#issuecomment-813698647.

The Metal backend has a pipeline cache: https://github.com/gfx-rs/gfx/blob/2a93d52661aafcbd6441ea83e739c8ced906cd21/src/backend/metal/src/native.rs#L207-L211

However there is no way to serialize or deserialize it at present.

The key blocker for this is that the ModuleInfo struct stores a metal::Library:

https://github.com/gfx-rs/gfx/blob/2a93d52661aafcbd6441ea83e739c8ced906cd21/src/backend/metal/src/native.rs#L200-L205

While there is no way in the Metal API to serialize a MTLLibrary (the underlying type), there is a serialize function for MTLDynamicLibrary, which I believe we could convert to. It serializes directly into a file though, which is pretty gross. Presumably we'd then have to read the data back from this file.

The other option would be to just store the Metal source code for the shader that has been converted from SPIR-V. This would not give as big a performance improvement though.

MoltenVK

MoltenVK implements a pipeline cache with MVKPipelineCache. Similar to what we do with Metal, this stores MVKShaderLibraryCaches, which in turn store MVKShaderLibrarys. When implementing getPipelineCacheData, it writes out the Metal source code, similar to the option I suggest above.

As an example of this, here's some of the output of a pipeline cache I generated:

 &Y'v�˺GC�^\��@ mainzzzzzmain0>#include <metal_stdlib>
#include <simd/simd.h>

using namespace metal;

struct main0_out
{
    float4 uFragColor [[color(0)]];
};

struct main0_in
{
    float4 o_color [[user(locn0)]];
};

fragment main0_out main0(main0_in in [[stage_in]])
{
    main0_out out = {};
    out.uFragColor = in.o_color;
    return out;
}

TxC@�@mainzzzzzmain0�#include <metal_stdlib>
#include <simd/simd.h>
<...>

DX12

The DirectX 12 backend doesn't have a pipeline cache. However, https://github.com/gfx-rs/gfx/issues/2877 lays out how one could be created, similar to what the Metal backend does.

kvark commented 3 years ago

Thank you for filing this! About the Metal backend, the pipeline caching path is the old stuff we use with SPIRV-Cross. I tried to adjust it for Naga, but it wasn't easy. So for all intents and purposes, consider there not to be an implementation on Metal right now (since Naga is the future).

expenses commented 3 years ago

Okay, disregard basically everything that I wrote in the Metal section above, because on macOS 11.0 we can use the poorly named, poorly documented MTLBinaryArchive, which does pretty much what we want. We still have to write to a file and then read it back, because it takes URLs as parameters instead of raw bytes, but that's acceptable enough.

I've made a start on this at this branch: https://github.com/gfx-rs/gfx/compare/master...expenses:metal-pipeline-cache
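The write-then-read-back workaround for a serializer that only accepts a file location can be sketched like this. The serialize_to_path closure is a hypothetical stand-in for the MTLBinaryArchive call, and the temp file name is made up; this is not the code on the branch.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Run a serializer that can only write to a path, then return the bytes it wrote.
fn serialize_via_temp_file<F>(serialize_to_path: F) -> io::Result<Vec<u8>>
where
    F: FnOnce(&Path) -> io::Result<()>,
{
    // Hypothetical file name; the process id keeps concurrent processes apart.
    let path: PathBuf =
        std::env::temp_dir().join(format!("pipeline-cache-{}.bin", std::process::id()));

    // Only read the file back if the serializer reported success.
    let result = serialize_to_path(&path).and_then(|()| fs::read(&path));

    // Best-effort cleanup; a leftover temp file is not treated as an error.
    let _ = fs::remove_file(&path);

    result
}
```

Keeping the file handling behind a function that returns Vec<u8> means the rest of the backend can stay byte-oriented, matching what get_pipeline_cache_data returns on other backends. Note that the fixed file name is not safe if several threads in one process serialize at the same time.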

hgallagher1993 commented 3 years ago

I actually did a small bit of research into the dx12 docs for #2877 last night since it seems to be free now, so I could keep looking into it and see if I get anywhere...I'm not too familiar with gfx-rs though and I've never done anything with dx12 so I wouldn't rely on me, but I will try 😄

expenses commented 3 years ago

Okay, I've been doing some testing of #3719 using a hacky fork of https://github.com/repi/shadertoy-browser. Basically it loads 8866 SPIR-V fragment shaders and creates a pipeline for each one using a basic vertex shader, then exits.

macOS has a system shader cache at $(getconf DARWIN_USER_CACHE_DIR)/com.apple.metal, so that needs to be taken into account when timing this.

Here are some timings with and without caches:

wiped system cache, no pipeline cache: 683.82s, 659.88s
hot system cache, no pipeline cache: 24.39s, 27.21s

wiped system cache, hot pipeline cache: 442.45s, 451.47s
hot system cache, hot pipeline cache: 25.56s, 26.97s, 28.54s

So it looks like using Binary Archives as a pipeline cache is an improvement over no cache, but not nearly to the degree that you'd expect! It could be that the Binary Archive isn't set up correctly, but I've tested this with MTLPipelineOptionFailOnBinaryArchiveMiss, and it takes the same amount of time (450.93s) and successfully compiles all 8866 pipelines.
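To put numbers on that, the averages and ratios can be worked out from the timings listed above. The constant and function names here are just for illustration.

```rust
/// Timings in seconds, copied from the measurements above.
const COLD_NO_CACHE: [f64; 2] = [683.82, 659.88];
const COLD_PIPELINE_CACHE: [f64; 2] = [442.45, 451.47];
const HOT_NO_CACHE: [f64; 2] = [24.39, 27.21];
const HOT_PIPELINE_CACHE: [f64; 3] = [25.56, 26.97, 28.54];

/// Arithmetic mean of a set of timings.
fn mean(samples: &[f64]) -> f64 {
    samples.iter().sum::<f64>() / samples.len() as f64
}

/// How many times faster `candidate` is than `baseline` on average.
fn speedup(baseline: &[f64], candidate: &[f64]) -> f64 {
    mean(baseline) / mean(candidate)
}
```

With a wiped system cache the means are 671.85s without a pipeline cache and 446.96s with one, so speedup(&COLD_NO_CACHE, &COLD_PIPELINE_CACHE) comes out at roughly 1.5x, or about a third of the time saved. With a hot system cache the means are 25.80s and about 27.02s, a gap of the same size as the spread between individual runs.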

I'm going to look into a second cache to store SPIR-V -> MSL transformations to see how much that improves things.
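A rough sketch of what that second cache could look like: an in-memory map from a hash of the SPIR-V words to the generated MSL source. MslCache, get_or_translate and the translate closure are all hypothetical names, and nothing here is taken from the gfx code base.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// In-memory cache of SPIR-V -> MSL translations, keyed by a hash of the input.
struct MslCache {
    entries: HashMap<u64, String>,
}

impl MslCache {
    fn new() -> Self {
        MslCache { entries: HashMap::new() }
    }

    /// Number of distinct shaders translated so far.
    fn len(&self) -> usize {
        self.entries.len()
    }

    /// Return the cached MSL for `spirv`, calling `translate` only on a miss.
    fn get_or_translate<F>(&mut self, spirv: &[u32], translate: F) -> &str
    where
        F: FnOnce(&[u32]) -> String,
    {
        // Simplification: a 64-bit hash is used as the key, so collisions are
        // ignored. DefaultHasher output is also not guaranteed stable across
        // Rust releases, so an on-disk version would need a fixed hash.
        let mut hasher = DefaultHasher::new();
        spirv.hash(&mut hasher);
        let key = hasher.finish();

        let msl = self.entries.entry(key).or_insert_with(|| translate(spirv));
        msl.as_str()
    }
}
```

Persisting the map would go through the same load and save path as the pipeline cache itself.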
