LunarG / gfxreconstruct

Graphics API Capture and Replay Tools for Reconstructing Graphics Application Behavior
https://vulkan.lunarg.com/doc/sdk/latest/linux/capture_tools.html
MIT License

Use content-addressed storage for shader module binaries #907

Open ishitatsuyuki opened 1 year ago

ishitatsuyuki commented 1 year ago

Rationale: Vertex shaders often get permuted with multiple fragment shaders, i.e. a single shader is used in multiple pipelines. If the app creates one VkShaderModule per shader and reuses it, then it's all good. However, sometimes that is not the case: for example, there's no VkShaderModule equivalent in DX12 (shader bytecode is passed directly in pipeline creation), so vkd3d, which translates from DX12, will create a VkShaderModule once per stage per pipeline. This means that a lot of duplicate module binaries are recorded.

An alternative scenario is a game that wants to save memory by destroying its VkShaderModule objects and recreating them only when the shader gets used in a new permutation. This would similarly lead to duplicates.

In pathological cases this can amount to a lot of memory and disk usage. The game I'm capturing right now is very shader and permutation heavy, and out of 76486 files (2.6GB) from gfxrecon-extract, 63544 (1.9GB) are duplicates.

My suggestion: store shader module binaries in content-addressed storage, keyed by a hash of the SPIR-V code, so each unique binary is written to the capture only once and later vkCreateShaderModule calls with the same code just reference it by hash.
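
A minimal sketch of the capture-side idea, assuming a content hash such as XXH3 or SHA-256 (names and API are hypothetical, not GFXReconstruct's actual encoder interface):

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Placeholder 128-bit content hash; a real implementation would use XXH3, SHA-256, etc.
struct ContentHash {
    uint64_t lo = 0, hi = 0;
    bool operator==(const ContentHash& o) const { return lo == o.lo && hi == o.hi; }
};
struct ContentHashHasher {
    size_t operator()(const ContentHash& h) const { return static_cast<size_t>(h.lo ^ h.hi); }
};

class ShaderBlobStore {
public:
    // Returns the content hash; the blob is emitted to the capture only the first time it is seen.
    ContentHash Intern(const uint32_t* code, size_t code_size_bytes) {
        ContentHash h = Hash(code, code_size_bytes);
        auto [it, inserted] = blobs_.try_emplace(h);
        if (inserted) {
            it->second.assign(reinterpret_cast<const uint8_t*>(code),
                              reinterpret_cast<const uint8_t*>(code) + code_size_bytes);
            // WriteBlobToCaptureFile(h, it->second);  // hypothetical: write the bytes once
        }
        return h;  // the vkCreateShaderModule record then stores only this hash
    }

private:
    static ContentHash Hash(const uint32_t* code, size_t size_bytes) {
        // Trivial FNV-1a stand-in so the sketch is self-contained; not a real choice of hash.
        uint64_t h = 1469598103934665603ull;
        const auto* p = reinterpret_cast<const uint8_t*>(code);
        for (size_t i = 0; i < size_bytes; ++i) { h ^= p[i]; h *= 1099511628211ull; }
        return {h, h ^ 0x9E3779B97F4A7C15ull};
    }

    std::unordered_map<ContentHash, std::vector<uint8_t>, ContentHashHasher> blobs_;
};
```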

andrew-lunarg commented 1 year ago

It is probably worth also raising an issue at the vkd3d repo for them to do this de-duplication from their side too. One function of gfxr is to reveal such inefficiencies in the Vulkan call stream from higher-level components.

ishitatsuyuki commented 1 year ago

Doing deduplication on the vkd3d side implies keeping the shader module around for an indefinite period, which is not really feasible. I don't see a better option when translating from DX12: we currently create a shader module, build the pipeline, then destroy the shader module immediately.
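
For illustration, the create/use/destroy pattern described above looks roughly like this on the Vulkan side (simplified sketch, not vkd3d's actual code; other pipeline stages omitted for brevity):

```cpp
#include <vulkan/vulkan.h>

VkPipeline CreatePipelineFromTranslatedBytecode(VkDevice device,
                                                const uint32_t* spirv, size_t spirv_size_bytes,
                                                VkGraphicsPipelineCreateInfo pipeline_info)
{
    // One VkShaderModule is created per stage per pipeline, even if the same
    // SPIR-V has already been used by an earlier pipeline.
    VkShaderModuleCreateInfo smci = {VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO};
    smci.codeSize = spirv_size_bytes;  // SPIR-V translated from the DX12 bytecode
    smci.pCode    = spirv;

    VkShaderModule module = VK_NULL_HANDLE;
    vkCreateShaderModule(device, &smci, nullptr, &module);

    VkPipelineShaderStageCreateInfo stage = {VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO};
    stage.stage  = VK_SHADER_STAGE_FRAGMENT_BIT;
    stage.module = module;
    stage.pName  = "main";
    pipeline_info.stageCount = 1;
    pipeline_info.pStages    = &stage;

    VkPipeline pipeline = VK_NULL_HANDLE;
    vkCreateGraphicsPipelines(device, VK_NULL_HANDLE, 1, &pipeline_info, nullptr, &pipeline);

    // The module is destroyed right away, so the next pipeline that uses the same
    // shader re-creates (and a capture layer re-records) an identical SPIR-V blob.
    vkDestroyShaderModule(device, module, nullptr);
    return pipeline;
}
```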

Additionally, I wouldn't really call this an "inefficiency" in normal usage of Vulkan. Creating redundant shader modules usually just results in redundant hashing on the driver side, which is not free but mostly acceptable. It's definitely an acceptable tradeoff for translating from a foreign model like DX12. On the other hand, gfxr significantly amplifies the impact of this by keeping all of the duplicates around in memory and on disk.

andrew-lunarg commented 1 year ago

I do like the idea. It seems like a big win for capture file size. I should note a couple of tradeoffs I see.

  1. Capture would have to track the hashes it has seen, adding a memory footprint that grows over the course of a capture, plus the CPU cost of doing the hashing while trying to keep up with the application it is capturing.
    • Perhaps a tool like gfxr-compress could perform this optimisation as a post-process.
  2. Replay would have to keep every hashed resource it has ever seen either in CPU memory or in some on-disk cache in case a later vk call refers to it by hash (see the sketch after this list).
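
For point 2, a minimal sketch of what such a replay-side cache might look like (hypothetical names, not GFXReconstruct's actual replay code):

```cpp
#include <cstdint>
#include <stdexcept>
#include <unordered_map>
#include <vector>

using ContentHash = uint64_t;  // stand-in for whatever hash the capture format would use

class ReplayShaderBlobCache {
public:
    // Called when the capture stream delivers a new blob; kept for the rest of replay,
    // since any later call may reference it by hash. Could spill to an on-disk cache instead.
    void Register(ContentHash h, std::vector<uint8_t> blob) {
        blobs_.emplace(h, std::move(blob));
    }

    // Called when a replayed vkCreateShaderModule (or similar) refers to a blob by hash.
    const std::vector<uint8_t>& Lookup(ContentHash h) const {
        auto it = blobs_.find(h);
        if (it == blobs_.end()) throw std::runtime_error("capture referenced an unknown shader blob hash");
        return it->second;
    }

private:
    std::unordered_map<ContentHash, std::vector<uint8_t>> blobs_;
};
```
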
ishitatsuyuki commented 1 year ago

Capture would have to track the hashes it has seen for some memory impact

Hashes should be much smaller than the actual shader module, so I would consider that negligible (and well worth it considering the net gain).
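
For scale, assuming a 32-byte (256-bit) hash per module: tracking hashes for all 76486 modules in the capture above would cost roughly 76486 × 32 ≈ 2.4 MB of bookkeeping, versus the 1.9 GB of duplicate binaries it would avoid writing.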

On the other points, I agree.