TheRealMJP / TheRealMJP.github.io

Backing repo for my blog

The Shader Permutation Problem - Part 2: How Do We Fix It? #10

Open utterances-bot opened 2 years ago

utterances-bot commented 2 years ago

The Shader Permutation Problem - Part 2: How Do We Fix It?

https://therealmjp.github.io/posts/shader-permutations-part2/

psy-fidelious commented 2 years ago

In OpenGL 4.0 land there are also uniform subroutines: https://www.khronos.org/opengl/wiki/Shader_Subroutine. Those were cut from Vulkan, though, which makes me suspect they were not implemented very portably across vendors.
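
For anyone who hasn't seen them, they look roughly like this (GLSL 4.0+, illustrative names, not from the article):

```glsl
#version 400

// Declare a subroutine "type": a function signature that can be swapped at runtime.
subroutine vec3 ShadeModel(vec3 normal, vec3 lightDir);

// Two implementations of that signature.
subroutine(ShadeModel) vec3 shadeLambert(vec3 normal, vec3 lightDir)
{
    return vec3(max(dot(normal, lightDir), 0.0));
}

subroutine(ShadeModel) vec3 shadeUnlit(vec3 normal, vec3 lightDir)
{
    return vec3(1.0);
}

// The app selects one implementation per draw with glUniformSubroutinesuiv,
// instead of compiling a separate permutation for each.
subroutine uniform ShadeModel shadeFunc;

in vec3 vsNormal;
out vec4 fragColor;

void main()
{
    fragColor = vec4(shadeFunc(normalize(vsNormal), vec3(0.577)), 1.0);
}
```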

alecazam commented 2 years ago

It's also a challenge just to write shaders that aren't in MSL. The spectrum of Vulkan hardware has an issue with explicit float16_t: there are two extensions, 16bit_storage and float16_int8, and GLSL won't compile without casting all constants to half once float16_t is used. Then 16bit_storage has four booleans on it for pushConstants16, inputOutput16, and two others. Just figuring out where you can and can't use float16_t when writing the shaders in the first place is a challenge, to say the least. Nvidia/Adreno don't have inputOutput16, while newer Mali/AMD/Intel/Apple have all the storage flags.
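
Roughly what the GLSL side looks like, as a sketch (extension and feature names from memory, so double-check them for your target):

```glsl
#version 450
// Arithmetic on float16_t: needs shaderFloat16 from VK_KHR_shader_float16_int8.
#extension GL_EXT_shader_explicit_arithmetic_types_float16 : require
// 16-bit values in buffers: needs the corresponding VK_KHR_16bit_storage flags.
#extension GL_EXT_shader_16bit_storage : require

layout(local_size_x = 64) in;

layout(set = 0, binding = 0) buffer HalfData
{
    float16_t values[];   // only legal if storageBuffer16BitAccess is supported
};

void main()
{
    float16_t x = values[gl_GlobalInvocationID.x];

    // float16_t y = x * 0.5;          // error: 0.5 is a 32-bit literal, result won't narrow
    float16_t y = x * float16_t(0.5);  // every constant has to be cast (or suffixed, e.g. 0.5hf)

    values[gl_GlobalInvocationID.x] = y;
}
```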

GeneralGDA commented 2 years ago

Nice article. Thanks for the links to the Doom presentations.

omd24 commented 2 years ago

Thanks for the great article. Can you explain the deferred rendering example a little further? In the Part 1 article you mentioned the sample pixel shader has 8 permutations, but here it says 12 permutations. Am I missing something?

TheRealMJP commented 2 years ago

@omd24 thank you! I apologize for the discrepancy in the shader permutation counts; this was a victim of some last-minute revisions. I originally used a more complex example for a forward renderer in part 1, and then after writing part 2 I went back and simplified the example. I'll update the text to reflect this, but you can consider the example in part 2 to be similar but not identical to the forward rendering setup in part 1.

The important point I'm trying to make there is that splitting a single big shader into multiple steps can reduce the total number of shader permutations. Deferred rendering is a good example of achieving that in the context of a standard rendering/shading pipeline, but there are other cases where it could apply. For example, it's possible to use a compute shader to perform complex per-vertex operations like skinning or morph targets. This can reduce the total shader permutation count by pulling those operations out of the vertex shader, but it also requires writing the vertex data out to memory and reading it in again (which is a very similar trade-off to deferred rendering).
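
To make the counting concrete with made-up numbers (not the ones from the articles): a forward pixel shader with 3 material variants, 2 shadow options, and 2 light-type options needs 3 × 2 × 2 = 12 permutations, while splitting it into a G-Buffer pass and a deferred lighting pass needs 3 + (2 × 2) = 7 shaders total, because the counts add across the split instead of multiplying.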

tuket commented 10 months ago

Hello, thanks for the article. There is another simple technique: using "identity default values". For example, you may have some materials that use normal textures and others that don't. Instead of making a permutation, you could bind a one-pixel texture with the value vec3(0, 0, 1). The problem with this approach is that you are doing an unnecessary fetch for objects that don't use a normal texture (and some extra computation as well). But it could be worth considering if 95% of your objects use normal mapping anyway. What are your thoughts on this? Is it something you would do?
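
A rough sketch of what I mean (GLSL, made-up names):

```glsl
// Every material binds *some* normal map. Materials without one bind a shared
// 1x1 texture whose texel decodes to the identity normal vec3(0, 0, 1)
// (stored as (0.5, 0.5, 1.0) with the usual UNORM encoding).
layout(set = 1, binding = 0) uniform sampler2D normalMap;

vec3 sampleTangentNormal(vec2 uv)
{
    // The fetch and the unpack always run, even when the bound texture
    // is just the 1x1 identity default.
    return normalize(texture(normalMap, uv).xyz * 2.0 - 1.0);
}
```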

TheRealMJP commented 10 months ago

Hey @tuket, yeah, I agree that's probably worth considering if adding a new uniform to the shader for branching over the texture fetch really isn't preferable. You would avoid the small cost of the comparison and branch, but you'd also end up paying the cost of sampling the 1x1 texture (along with any other logic/math that could potentially be branched over if the feature is disabled). In general I prefer the branch on the uniform, but there could definitely be valid reasons to use an empty texture instead.
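
For comparison, the branch-on-a-uniform version would look something like this (sketch with made-up names):

```glsl
layout(set = 1, binding = 0) uniform sampler2D normalMap;
layout(set = 1, binding = 1) uniform MaterialParams
{
    uint hasNormalMap;   // hypothetical per-material flag
} material;

vec3 sampleTangentNormal(vec2 uv)
{
    vec3 n = vec3(0.0, 0.0, 1.0);

    // Uniform branch: every thread in the draw takes the same path,
    // so the only cost when the feature is off is the comparison.
    if (material.hasNormalMap != 0u)
        n = normalize(texture(normalMap, uv).xyz * 2.0 - 1.0);

    return n;
}
```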

Nicholas-Steel commented 1 month ago

For gamers, the solution is simple: give players an option with these choices:
