GarageGames / Torque3D

MIT Licensed Open Source version of Torque 3D from GarageGames
http://torque3d.org
MIT License

Eliminate if's in shaders #734

Open lukaspj opened 10 years ago

lukaspj commented 10 years ago

This isn't exactly an issue, just a reminder for myself, and others, to look at all the shaders in T3D and eliminate if's where possible.

A small test where I tried eliminating all if's from my new terrain blending shader gave a 0.5 MSPF reduction. (155FPS -> 165FPS)

E.g:

if( lerpBlend <= 0 ) 
{ 
   invBlend = 1-detailBlend0;
   ma = max(bumpNormal.a + detailBlend0, currentAlpha + invBlend) - blendDepth0;
   b1 = max(bumpNormal.a + detailBlend0 - ma, 0);
   b2 = max(currentAlpha + invBlend - ma, 0);
   currentAlpha = max(currentAlpha,bumpNormal.a);
}
IN.detCoord0.xy += parallaxOffset( normalMap0, IN.detCoord0.xy, negViewTS, detailIdStrengthParallax0.z * detailBlend0 );
if ( detailBlend0 > 0.0f )
{
   detailColor *= detailIdStrengthParallax0.y * IN.detCoord0.w;
   if( lerpBlend <= 0 ) 
      OUT.col.rgb = ((baseColor + detailColor).rgb * b1 + OUT.col.rgb * b2) / (b1 + b2);
   else
      OUT.col = lerp( OUT.col, baseColor + detailColor, detailBlend0 );
}
if( lerpBlend <= 0 ) 
   currentAlpha = (bumpNormal.a * b1 + currentAlpha * b2) / (b1 + b2);

Became:

invBlend = 1-detailBlend0;
ma = max(bumpNormal.a + detailBlend0, currentAlpha + invBlend) - blendDepth0;
b1 = max(bumpNormal.a + detailBlend0 - ma, 0);
b2 = max(currentAlpha + invBlend - ma, 0);
currentAlpha = max(currentAlpha,bumpNormal.a);
IN.detCoord0.xy += parallaxOffset( normalMap0, IN.detCoord0.xy, negViewTS, detailIdStrengthParallax0.z * detailBlend0 );
detailColor *= detailIdStrengthParallax0.y * IN.detCoord0.w;
OUT.col.rgb = ((baseColor + detailColor).rgb * b1 + OUT.col.rgb * b2) / (b1 + b2);
currentAlpha = (bumpNormal.a * b1 + currentAlpha * b2) / (b1 + b2);

This works because of how current GPU architectures handle if-statements: when the GPU branches, it runs each branch for each pixel and discards the results of the branches that pixel didn't take.

That means a single if-statement can make your code run twice for each pixel, once for each possible outcome. It's best to avoid if's where possible to avoid this branching, even if it means doing some extra computation in the main body.
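
As a rough illustration of trading a branch for extra arithmetic, here is a minimal CPU-side sketch in C++ (not shader code). The helpers `hlsl_step` and `hlsl_lerp` are hypothetical stand-ins that mirror the HLSL `step` and `lerp` intrinsics, and `blendBranch` / `blendSelect` are made-up names; none of this is taken from the T3D shaders.

```cpp
// Stand-in for HLSL step(edge, x): 1 when x >= edge, otherwise 0.
inline float hlsl_step(float edge, float x)
{
    return x >= edge ? 1.0f : 0.0f;
}

// Stand-in for HLSL lerp(a, b, t): a at t = 0, b at t = 1.
inline float hlsl_lerp(float a, float b, float t)
{
    return a * (1.0f - t) + b * t;
}

// Branching version: picks one of the two blend results with an if.
float blendBranch(float lerpBlend, float heightResult, float lerpResult)
{
    if (lerpBlend <= 0.0f)
        return heightResult;
    return lerpResult;
}

// Branch-free version: both results are computed up front by the caller,
// and a 0/1 weight selects between them.
float blendSelect(float lerpBlend, float heightResult, float lerpResult)
{
    // w is 1 when lerpBlend <= 0 (that is, 0 >= lerpBlend), otherwise 0.
    float w = hlsl_step(lerpBlend, 0.0f);
    return hlsl_lerp(lerpResult, heightResult, w);
}
```

One caveat with this kind of select: both inputs are always evaluated, so if the unused one is NaN or infinite (for example after dividing by `b1 + b2` when both are 0), multiplying it by a 0 weight still produces NaN.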

crabmusket commented 10 years ago

Great idea.

Azaezel commented 10 years ago

http://msdn.microsoft.com/en-us/library/windows/desktop/bb509610 is an additional note for fallbacks where folks just can't figure out an alternative. Will need to ask @LuisAntonRebollo if there is an equivalent problem and solution for OpenGL.

lukaspj commented 10 years ago

Okay, I made some benchmarks with some better tools (metrics(fps) is near useless in its current state), and here are my results for performance.

The benchmarking happened using a "camera bookmark" so it's recording the exact same place completely static. Numbers are the average FPS of 20 samples:

| Blending Method | [branch] | default | no ifs |
|---|---|---|---|
| Lerp | 167 | 162 | 164 |
| Heightmap | 155 | 154 | 157 |

So the initial "0.5 mspf" gain was a little over-estimated (a result of the metrics(fps) stuff... Too damn unstable FPS if you don't have an average value) but it's still noticeable for such a small change.

(The [branch] method is the one described in @Azaezel's link.)

LuisAntonRebollo commented 10 years ago

For OpenGL 3+ and DX10+, branching is a recommended way to improve performance.

DX9 has some problems; I'm not sure whether they are an API limitation or a Shader Model 2/3 hardware restriction.

I'd prefer not to remove the ifs from the code; we can add [branch] to help the compiler.

@lukaspj, thx for benchmark :)