The FullyLit calls are ~55% faster.
The PartiallyLit calls are ~40% faster.
The Solid_FullyDark version was initially twice as slow, which is surprising. A subsequent commit adds specialized RenderTriangleUpper and RenderTriangleLower for that combination.
Rather than relying on the compiler to unroll these loops, which doesn't always happen, we unroll them by hand.
Previously, even very slight changes to the code could result in those loops not being unrolled (as is the case on current master).
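For illustration, here is a minimal sketch of what unrolling by hand means. This is not the code from this PR: `CopyRowLooped`, `CopyRowUnrolled`, the lookup-table parameter, and the unroll factor of 4 are all made up for the example.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: maps each source pixel through a 256-entry lookup table.
// Plain loop; whether it gets unrolled is entirely up to the compiler.
inline void CopyRowLooped(std::uint8_t *dst, const std::uint8_t *src,
                          const std::uint8_t *tbl, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = tbl[src[i]];
}

// Hand-unrolled variant of the same loop: 4 pixels per iteration,
// followed by a scalar loop for the remaining 0 to 3 pixels.
inline void CopyRowUnrolled(std::uint8_t *dst, const std::uint8_t *src,
                            const std::uint8_t *tbl, std::size_t n)
{
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        dst[i + 0] = tbl[src[i + 0]];
        dst[i + 1] = tbl[src[i + 1]];
        dst[i + 2] = tbl[src[i + 2]];
        dst[i + 3] = tbl[src[i + 3]];
    }
    for (; i < n; ++i)
        dst[i] = tbl[src[i]];
}
```

Both functions produce the same output. The second one just no longer depends on the optimizer deciding to unroll, so small edits to the surrounding code are less likely to change its performance.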
I've run the benchmark like this:
Benchmark results from the first commit are here: https://gist.github.com/glebm/ea5378365128c4eabb25faa16be03926#file-benchmark-result-md
Benchmark results for commit 2: https://gist.github.com/glebm/768bdcd8050029dbf140de477e02cb65
Only the means: