Previously, 2 loops rendered all the lines of a diamond. This change calls RenderLeftTriangle + RenderRightTriangle instead. That's 4 loops plus some extra clipping calculations.
Yet, this version is ~28% faster.
I don't understand why it's faster at all.
Perhaps it's only faster on some hardware?
Modern computers are magic.
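For anyone who wants to picture the two loop shapes, here's a rough sketch of the idea. It is not the code from this change: the function names, the tile dimensions, and the flat-colour fill are all assumptions made for illustration, and clipping is left out entirely.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical dimensions, chosen only for this sketch.
constexpr int kHalfWidth = 32; // width of each triangle in pixels
constexpr int kHeight = 31;    // number of rows in the diamond
constexpr int kPitch = 64;     // bytes per buffer row

// "Before": 2 loops, each filling full-width rows of the diamond.
void FillDiamondTwoLoops(std::vector<std::uint8_t> &buf, std::uint8_t color)
{
	int y = 0;
	// Widening half: each row grows by 2 px on both sides.
	for (int half = 2; half <= kHalfWidth; half += 2, ++y)
		std::memset(buf.data() + y * kPitch + kHalfWidth - half, color, 2 * half);
	// Narrowing half: each row shrinks by 2 px on both sides.
	for (int half = kHalfWidth - 2; half >= 2; half -= 2, ++y)
		std::memset(buf.data() + y * kPitch + kHalfWidth - half, color, 2 * half);
}

// "After": the left half of every row, in 2 loops.
void FillLeftTriangle(std::vector<std::uint8_t> &buf, std::uint8_t color)
{
	int y = 0;
	for (int half = 2; half <= kHalfWidth; half += 2, ++y)
		std::memset(buf.data() + y * kPitch + kHalfWidth - half, color, half);
	for (int half = kHalfWidth - 2; half >= 2; half -= 2, ++y)
		std::memset(buf.data() + y * kPitch + kHalfWidth - half, color, half);
}

// "After": the right half of every row, in 2 more loops.
void FillRightTriangle(std::vector<std::uint8_t> &buf, std::uint8_t color)
{
	int y = 0;
	for (int half = 2; half <= kHalfWidth; half += 2, ++y)
		std::memset(buf.data() + y * kPitch + kHalfWidth, color, half);
	for (int half = kHalfWidth - 2; half >= 2; half -= 2, ++y)
		std::memset(buf.data() + y * kPitch + kHalfWidth, color, half);
}

// 4 loops in total, writing the same pixels as FillDiamondTwoLoops.
void FillDiamondFourLoops(std::vector<std::uint8_t> &buf, std::uint8_t color)
{
	FillLeftTriangle(buf, color);
	FillRightTriangle(buf, color);
}
```

Both variants write exactly the same pixels into a kPitch * kHeight buffer. The 4-loop one just touches each row twice with half-width fills.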
Anyway, here are the (very surprising) results on my Ryzen 3950X:
Comparing build-reld-master/dun_render_benchmark to build-reld/dun_render_benchmark
Benchmark                       Time       CPU   Time Old   Time New   CPU Old   CPU New
----------------------------------------------------------------------------------------
BM_RenderBlackTile_pvalue     0.0002    0.0002   U Test, Repetitions: 10 vs 10
BM_RenderBlackTile_mean      -0.2786   -0.2786        156        113       156       113
BM_RenderBlackTile_median    -0.2808   -0.2808        157        113       157       113
BM_RenderBlackTile_stddev    -0.3148   -0.3138          1          1         1         1
BM_RenderBlackTile_cv        -0.0503   -0.0488          0          0         0         0
OVERALL_GEOMEAN              -0.2785   -0.2786          0          0         0         0