diasurgical / devilutionX

Diablo build for modern operating systems

dun_render: Unroll triangle loops #7354

Closed glebm closed 1 month ago

glebm commented 1 month ago

Rather than relying on the compiler to unroll the triangle loops, which doesn't always happen, we unroll them by hand.

Previously, even very slight changes to the code could result in those loops not being unrolled (as is the case in the current master).
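To illustrate the idea, here is a minimal sketch of forcing a triangle loop to unroll at compile time. It is not the code from this PR: `kRows`, `kPitch`, `CopyRow`, `RenderTriangleLoop`, and `RenderTriangleUnrolled` are hypothetical names, the row widths (2, 4, ..., 32 bytes) are an assumption chosen for illustration, and it assumes C++17 for the fold expression.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>

// Hypothetical constants for illustration; not taken from the devilutionX source.
constexpr std::size_t kRows = 16;  // number of triangle rows
constexpr std::size_t kPitch = 32; // bytes per destination row

// Baseline: a runtime loop that the compiler may or may not unroll.
inline void RenderTriangleLoop(std::uint8_t *dst, const std::uint8_t *src)
{
	for (std::size_t row = 0; row < kRows; ++row) {
		const std::size_t width = 2 * (row + 1);
		std::memcpy(dst + row * kPitch, src, width);
		src += width;
	}
}

// Copies one row whose width is a compile-time constant, so the compiler
// can emit a fixed-size copy for it.
template <std::size_t Width>
inline void CopyRow(std::uint8_t *dst, const std::uint8_t *src)
{
	std::memcpy(dst, src, Width);
}

// Expands to one CopyRow call per row. The source offset of row r is the
// sum of the widths of all earlier rows: 2 * (1 + ... + r) = r * (r + 1).
template <std::size_t... Rows>
inline void RenderTriangleUnrolledImpl(std::uint8_t *dst, const std::uint8_t *src, std::index_sequence<Rows...>)
{
	(CopyRow<2 * (Rows + 1)>(dst + Rows * kPitch, src + Rows * (Rows + 1)), ...);
}

// Unrolled variant: no runtime loop remains, regardless of optimizer heuristics.
inline void RenderTriangleUnrolled(std::uint8_t *dst, const std::uint8_t *src)
{
	RenderTriangleUnrolledImpl(dst, src, std::make_index_sequence<kRows> {});
}
```

Both functions write the same bytes; the difference is that the unrolled variant does not depend on the optimizer deciding to unroll, which is the fragility described above.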

I've run the benchmark like this:

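```shell
BASELINE=dun-benchmark
BENCHMARK=dun_render_benchmark

# Build (without running) the benchmark on the baseline branch in its own build directory.
git checkout "$BASELINE"
tools/build_and_run_benchmark.py -B "build-reld-${BASELINE}" --no-run "$BENCHMARK"

# Build (without running) the benchmark on the previous branch in the default build directory.
git checkout -
tools/build_and_run_benchmark.py --no-run "$BENCHMARK"

# Compare the two binaries, 10 repetitions each, displaying only the aggregates.
tools/linux_reduced_cpu_variance_run.sh ~/google-benchmark/tools/compare.py -a benchmarks \
  "build-reld-${BASELINE}/${BENCHMARK}" "build-reld/${BENCHMARK}" \
  --benchmark_repetitions=10
```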

Benchmark results from the first commit are here: https://gist.github.com/glebm/ea5378365128c4eabb25faa16be03926#file-benchmark-result-md

The FullyLit calls are ~55% faster. The PartiallyLit calls are ~40% faster.

The Solid_FullyDark version was initially twice as slow, which is surprising. A subsequent commit adds specialized RenderTriangleUpper and RenderTriangleLower for that combination.

Benchmark results for commit 2: https://gist.github.com/glebm/768bdcd8050029dbf140de477e02cb65

Only the means (negative values in the Time and CPU columns mean the new code is faster):

| Benchmark | Time | CPU | Time Old | Time New | CPU Old | CPU New |
|---|---|---|---|---|---|---|
| LeftTriangle, Solid, FullyLit | -0.6149 | -0.6149 | 19647 | 7566 | 19645 | 7565 |
| LeftTriangle, Solid, FullyDark | +0.0758 | +0.0758 | 20828 | 22407 | 20826 | 22404 |
| LeftTriangle, Solid, PartiallyLit | -0.3864 | -0.3864 | 102968 | 63176 | 102953 | 63168 |
| LeftTriangle, Transparent, FullyLit | -0.0967 | -0.0967 | 103958 | 93902 | 103944 | 93890 |
| LeftTriangle, Transparent, FullyDark | -0.3825 | -0.3825 | 104804 | 64718 | 104792 | 64711 |
| LeftTriangle, Transparent, PartiallyLit | +0.0067 | +0.0067 | 106556 | 107265 | 106544 | 107254 |
| RightTriangle, Solid, FullyLit | -0.5890 | -0.5890 | 18533 | 7616 | 18531 | 7616 |
| RightTriangle, Solid, FullyDark | -0.0326 | -0.0326 | 22899 | 22151 | 22896 | 22149 |
| RightTriangle, Solid, PartiallyLit | -0.4104 | -0.4104 | 107393 | 63315 | 107379 | 63308 |
| RightTriangle, Transparent, FullyLit | -0.1203 | -0.1203 | 109148 | 96018 | 109133 | 96005 |
| RightTriangle, Transparent, FullyDark | -0.3252 | -0.3252 | 108010 | 72881 | 107998 | 72872 |
| RightTriangle, Transparent, PartiallyLit | -0.0189 | -0.0189 | 111527 | 109421 | 111512 | 109405 |
| TransparentSquare, Solid, FullyLit | -0.0002 | -0.0002 | 175262 | 175222 | 175239 | 175199 |
| TransparentSquare, Solid, FullyDark | -0.0198 | -0.0199 | 167571 | 164247 | 167551 | 164224 |
| TransparentSquare, Solid, PartiallyLit | -0.3265 | -0.3266 | 272130 | 183271 | 272091 | 183235 |
| TransparentSquare, Transparent, FullyLit | -0.1282 | -0.1282 | 254365 | 221761 | 254332 | 221730 |
| TransparentSquare, Transparent, FullyDark | -0.2193 | -0.2193 | 252095 | 196821 | 252064 | 196795 |
| TransparentSquare, Transparent, PartiallyLit | -0.0678 | -0.0678 | 258382 | 240858 | 258352 | 240832 |
| Square, Solid, FullyLit | -0.1021 | -0.1021 | 9941 | 8926 | 9940 | 8925 |
| Square, Solid, FullyDark | -0.0401 | -0.0401 | 7090 | 6806 | 7089 | 6805 |
| Square, Solid, PartiallyLit | -0.3984 | -0.3984 | 210560 | 126676 | 210534 | 126659 |
| Square, Transparent, FullyLit | -0.0605 | -0.0605 | 208520 | 195902 | 208488 | 195875 |
| Square, Transparent, FullyDark | -0.4413 | -0.4413 | 208168 | 116312 | 208143 | 116298 |
| Square, Transparent, PartiallyLit | -0.0270 | -0.0270 | 231066 | 224829 | 231034 | 224796 |
| LeftTrapezoid, Solid, FullyLit | -0.5303 | -0.5303 | 5583 | 2622 | 5582 | 2622 |
| LeftTrapezoid, Solid, FullyDark | -0.2270 | -0.2270 | 5304 | 4100 | 5304 | 4100 |
| LeftTrapezoid, Solid, PartiallyLit | -0.4018 | -0.4018 | 53744 | 32152 | 53738 | 32148 |
| LeftTrapezoid, Transparent, FullyLit | -0.0796 | -0.0796 | 53993 | 49694 | 53987 | 49687 |
| LeftTrapezoid, Transparent, FullyDark | -0.4080 | -0.4080 | 53682 | 31782 | 53675 | 31778 |
| LeftTrapezoid, Transparent, PartiallyLit | -0.0140 | -0.0140 | 57240 | 56440 | 57234 | 56431 |
| RightTrapezoid, Solid, FullyLit | -0.4681 | -0.4680 | 4939 | 2627 | 4938 | 2627 |
| RightTrapezoid, Solid, FullyDark | -0.0276 | -0.0276 | 4267 | 4149 | 4266 | 4148 |
| RightTrapezoid, Solid, PartiallyLit | -0.3792 | -0.3792 | 52004 | 32282 | 51998 | 32278 |
| RightTrapezoid, Transparent, FullyLit | -0.0621 | -0.0621 | 52479 | 49218 | 52472 | 49212 |
| RightTrapezoid, Transparent, FullyDark | -0.4268 | -0.4268 | 52039 | 29826 | 52032 | 29822 |
| RightTrapezoid, Transparent, PartiallyLit | -0.0132 | -0.0132 | 55693 | 54959 | 55686 | 54953 |
| OVERALL_GEOMEAN | -0.2437 | -0.2437 | 0 | 0 | 0 | 0 |