Speed up `world_draw_black_tile`

glebm commented 1 month ago

Previously, there were 2 loops to render all the lines of a diamond. This changes it to call RenderLeftTriangle + RenderRightTriangle. That's 4 loops + some extra clipping calculations. Yet, this version is ~28% faster.
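To make the two shapes of the code concrete, here is a minimal sketch, not the actual devilutionX implementation. The tile dimensions, the `Tile` buffer type, and every function name below are hypothetical, and clipping is left out entirely. It fills the same diamond once with 2 loops over full rows and once as a left triangle plus a right triangle (4 loops in total).

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical dimensions, chosen only for illustration.
constexpr int kWidth = 64;
constexpr int kHeight = 31;
constexpr int kHalf = kWidth / 2;
using Tile = std::array<std::uint8_t, kWidth * kHeight>;

// Half-width of the diamond at row y: grows by 2 per row, then shrinks by 2.
constexpr int HalfWidthAt(int y) {
  return y <= kHeight / 2 ? 2 * (y + 1) : 2 * (kHeight - y);
}

// Fills pixels [x0, x1) of row y with color c.
inline void FillSpan(Tile &t, int y, int x0, int x1, std::uint8_t c) {
  std::fill(t.begin() + y * kWidth + x0, t.begin() + y * kWidth + x1, c);
}

// Variant A: 2 loops, each drawing whole rows of the diamond.
inline void DiamondTwoLoops(Tile &t, std::uint8_t c) {
  for (int y = 0; y <= kHeight / 2; ++y)  // widening upper half
    FillSpan(t, y, kHalf - HalfWidthAt(y), kHalf + HalfWidthAt(y), c);
  for (int y = kHeight / 2 + 1; y < kHeight; ++y)  // narrowing lower half
    FillSpan(t, y, kHalf - HalfWidthAt(y), kHalf + HalfWidthAt(y), c);
}

// Variant B, part 1: left half of every row, 2 loops.
inline void LeftTriangle(Tile &t, std::uint8_t c) {
  for (int y = 0; y <= kHeight / 2; ++y)
    FillSpan(t, y, kHalf - HalfWidthAt(y), kHalf, c);
  for (int y = kHeight / 2 + 1; y < kHeight; ++y)
    FillSpan(t, y, kHalf - HalfWidthAt(y), kHalf, c);
}

// Variant B, part 2: right half of every row, 2 more loops.
inline void RightTriangle(Tile &t, std::uint8_t c) {
  for (int y = 0; y <= kHeight / 2; ++y)
    FillSpan(t, y, kHalf, kHalf + HalfWidthAt(y), c);
  for (int y = kHeight / 2 + 1; y < kHeight; ++y)
    FillSpan(t, y, kHalf, kHalf + HalfWidthAt(y), c);
}

// Variant B: the diamond as two triangles, 4 loops in total.
inline void DiamondTwoTriangles(Tile &t, std::uint8_t c) {
  LeftTriangle(t, c);
  RightTriangle(t, c);
}
```

Both variants write exactly the same pixels, so any speed difference between them would have to come from how the compiler and the CPU handle the loops, not from the amount of work done.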

I don't understand why it's faster at all. Perhaps it's only faster on some hardware? Modern computers are magic.

Anyway, here are the (very surprising) results on my Ryzen 3950x:

BASELINE=master
BENCHMARK=dun_render_benchmark

git checkout "$BASELINE"
tools/build_and_run_benchmark.py -B "build-reld-${BASELINE}" --no-run "$BENCHMARK"
git checkout -

tools/build_and_run_benchmark.py --no-run "$BENCHMARK"
tools/linux_reduced_cpu_variance_run.sh ~/google-benchmark/tools/compare.py -a benchmarks \
  "build-reld-${BASELINE}/${BENCHMARK}" "build-reld/${BENCHMARK}" \
  --benchmark_filter='.*BlackTile.*' --benchmark_repetitions=10

Comparing build-reld-master/dun_render_benchmark to build-reld/dun_render_benchmark
Benchmark                                   Time             CPU      Time Old      Time New       CPU Old       CPU New
------------------------------------------------------------------------------------------------------------------------
BM_RenderBlackTile_pvalue                 0.0002          0.0002      U Test, Repetitions: 10 vs 10
BM_RenderBlackTile_mean                  -0.2786         -0.2786           156           113           156           113
BM_RenderBlackTile_median                -0.2808         -0.2808           157           113           157           113
BM_RenderBlackTile_stddev                -0.3148         -0.3138             1             1             1             1
BM_RenderBlackTile_cv                    -0.0503         -0.0488             0             0             0             0
OVERALL_GEOMEAN                          -0.2785         -0.2786             0             0             0             0
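
For reading the table: as far as I know, the Time and CPU columns printed by `compare.py` are relative changes, (new - old) / |old|, so -0.2786 means roughly 28% less time. A minimal sketch of that arithmetic, with a hypothetical function name:

```cpp
#include <cassert>
#include <cmath>

// Relative change between two timings; negative means the new build is faster.
// Assumes old_time is non-zero.
inline double RelativeChange(double old_time, double new_time) {
  return (new_time - old_time) / std::abs(old_time);
}
```

Plugging in the rounded means from the table, `RelativeChange(156, 113)` gives about -0.276. The reported -0.2786 is presumably computed from the unrounded timings.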

AJenbo commented 1 month ago

the line count is also nice :)

glebm commented 1 month ago

The clipping calculation for the right triangle might not be entirely correct: I just got an ASAN error.

AJenbo commented 1 month ago

let's hope that isn't where the speed comes from :D

diasurgical / devilutionX

Speed up `world_draw_black_tile` #7357