dun_render: Unroll triangle loops

Rather than relying on the compiler to do it, which doesn't always happen, we do it by hand.

Previously, even very small changes to the code could stop the compiler from unrolling these loops (as is the case in the current master).
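The idea can be illustrated with a minimal sketch. This is not the actual dun_render code: `FillTriangleLoop`, `FillTriangleUnrolled`, and the triangle shape (row `i` is `2 * (i + 1)` pixels wide) are hypothetical. The first version leaves unrolling to the optimizer. The second uses a C++17 fold expression over `std::index_sequence`, so one statement per row is emitted no matter what the optimizer's heuristics decide.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>

// Hypothetical triangle: row i of the destination gets 2 * (i + 1) pixels.
// Plain loop: the compiler may or may not unroll it.
template <std::size_t Height>
void FillTriangleLoop(std::uint8_t *dst, std::size_t pitch, std::uint8_t color)
{
	for (std::size_t i = 0; i < Height; ++i) {
		std::memset(dst + i * pitch, color, 2 * (i + 1));
	}
}

// Unrolled by construction: the fold expression expands into Height
// separate memset calls, each with a compile-time-constant size.
template <std::size_t... I>
void FillTriangleUnrolledImpl(std::uint8_t *dst, std::size_t pitch, std::uint8_t color, std::index_sequence<I...>)
{
	(std::memset(dst + I * pitch, color, 2 * (I + 1)), ...);
}

template <std::size_t Height>
void FillTriangleUnrolled(std::uint8_t *dst, std::size_t pitch, std::uint8_t color)
{
	FillTriangleUnrolledImpl(dst, pitch, color, std::make_index_sequence<Height>{});
}
```

Both functions write identical pixels; the only difference is how much is left to the optimizer.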

I've run the benchmark like this:

BASELINE=dun-benchmark
BENCHMARK=dun_render_benchmark
git checkout "$BASELINE"
tools/build_and_run_benchmark.py -B "build-reld-${BASELINE}" --no-run "$BENCHMARK"

git checkout -
tools/build_and_run_benchmark.py --no-run "$BENCHMARK"

tools/linux_reduced_cpu_variance_run.sh ~/google-benchmark/tools/compare.py -a benchmarks \
  "build-reld-${BASELINE}/${BENCHMARK}" "build-reld/${BENCHMARK}" \
  --benchmark_repetitions=10

Benchmark results from the first commit are here: https://gist.github.com/glebm/ea5378365128c4eabb25faa16be03926#file-benchmark-result-md

The FullyLit calls are ~55% faster. The PartiallyLit calls are ~40% faster.

The Solid_FullyDark version was initially twice as slow, which was surprising. A subsequent commit adds specialized RenderTriangleUpper and RenderTriangleLower functions for that combination.
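How RenderTriangleUpper and RenderTriangleLower are specialized is not shown here. One way to give a single combination its own code path in C++ is an explicit template specialization, sketched below under stated assumptions: `LightType`, `RenderSolidRow`, the use of palette index 0 for fully dark pixels, and the unchanged copy for fully lit pixels are all hypothetical and only loosely modeled on the benchmark labels.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical light levels, named after the benchmark labels.
enum class LightType { FullyDark, PartiallyLit, FullyLit };

// Generic version: remap every source pixel through a light table.
template <LightType Light>
void RenderSolidRow(std::uint8_t *dst, const std::uint8_t *src, std::size_t n, const std::uint8_t *lightTable)
{
	for (std::size_t i = 0; i < n; ++i)
		dst[i] = lightTable[src[i]];
}

// Specialization for fully dark rows: every pixel becomes palette index 0
// (assumed to be black), so neither the source nor the light table is read.
template <>
void RenderSolidRow<LightType::FullyDark>(std::uint8_t *dst, const std::uint8_t *, std::size_t n, const std::uint8_t *)
{
	std::memset(dst, 0, n);
}

// Specialization for fully lit rows: assumed to copy source pixels unchanged.
template <>
void RenderSolidRow<LightType::FullyLit>(std::uint8_t *dst, const std::uint8_t *src, std::size_t n, const std::uint8_t *)
{
	std::memcpy(dst, src, n);
}
```

Because the light level is a template parameter, each call site picks its specialization at compile time, and the fast paths contain no per-pixel table lookups.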

Benchmark results for commit 2: https://gist.github.com/glebm/768bdcd8050029dbf140de477e02cb65

Means only (negative values in the Time and CPU columns mean the new build is faster):

Benchmark	Time	CPU	Time Old	Time New	CPU Old	CPU New
LeftTriangle, Solid, FullyLit	-0.6149	-0.6149	19647	7566	19645	7565
LeftTriangle, Solid, FullyDark	+0.0758	+0.0758	20828	22407	20826	22404
LeftTriangle, Solid, PartiallyLit	-0.3864	-0.3864	102968	63176	102953	63168
LeftTriangle, Transparent, FullyLit	-0.0967	-0.0967	103958	93902	103944	93890
LeftTriangle, Transparent, FullyDark	-0.3825	-0.3825	104804	64718	104792	64711
LeftTriangle, Transparent, PartiallyLit	+0.0067	+0.0067	106556	107265	106544	107254
RightTriangle, Solid, FullyLit	-0.5890	-0.5890	18533	7616	18531	7616
RightTriangle, Solid, FullyDark	-0.0326	-0.0326	22899	22151	22896	22149
RightTriangle, Solid, PartiallyLit	-0.4104	-0.4104	107393	63315	107379	63308
RightTriangle, Transparent, FullyLit	-0.1203	-0.1203	109148	96018	109133	96005
RightTriangle, Transparent, FullyDark	-0.3252	-0.3252	108010	72881	107998	72872
RightTriangle, Transparent, PartiallyLit	-0.0189	-0.0189	111527	109421	111512	109405
TransparentSquare, Solid, FullyLit	-0.0002	-0.0002	175262	175222	175239	175199
TransparentSquare, Solid, FullyDark	-0.0198	-0.0199	167571	164247	167551	164224
TransparentSquare, Solid, PartiallyLit	-0.3265	-0.3266	272130	183271	272091	183235
TransparentSquare, Transparent, FullyLit	-0.1282	-0.1282	254365	221761	254332	221730
TransparentSquare, Transparent, FullyDark	-0.2193	-0.2193	252095	196821	252064	196795
TransparentSquare, Transparent, PartiallyLit	-0.0678	-0.0678	258382	240858	258352	240832
Square, Solid, FullyLit	-0.1021	-0.1021	9941	8926	9940	8925
Square, Solid, FullyDark	-0.0401	-0.0401	7090	6806	7089	6805
Square, Solid, PartiallyLit	-0.3984	-0.3984	210560	126676	210534	126659
Square, Transparent, FullyLit	-0.0605	-0.0605	208520	195902	208488	195875
Square, Transparent, FullyDark	-0.4413	-0.4413	208168	116312	208143	116298
Square, Transparent, PartiallyLit	-0.0270	-0.0270	231066	224829	231034	224796
LeftTrapezoid, Solid, FullyLit	-0.5303	-0.5303	5583	2622	5582	2622
LeftTrapezoid, Solid, FullyDark	-0.2270	-0.2270	5304	4100	5304	4100
LeftTrapezoid, Solid, PartiallyLit	-0.4018	-0.4018	53744	32152	53738	32148
LeftTrapezoid, Transparent, FullyLit	-0.0796	-0.0796	53993	49694	53987	49687
LeftTrapezoid, Transparent, FullyDark	-0.4080	-0.4080	53682	31782	53675	31778
LeftTrapezoid, Transparent, PartiallyLit	-0.0140	-0.0140	57240	56440	57234	56431
RightTrapezoid, Solid, FullyLit	-0.4681	-0.4680	4939	2627	4938	2627
RightTrapezoid, Solid, FullyDark	-0.0276	-0.0276	4267	4149	4266	4148
RightTrapezoid, Solid, PartiallyLit	-0.3792	-0.3792	52004	32282	51998	32278
RightTrapezoid, Transparent, FullyLit	-0.0621	-0.0621	52479	49218	52472	49212
RightTrapezoid, Transparent, FullyDark	-0.4268	-0.4268	52039	29826	52032	29822
RightTrapezoid, Transparent, PartiallyLit	-0.0132	-0.0132	55693	54959	55686	54953
OVERALL_GEOMEAN	-0.2437	-0.2437	0	0	0	0
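To read the table: compare.py reports Time and CPU as the relative change `(new - old) / |old|`. For the first row, -0.6149 is (7566 - 19647) / 19647, so the new build takes about 38% of the old time (roughly 2.6x faster). The sketch below reproduces that arithmetic. `GeomeanChange` is a hypothetical helper showing how a geometric-mean summary such as OVERALL_GEOMEAN is typically formed; it is not compare.py's exact implementation.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Relative change as shown in compare.py's Time and CPU columns:
// (new - old) / |old|. Negative values mean the new build is faster.
double RelativeChange(double oldValue, double newValue)
{
	return (newValue - oldValue) / std::fabs(oldValue);
}

// Hypothetical helper: geometric mean of the per-benchmark ratios new/old,
// expressed as a relative change (geomean - 1). Assumes both vectors are
// non-empty, have the same length, and contain only positive times.
double GeomeanChange(const std::vector<double> &oldTimes, const std::vector<double> &newTimes)
{
	double logSum = 0.0;
	for (std::size_t i = 0; i < oldTimes.size(); ++i)
		logSum += std::log(newTimes[i] / oldTimes[i]);
	return std::exp(logSum / static_cast<double>(oldTimes.size())) - 1.0;
}
```

A geometric mean is used for the summary because it treats a 2x speedup and a 2x slowdown symmetrically, whereas an arithmetic mean of relative changes would not.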

diasurgical / devilutionX, PR #7354