Closed by alexheretic 3 years ago.
Using the magnitude `glyph_count * tallest_h * tallest_h` should scale a bit more realistically. This makes the `bench_1500_chars_12px` case use single-threaded paths. In that case st was only 24% slower than mt, and I'm treating roughly 70% slower as the point at which to select mt.
```
group                     mt                       st
-----                     --                       --
bench_1500_chars_150px    1.00    17.0±0.24ms      2.62    44.5±0.23ms
bench_1500_chars_75px     1.00    6.6±0.07ms       2.62    17.2±0.04ms
bench_1500_chars_30px     1.00    3.1±0.09ms       1.97    6.2±0.04ms
bench_1500_chars_12px     1.02    3.0±0.00ms       1.00    2.9±0.02ms
bench_300_chars_150px     1.00    5.1±0.06ms       2.72    14.0±0.18ms
bench_300_chars_75px      1.00    1818.4±26.07µs   2.45    4.5±0.01ms
bench_300_chars_30px      1.00    873.6±46.22µs    1.71    1497.0±16.47µs
bench_300_chars_12px      1.00    765.3±2.30µs     1.00    766.0±5.30µs
bench_50_chars_150px      1.00    1098.7±47.05µs   2.31    2.5±0.00ms
bench_50_chars_75px       1.02    801.3±5.33µs     1.00    788.0±6.46µs
bench_50_chars_30px       1.01    255.4±0.85µs     1.00    252.2±1.05µs
bench_50_chars_12px       1.00    124.2±0.53µs     1.04    128.9±0.74µs
bench_16_chars_150px      1.00    770.4±4.48µs     1.00    768.6±7.30µs
bench_16_chars_75px       1.00    238.7±0.71µs     1.00    239.6±0.92µs
bench_16_chars_30px       1.00    76.7±0.19µs      1.01    77.7±0.31µs
bench_16_chars_12px       1.00    38.4±0.19µs      1.06    40.9±0.17µs
```
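A minimal Rust sketch of the updated estimate and the "~70% slower" selection criterion. The names `work_magnitude`, `st_slowdown`, `use_multithread` and the `min_magnitude` parameter are hypothetical, not the crate's actual API. The threshold paired with the squared formula isn't stated here, so it is left as a parameter. Reading the squared height as a rough proxy for glyph area is my assumption about why it "scales more realistically". As a sanity check against the table, `bench_300_chars_30px` has st at 1497.0µs vs mt at 873.6µs, so st is about 71% slower, close to the stated cut-off.

```rust
/// Updated work estimate: glyph count times the tallest glyph height squared.
/// (Assumption: squaring the height roughly tracks glyph area, i.e. raster work.)
fn work_magnitude(glyph_count: usize, tallest_h: f32) -> f32 {
    glyph_count as f32 * tallest_h * tallest_h
}

/// Hypothetical helper: how much slower single-threaded (st) is than
/// multithreaded (mt). 0.7 means st takes 70% longer than mt.
fn st_slowdown(st_time: f64, mt_time: f64) -> f64 {
    st_time / mt_time - 1.0
}

/// Hypothetical decision rule: take the mt path only when multithreading is
/// enabled and the estimated work reaches `min_magnitude` (inclusive here).
fn use_multithread(multithread: bool, glyph_count: usize, tallest_h: f32, min_magnitude: f32) -> bool {
    multithread && work_magnitude(glyph_count, tallest_h) >= min_magnitude
}
```

In this sketch a caller would tune `min_magnitude` so that `use_multithread` only returns `true` for workloads whose measured `st_slowdown` is around 0.7 or more.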
This PR aims to eliminate cases where the default (`multithread=true`) performs worse than setting `multithread=false`. Instead, we'll only use multithreaded code paths when they look likely to give a significant gain.
Benchmark code: `st_vs_mt`.

Investigation

The `st_vs_mt` benchmark runs through a set of scenarios of 16-1500 Unicode glyphs at scales 12px-150px. There are two areas that can use multithreading: outlining and rasterizing. An early test of mt outlining alone against single-threaded (st) showed that mt outlining isn't providing much value:
```
group                     mt-outline-only          st
-----                     ---------------          --
bench_1500_chars_150px    1.02    45.5±0.17ms      1.00    44.5±0.23ms
bench_1500_chars_75px     1.00    17.1±0.17ms      1.00    17.2±0.04ms
bench_1500_chars_30px     1.00    5.8±0.03ms       1.08    6.2±0.04ms
bench_1500_chars_12px     1.00    2.6±0.02ms       1.15    2.9±0.02ms
bench_300_chars_150px     1.03    14.4±0.14ms      1.00    14.0±0.18ms
bench_300_chars_75px      1.04    4.6±0.05ms       1.00    4.5±0.01ms
bench_300_chars_30px      1.00    1494.0±27.63µs   1.00    1497.0±16.47µs
bench_300_chars_12px      1.00    727.8±11.16µs    1.05    766.0±5.30µs
bench_50_chars_150px      1.06    2.7±0.03ms       1.00    2.5±0.00ms
bench_50_chars_75px       1.15    903.1±20.58µs    1.00    788.0±6.46µs
bench_50_chars_30px       1.27    321.0±6.58µs     1.00    252.2±1.05µs
bench_50_chars_12px       1.42    182.5±2.24µs     1.00    128.9±0.74µs
bench_16_chars_150px      1.16    895.2±19.25µs    1.00    768.6±7.30µs
bench_16_chars_75px       1.35    322.9±4.27µs     1.00    239.6±0.92µs
bench_16_chars_30px       1.82    141.0±2.72µs     1.00    77.7±0.31µs
bench_16_chars_12px       2.17    88.6±2.55µs      1.00    40.9±0.17µs
```
However, mt drawing is worth it, at least in some cases.
mt code can be faster, but it can also be slower for small workloads. It's also worth noting that when performance is similar, we can assume the single-threaded version will be more power efficient.
Using the tallest glyph height multiplied by the number of glyphs, I calculated a "work magnitude". This can be used to target cases where we expect a decent speedup. I plucked out a minimum magnitude of `8742`: now only `bench_300_chars_30px` and larger-magnitude workloads use multithreading. While not an exact science, the work estimate is good enough to ensure mt benches are never slower than st, so this should resolve #125.
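A minimal Rust sketch of the linear work estimate described above with the `8742` cut-off. `MIN_MAGNITUDE` and `should_multithread` are hypothetical names, heights are assumed to be in pixels, and the comparison is assumed to be inclusive (`>=`). As plain arithmetic, 300 glyphs reach the threshold once the tallest glyph is at least 8742 / 300 ≈ 29.14 px tall.

```rust
/// Minimum work magnitude for the multithreaded path (value quoted in the PR).
const MIN_MAGNITUDE: f32 = 8742.0;

/// Linear work estimate: number of glyphs times the tallest glyph height (px).
fn work_magnitude(glyph_count: usize, tallest_h: f32) -> f32 {
    glyph_count as f32 * tallest_h
}

/// Hypothetical selector: use mt only once the estimate reaches the cut-off
/// (assumed inclusive); smaller workloads stay single-threaded.
fn should_multithread(glyph_count: usize, tallest_h: f32) -> bool {
    work_magnitude(glyph_count, tallest_h) >= MIN_MAGNITUDE
}
```

The heights used with this sketch are illustrative only. Real tallest-glyph heights depend on the font and scale, so a 30px scale won't necessarily give a 30px tallest glyph.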