Add more comprehensive benchmarking

LaurenzV commented 2 days ago

I tried to include one set of tests for each script that exercises a different part of the shaping engine. The results can be seen below. All of the texts are translations of the English text, made with Google Translate.

Some observations: As expected, we are always slower than harfbuzz. I would say that on average we are a bit less than 2x slower, but it depends a lot on the script and on the input size. In general, the slowdown is less noticeable for smaller inputs, but for many larger inputs we are much slower, in many cases reaching a 2x-3x slowdown. Maybe it's because larger inputs tend to exercise the caching mechanisms more, which harfbuzz has a lot of? But another possibility is that we haven't optimized vector allocations a lot.

Arabic seems to have the best performance overall, but even then, the larger the input, the larger the gap. Even for English, performance gets much worse for larger inputs. Hebrew seems to be pretty bad in all cases. The other ones follow more or less a similar pattern; it's always around the 2x range.

test ar_multiple_paragraphs::hb ... bench:   3,474,008.40 ns/iter (+/- 129,998.72)
test ar_multiple_paragraphs::rb ... bench:   4,216,466.65 ns/iter (+/- 395,178.75)
test ar_paragraph::hb           ... bench:     462,324.90 ns/iter (+/- 16,476.22)
test ar_paragraph::rb           ... bench:     551,525.51 ns/iter (+/- 23,411.34)
test ar_sentence::hb            ... bench:     153,056.93 ns/iter (+/- 14,194.18)
test ar_sentence::rb            ... bench:     197,347.40 ns/iter (+/- 5,548.81)
test ar_word::hb                ... bench:      36,971.53 ns/iter (+/- 2,301.39)
test ar_word::rb                ... bench:      75,577.08 ns/iter (+/- 1,983.71)
test en_multiple_paragraphs::hb ... bench:     865,150.00 ns/iter (+/- 56,078.37)
test en_multiple_paragraphs::rb ... bench:   1,978,262.50 ns/iter (+/- 66,689.62)
test en_paragraph::hb           ... bench:     102,557.32 ns/iter (+/- 3,741.25)
test en_paragraph::rb           ... bench:     224,621.88 ns/iter (+/- 7,488.14)
test en_sentence::hb            ... bench:      50,378.91 ns/iter (+/- 3,013.86)
test en_sentence::rb            ... bench:      98,945.83 ns/iter (+/- 6,985.78)
test en_word::hb                ... bench:      24,207.49 ns/iter (+/- 1,139.02)
test en_word::rb                ... bench:      30,576.56 ns/iter (+/- 672.63)
test en_zalgo::hb               ... bench:      58,160.18 ns/iter (+/- 2,062.32)
test en_zalgo::rb               ... bench:      79,440.46 ns/iter (+/- 2,276.43)
test he_multiple_paragraphs::hb ... bench:     332,387.45 ns/iter (+/- 24,845.47)
test he_multiple_paragraphs::rb ... bench:     847,552.10 ns/iter (+/- 29,306.05)
test he_paragraph::hb           ... bench:      34,999.71 ns/iter (+/- 2,586.70)
test he_paragraph::rb           ... bench:      88,645.60 ns/iter (+/- 3,880.03)
test he_sentence::hb            ... bench:      12,965.69 ns/iter (+/- 972.89)
test he_sentence::rb            ... bench:      31,622.83 ns/iter (+/- 1,493.27)
test he_word::hb                ... bench:       6,761.86 ns/iter (+/- 153.72)
test he_word::rb                ... bench:       9,155.00 ns/iter (+/- 552.64)
test hi_multiple_paragraphs::hb ... bench:   2,515,320.83 ns/iter (+/- 76,306.12)
test hi_multiple_paragraphs::rb ... bench:   3,581,679.20 ns/iter (+/- 108,367.91)
test hi_paragraph::hb           ... bench:     302,481.25 ns/iter (+/- 34,049.30)
test hi_paragraph::rb           ... bench:     440,447.93 ns/iter (+/- 19,947.63)
test hi_sentence::hb            ... bench:     103,142.02 ns/iter (+/- 3,940.38)
test hi_sentence::rb            ... bench:     161,706.95 ns/iter (+/- 16,675.00)
test hi_word::hb                ... bench:      37,638.99 ns/iter (+/- 1,154.74)
test hi_word::rb                ... bench:      64,444.26 ns/iter (+/- 2,085.07)
test kh_multiple_paragraphs::hb ... bench:   2,068,112.50 ns/iter (+/- 70,055.78)
test kh_multiple_paragraphs::rb ... bench:   3,449,987.50 ns/iter (+/- 131,344.16)
test kh_paragraph::hb           ... bench:     222,960.45 ns/iter (+/- 18,977.76)
test kh_paragraph::rb           ... bench:     323,104.17 ns/iter (+/- 8,862.81)
test kh_sentence::hb            ... bench:      77,557.50 ns/iter (+/- 7,607.51)
test kh_sentence::rb            ... bench:     104,153.65 ns/iter (+/- 4,732.65)
test kh_word::hb                ... bench:      20,298.44 ns/iter (+/- 1,358.95)
test kh_word::rb                ... bench:      25,753.37 ns/iter (+/- 897.68)
test my_multiple_paragraphs::hb ... bench:   4,843,425.10 ns/iter (+/- 146,892.59)
test my_multiple_paragraphs::rb ... bench:   8,070,812.50 ns/iter (+/- 207,726.60)
test my_paragraph::hb           ... bench:     576,554.20 ns/iter (+/- 15,705.40)
test my_paragraph::rb           ... bench:     927,945.80 ns/iter (+/- 30,213.78)
test my_sentence::hb            ... bench:     162,603.12 ns/iter (+/- 4,125.83)
test my_sentence::rb            ... bench:     255,895.85 ns/iter (+/- 6,534.14)
test my_word::hb                ... bench:      26,158.51 ns/iter (+/- 1,262.81)
test my_word::rb                ... bench:      26,531.25 ns/iter (+/- 13,193.23)
test th_multiple_paragraphs::hb ... bench:     530,308.40 ns/iter (+/- 37,614.10)
test th_multiple_paragraphs::rb ... bench:   1,073,035.43 ns/iter (+/- 228,141.92)
test th_paragraph::hb           ... bench:      50,543.06 ns/iter (+/- 3,546.82)
test th_paragraph::rb           ... bench:     106,668.75 ns/iter (+/- 9,408.54)
test th_sentence::hb            ... bench:      18,558.85 ns/iter (+/- 353.77)
test th_sentence::rb            ... bench:      38,865.97 ns/iter (+/- 2,349.58)
test th_word::hb                ... bench:       6,958.80 ns/iter (+/- 96.84)
test th_word::rb                ... bench:      10,535.42 ns/iter (+/- 214.26)
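
For reference, the relative slowdown discussed above can be computed directly from a pair of `hb`/`rb` result lines. This is only a minimal sketch, assuming the libtest bench output format shown above; `parse_ns` and `slowdown` are hypothetical helper names and not part of the benchmark suite.

```rust
/// Extracts the `ns/iter` value from one line of libtest bench output, e.g.
/// "test ar_word::hb ... bench:      36,971.53 ns/iter (+/- 2,301.39)".
fn parse_ns(line: &str) -> Option<f64> {
    // Everything after "bench:" starts with the measured time.
    let after = line.split("bench:").nth(1)?;
    let number = after.trim().split_whitespace().next()?;
    // Drop the thousands separators before parsing.
    number.replace(',', "").parse().ok()
}

/// Returns how many times slower the `rb` result is than the `hb` result.
fn slowdown(hb_line: &str, rb_line: &str) -> Option<f64> {
    Some(parse_ns(rb_line)? / parse_ns(hb_line)?)
}

fn main() {
    // Pairs copied from the results above.
    let pairs = [
        (
            "test ar_word::hb                ... bench:      36,971.53 ns/iter (+/- 2,301.39)",
            "test ar_word::rb                ... bench:      75,577.08 ns/iter (+/- 1,983.71)",
        ),
        (
            "test en_word::hb                ... bench:      24,207.49 ns/iter (+/- 1,139.02)",
            "test en_word::rb                ... bench:      30,576.56 ns/iter (+/- 672.63)",
        ),
    ];
    for (hb, rb) in pairs {
        if let Some(ratio) = slowdown(hb, rb) {
            println!("{ratio:.2}x slower");
        }
    }
}
```

For the two pairs above this gives roughly 2.0x for `ar_word` and roughly 1.3x for `en_word`.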

Suggestions on what other things to add are welcome. I guess AAT would be nice to have. And we probably should also add specific features to target specific parts of the code (for example to test kerning, etc.), but I think this is a bit overkill for now. I think this is a good start.

behdad commented 2 days ago

Given that RustyBuzz doesn't have a shape-plan cache, I'm really surprised that you see it faster on short text.

Would be nice to see how adding hb_set_digest_t speeds up RB.
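
The general idea behind a set digest can be sketched as follows. This is a simplified, Bloom-filter-style illustration and not HarfBuzz's actual `hb_set_digest_t`; the `SetDigest` type and the shift of 4 are assumptions made for this example. A shaper could build one digest per lookup from its coverage and skip the lookup cheaply when no glyph in the buffer can be in it.

```rust
/// A simplified, lossy set digest: a single 64-bit mask in which every
/// glyph ID sets one bit. Like a Bloom filter, it can report false
/// positives but never false negatives.
#[derive(Clone, Copy, Default)]
struct SetDigest {
    mask: u64,
}

impl SetDigest {
    /// Maps a glyph ID to one of the 64 bits of the mask.
    /// The shift of 4 is an arbitrary choice for this sketch.
    fn bit(glyph: u16) -> u64 {
        1u64 << ((glyph >> 4) & 63)
    }

    /// Records a glyph in the digest.
    fn add(&mut self, glyph: u16) {
        self.mask |= Self::bit(glyph);
    }

    /// Returns `false` only if the glyph is definitely not in the set.
    fn may_have(&self, glyph: u16) -> bool {
        (self.mask & Self::bit(glyph)) != 0
    }
}

fn main() {
    let mut digest = SetDigest::default();
    for glyph in [100u16, 101, 250] {
        digest.add(glyph);
    }
    // Glyphs that were added are always reported.
    assert!(digest.may_have(100));
    // A glyph that maps to an unset bit is rejected without a real lookup.
    assert!(!digest.may_have(4000));
    // A glyph that was never added can still collide (false positive).
    assert!(digest.may_have(1124));
}
```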

LaurenzV commented 2 days ago

Given that RustyBuzz doesn't have a shape-plan cache, I'm really surprised that you see it faster on short text.

We do have shape plan caching, but we're not using it here (I think?). But by "faster" I mean faster in relative terms, in absolute terms we are always slower than harfbuzz. :D I'll update the wording.

Would be nice to see how adding hb_set_digest_t speeds up RB.

Indeed, hopefully this can help with evaluating how much the different optimizations help!

LaurenzV commented 2 days ago

I'm not sure about the shape plan caching thing, but I believe there was a PR of someone adding it (that's why I removed it as a missing feature from the README), maybe @RazrFalcon can clarify.

RazrFalcon commented 2 days ago

Please use full language/script names in benchmarks. The current ones are unreadable.

For text samples I suggest using Wikipedia articles about the language. It would be more realistic I think. This is what I did originally.

Also, the text is too long afaik. I do not know how people usually use shapers, but in my mind this should be done on a word/sentence/paragraph basis. Meaning a benchmark with more than 100 words is probably overkill. So we probably can remove multiple_paragraphs.txt. Thoughts?

Also, try looking at wiki pages for the language to find the most absurd lines. The ones with the most diacritics and language-specific weirdness. There is no point in benchmarking "plain" text.

How much space do all the included fonts take? Have you subsetted them as well? They feel small.

We most definitely should have macos-only tests. Just make sure the font is actually AAT and not a regular OpenType.

We should also test variable fonts.

An English monospace would be a good test as well.

Yes, we do have shape plan caching since recently, but not automatic like in HB. A caller must cache it on their side. Not sure what to do in benchmarks though. On one hand, caching "pollutes" results, since the first run is always slower. But on the other hand, this is the default behavior in HB... Let's test uncached for now.
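
Caller-side caching of this kind could look roughly like the sketch below. It deliberately does not use the rustybuzz API: `PlanKey`, `Plan` and `PlanCache` are hypothetical stand-ins, and the example assumes one cache per font face, keyed by the properties a plan is built from.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the inputs a shape plan depends on.
#[derive(Clone, PartialEq, Eq, Hash)]
struct PlanKey {
    script: String,
    language: String,
    rtl: bool,
}

/// Hypothetical stand-in for an expensive-to-build shape plan.
struct Plan {
    description: String,
}

/// A cache owned by the caller, assumed to live next to one font face.
struct PlanCache {
    plans: HashMap<PlanKey, Plan>,
    /// Counts how often a plan actually had to be built.
    builds: usize,
}

impl PlanCache {
    fn new() -> Self {
        PlanCache { plans: HashMap::new(), builds: 0 }
    }

    /// Returns the cached plan for `key`, building it on first use only.
    fn get_or_build(&mut self, key: &PlanKey) -> &Plan {
        if !self.plans.contains_key(key) {
            self.builds += 1;
            let description = format!("{}/{} rtl={}", key.script, key.language, key.rtl);
            self.plans.insert(key.clone(), Plan { description });
        }
        &self.plans[key]
    }
}

fn main() {
    let mut cache = PlanCache::new();
    let arabic = PlanKey { script: "Arab".into(), language: "ar".into(), rtl: true };

    // The first call builds the plan; later calls reuse it.
    println!("{}", cache.get_or_build(&arabic).description);
    println!("{}", cache.get_or_build(&arabic).description);
    assert_eq!(cache.builds, 1);
}
```

In a benchmark, only the first iteration would pay for building the plan, which is the "pollution" effect mentioned above.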

Maybe it's because larger inputs tend to exercise the caching mechanisms more, which harfbuzz has a lot of?

The plan cache is not affected by the input text. It simply caches font properties. So no.

But another possibility is that we haven't optimized vector allocations a lot.

We have the same number of allocations as HB. Maybe even less. This is by design.

I would say that in terms of performance optimizations we should simply run rustybuzz under a profiler and see the hotspots. There is not much point in comparing it to HB here. Especially since we have a completely different parser. My bet is that we re-parse GSUB/GPOS a lot.

There is also a chance that ragel output for Rust isn't as fast as for C.

Either way thanks for your work again. I never had time to do proper benchmarking of rb. My only goal was correctness/completeness. I'm sure there are a lot of low hanging fruits in terms of optimization.

Overall, the current results are way better than I was expecting.

LaurenzV commented 2 days ago

Please use full language/script names in benchmarks. The current ones are unreadable.

Will do.

For text samples I suggest using Wikipedia articles about the language. It would be more realistic I think. This is what I did originally.

Will look into it.

Also, the text is too long afaik. I do not know how people usually use shapers, but in my mind this should be done on a word/sentence/paragraph basis. Meaning a benchmark with more than 100 words is probably overkill. So we probably can remove multiple_paragraphs.txt. Thoughts?

Yeah I actually had the same thought. I guess it makes sense to limit ourselves to one paragraph at most, and maybe add some shorter as well as longer paragraphs.

Also, try looking at wiki pages for the language to find the most absurd lines. The ones with the most diacritics and language-specific weirdness. There is no point in benchmarking "plain" text.

This will be hard to do for any non-Latin text since I can't read the scripts. 😅 But I can try finding some for the English text, and maybe include some longer zalgo text.

How much space do all the included fonts take? Have you subsetted them as well? They feel small.

Around 200KB, not subsetted. But this might increase a bit if we also include variable fonts. The problem with subsetting is that we would have to regenerate them every time we add a new benchmark, which is somewhat annoying. :/

We should also test variable fonts.

Yeah, I'll look into it.

An English monospace would be a good test as well.

Does using mono make any difference to just a normal English font?

I would say that in terms of performance optimizations we should simply run rustybuzz under a profiler and see the hotspots. There is not much point in comparing it to HB here. Especially since we have a completely different parser. My bet is that we re-parse GSUB/GPOS a lot.

Any ideas how to best profile? The problem with tools like VTune (which I don't have anyway) is that they probably don't work well for programs that finish in a few milliseconds, afaik.

RazrFalcon commented 2 days ago

This will be hard to do for any non-Latin text since I can't read the scripts.

Me neither. I just google Arabic Wiki -> select Language -> Arabic -> select first sentence. Since it's a wiki there shouldn't be anything offensive in the first few lines, I hope 👀

Around 200KB, not subsetted.

Good. I would explicitly avoid subsetting, since it would make them too sanitized.

Does using mono make any difference to just a normal English font?

Sort of. For one, the advance for each glyph is the same. So the shaper has to do less work. Which is what we're testing. And monospaced fonts are in general pretty simple.

Any ideas how to best profile?

On macOS I use Instruments just fine. Simply compile the shape example in release mode and run it via Instruments -> CPU profiler.

Yes, on tiny inputs the output would be meh, but on larger ones it should be fine.
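
One way to make a program that finishes in a few milliseconds easier to profile is to repeat the hot call many times, so that a sampling profiler collects enough samples. The following is a minimal sketch under that assumption; `workload` is a hypothetical stand-in for a single shaping call, not the actual shape example.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Hypothetical stand-in for a single shaping call.
fn workload(text: &str) -> usize {
    text.chars().filter(|c| c.is_alphabetic()).count()
}

/// Runs `workload` `iterations` times and sums the results, so that the
/// process runs long enough for a sampling profiler to find hot spots.
fn run_repeated(text: &str, iterations: u32) -> usize {
    let mut total = 0;
    for _ in 0..iterations {
        // black_box discourages the optimizer from hoisting or removing the call.
        total += black_box(workload(black_box(text)));
    }
    total
}

fn main() {
    let start = Instant::now();
    let total = run_repeated("The quick brown fox", 100_000);
    println!("total = {total}, elapsed = {:?}", start.elapsed());
}
```

Built in release mode (ideally with debug symbols enabled), a loop like this can then be run under the profiler of choice.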

LaurenzV commented 1 day ago

Bad news, I just realized that harfbuzz_rs is running on an old version. I changed it locally to use the newest one, and it looks like harfbuzz has gotten quite a speed boost since then, so it looks like we are even worse off now, sometimes even being 3x slower or more. :(

test ar_multiple_paragraphs::hb ... bench:   2,633,187.53 ns/iter (+/- 154,465.21)
test ar_multiple_paragraphs::rb ... bench:   4,102,137.50 ns/iter (+/- 103,185.38)
test ar_paragraph::hb           ... bench:     339,277.10 ns/iter (+/- 6,176.14)
test ar_paragraph::rb           ... bench:     543,724.95 ns/iter (+/- 43,343.52)
test ar_sentence::hb            ... bench:     105,098.95 ns/iter (+/- 8,192.71)
test ar_sentence::rb            ... bench:     195,847.90 ns/iter (+/- 6,708.59)
test ar_word::hb                ... bench:      27,228.22 ns/iter (+/- 1,953.36)
test ar_word::rb                ... bench:      75,276.58 ns/iter (+/- 3,252.40)
test en_multiple_paragraphs::hb ... bench:     610,585.45 ns/iter (+/- 18,826.07)
test en_multiple_paragraphs::rb ... bench:   1,972,808.40 ns/iter (+/- 61,294.68)
test en_paragraph::hb           ... bench:      65,031.66 ns/iter (+/- 1,809.99)
test en_paragraph::rb           ... bench:     224,223.60 ns/iter (+/- 9,276.96)
test en_sentence::hb            ... bench:      30,761.11 ns/iter (+/- 1,950.26)
test en_sentence::rb            ... bench:      98,659.53 ns/iter (+/- 1,693.28)
test en_word::hb                ... bench:      14,743.31 ns/iter (+/- 559.45)
test en_word::rb                ... bench:      30,558.89 ns/iter (+/- 884.83)
test en_zalgo::hb               ... bench:      36,896.70 ns/iter (+/- 566.61)
test en_zalgo::rb               ... bench:      78,816.67 ns/iter (+/- 3,343.94)
test he_multiple_paragraphs::hb ... bench:     249,640.98 ns/iter (+/- 3,851.27)
test he_multiple_paragraphs::rb ... bench:     836,345.90 ns/iter (+/- 13,727.18)
test he_paragraph::hb           ... bench:      23,472.28 ns/iter (+/- 846.97)
test he_paragraph::rb           ... bench:      87,753.59 ns/iter (+/- 2,280.71)
test he_sentence::hb            ... bench:       9,002.00 ns/iter (+/- 204.81)
test he_sentence::rb            ... bench:      31,088.80 ns/iter (+/- 744.61)
test he_word::hb                ... bench:       5,206.77 ns/iter (+/- 208.45)
test he_word::rb                ... bench:       9,126.16 ns/iter (+/- 1,500.98)
test hi_multiple_paragraphs::hb ... bench:   2,065,858.35 ns/iter (+/- 49,065.39)
test hi_multiple_paragraphs::rb ... bench:   3,593,712.60 ns/iter (+/- 69,404.16)
test hi_paragraph::hb           ... bench:     232,789.84 ns/iter (+/- 4,791.20)
test hi_paragraph::rb           ... bench:     435,185.43 ns/iter (+/- 13,234.74)
test hi_sentence::hb            ... bench:      75,099.49 ns/iter (+/- 2,383.92)
test hi_sentence::rb            ... bench:     160,847.23 ns/iter (+/- 4,680.54)
test hi_word::hb                ... bench:      27,650.89 ns/iter (+/- 1,516.73)
test hi_word::rb                ... bench:      64,401.74 ns/iter (+/- 2,110.70)
test kh_multiple_paragraphs::hb ... bench:   1,682,833.35 ns/iter (+/- 36,860.19)
test kh_multiple_paragraphs::rb ... bench:   3,434,774.90 ns/iter (+/- 84,483.32)
test kh_paragraph::hb           ... bench:     166,819.10 ns/iter (+/- 4,580.99)
test kh_paragraph::rb           ... bench:     330,029.20 ns/iter (+/- 28,456.88)
test kh_sentence::hb            ... bench:      55,545.73 ns/iter (+/- 1,272.65)
test kh_sentence::rb            ... bench:     104,521.88 ns/iter (+/- 8,517.16)
test kh_word::hb                ... bench:      15,254.05 ns/iter (+/- 245.87)
test kh_word::rb                ... bench:      25,678.58 ns/iter (+/- 900.12)
test my_multiple_paragraphs::hb ... bench:   2,943,637.50 ns/iter (+/- 272,463.96)
test my_multiple_paragraphs::rb ... bench:   8,042,866.70 ns/iter (+/- 910,172.62)
test my_paragraph::hb           ... bench:     304,476.03 ns/iter (+/- 6,568.26)
test my_paragraph::rb           ... bench:     927,475.00 ns/iter (+/- 34,731.60)
test my_sentence::hb            ... bench:      81,085.92 ns/iter (+/- 1,414.87)
test my_sentence::rb            ... bench:     253,408.35 ns/iter (+/- 9,122.52)
test my_word::hb                ... bench:      15,085.29 ns/iter (+/- 220.00)
test my_word::rb                ... bench:      25,986.95 ns/iter (+/- 636.28)
test th_multiple_paragraphs::hb ... bench:     462,202.05 ns/iter (+/- 332,104.17)
test th_multiple_paragraphs::rb ... bench:   1,054,295.85 ns/iter (+/- 24,189.01)
test th_paragraph::hb           ... bench:      41,557.88 ns/iter (+/- 1,763.06)
test th_paragraph::rb           ... bench:     105,711.80 ns/iter (+/- 2,785.21)
test th_sentence::hb            ... bench:      15,400.77 ns/iter (+/- 331.18)
test th_sentence::rb            ... bench:      37,927.09 ns/iter (+/- 1,241.49)
test th_word::hb                ... bench:       5,521.20 ns/iter (+/- 143.39)
test th_word::rb                ... bench:      10,446.08 ns/iter (+/- 308.76)

But yeah I'm sure there are some low-hanging fruits.

RazrFalcon commented 1 day ago

Ugh... I was sure it simply links the system library. Then we could try sending a patch to harfbuzz_rs, so it would have an up-to-date version.

LaurenzV commented 1 day ago

I think there already is one https://github.com/harfbuzz/harfbuzz_rs/pull/37, although it also "only" targets 8.4.0. But I have my own branch with 9.0 that I will just use for now.

RazrFalcon commented 1 day ago

The crates.io version is 8.0.0, which is not that far.

LaurenzV commented 1 day ago

Yeah, but might as well use the newest version if available, no? 😄

@behdad Is there a chance to update harfbuzz_rs to the newest version? Not sure who is mainly responsible for the crate.

behdad commented 1 day ago

@behdad Is there a chance to update harfbuzz_rs to the newest version? Not sure who is mainly responsible for the crate.

I merged the 8.4.0 PR. I don't think anyone's working on it currently.

https://github.com/harfbuzz/harfbuzz_rs/issues/41

LaurenzV commented 1 day ago

Thanks! Weird, 8.4.0 is still considerably slower than 9.0.0 for me, but I guess it'll do for now.

behdad commented 1 day ago

Thanks! Weird, 8.4.0 is still considerably slower than 9.0.0 for me, but I guess it'll do for now.

That's not expected.

LaurenzV commented 1 day ago

Ah! All good, I know what's going on, I had to enable the HARFBUZZ_SYS_NO_PKG_CONFIG environment variable so that harfbuzz_rs actually uses the submodule instead of the system library. And I guess I have a pretty old version somewhere. I'll add this to the README.

behdad commented 1 day ago

harfbuzz_rs now on 9.0.0.

asibahi commented 1 day ago

Note that for the linked submodule I had to disable CoreText in harfbuzz to make it compile on my machine, as the build.rs file doesn't point to the correct system libraries.

LaurenzV commented 1 day ago

Yeah, no worries, I had exactly the same issue when trying it locally and it also worked when I disabled it.

LaurenzV commented 1 day ago

@RazrFalcon Better now? I think we have good coverage now. And we can always add more stuff later on.

RazrFalcon commented 1 day ago

Great work as always! A couple minor fixes and we're ready to merge.

RazrFalcon commented 1 day ago

As for performance, I'm actually surprised how well RB performs. Remember that unlike HB, RB is 100% memory safe. We have zero unsafe. And also, I spent literally zero hours optimizing it.

Also note that everything in ttf-parser, except GSUB/GPOS/GDEF, is as fast as it could be. I've spent weeks optimizing it. But the GGG tables... no, they are just correct. And I'm not sure they can be optimized much, except by parsing them into an allocated cache, which is not what HB does, afaik.

That's the problem with TrueType performance in general: you either get performance by not allocating anything, or waste RAM by allocating everything.
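
The trade-off can be illustrated with a toy big-endian `u16` array, which is how many font table fields are stored. This is only a sketch of the two approaches and not code from ttf-parser or HB; `LazyU16Array` and `CachedU16Array` are hypothetical names.

```rust
/// Zero-allocation view over a big-endian `u16` array.
/// Every access decodes the value again from the raw bytes.
struct LazyU16Array<'a> {
    data: &'a [u8],
}

impl<'a> LazyU16Array<'a> {
    fn get(&self, index: usize) -> Option<u16> {
        let start = index.checked_mul(2)?;
        let bytes = self.data.get(start..start.checked_add(2)?)?;
        Some(u16::from_be_bytes([bytes[0], bytes[1]]))
    }
}

/// Eagerly decoded copy: no decoding on access, but it costs a heap
/// allocation proportional to the table size.
struct CachedU16Array {
    values: Vec<u16>,
}

impl CachedU16Array {
    fn parse(data: &[u8]) -> Self {
        let values = data
            .chunks_exact(2)
            .map(|pair| u16::from_be_bytes([pair[0], pair[1]]))
            .collect();
        CachedU16Array { values }
    }

    fn get(&self, index: usize) -> Option<u16> {
        self.values.get(index).copied()
    }
}

fn main() {
    // Three big-endian values: 42, 256 and 65535.
    let raw: [u8; 6] = [0x00, 0x2A, 0x01, 0x00, 0xFF, 0xFF];
    let lazy = LazyU16Array { data: &raw };
    let cached = CachedU16Array::parse(&raw);

    // Both approaches return the same values; only the cost model differs.
    for i in 0..4 {
        assert_eq!(lazy.get(i), cached.get(i));
    }
    println!("{:?} {:?}", lazy.get(0), cached.get(2));
}
```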

behdad commented 20 hours ago

@LaurenzV Can you paste the AAT comparison to HB? I'm curious.

LaurenzV commented 20 hours ago

Sure, that's what I get:

test english::aat_paragraph_long::hb  ... bench:     400,081.25 ns/iter (+/- 5,340.26)
test english::aat_paragraph_long::rb  ... bench:     503,114.55 ns/iter (+/- 7,702.48)
test english::aat_sentence_1::hb      ... bench:      55,949.58 ns/iter (+/- 1,311.55)
test english::aat_sentence_1::rb      ... bench:      61,661.58 ns/iter (+/- 2,269.01)
test english::aat_word_1::hb          ... bench:      10,424.01 ns/iter (+/- 271.68)
test english::aat_word_1::rb          ... bench:       4,430.10 ns/iter (+/- 101.12)
test hindi::aat_paragraph_long::hb    ... bench:   1,216,325.00 ns/iter (+/- 18,088.78)
test hindi::aat_paragraph_long::rb    ... bench:   1,341,831.26 ns/iter (+/- 14,252.76)
test hindi::aat_sentence::hb          ... bench:     171,463.18 ns/iter (+/- 4,939.79)
test hindi::aat_sentence::rb          ... bench:     169,386.98 ns/iter (+/- 4,923.60)
test hindi::aat_word::hb              ... bench:      36,415.48 ns/iter (+/- 1,015.80)
test hindi::aat_word::rb              ... bench:      23,476.31 ns/iter (+/- 958.54)
test khmer::aat_paragraph_long_1::hb  ... bench:     552,237.45 ns/iter (+/- 11,561.88)
test khmer::aat_paragraph_long_1::rb  ... bench:     729,897.90 ns/iter (+/- 26,071.25)
test khmer::aat_sentence_1::hb        ... bench:      44,000.35 ns/iter (+/- 821.56)
test khmer::aat_sentence_1::rb        ... bench:      68,394.91 ns/iter (+/- 3,654.46)
test khmer::aat_word_1::hb            ... bench:      11,341.94 ns/iter (+/- 245.54)
test khmer::aat_word_1::rb            ... bench:      13,300.99 ns/iter (+/- 481.35)
test myanmar::aat_paragraph_long::hb  ... bench:     886,429.20 ns/iter (+/- 21,946.05)
test myanmar::aat_paragraph_long::rb  ... bench:   2,467,400.00 ns/iter (+/- 52,136.80)
test myanmar::aat_sentence_1::hb      ... bench:      69,138.88 ns/iter (+/- 3,477.55)
test myanmar::aat_sentence_1::rb      ... bench:      94,925.00 ns/iter (+/- 9,313.67)
test myanmar::aat_word_1::hb          ... bench:       8,613.54 ns/iter (+/- 410.36)
test myanmar::aat_word_1::rb          ... bench:       9,656.10 ns/iter (+/- 159.67)

RazrFalcon / rustybuzz

Add more comprehensive benchmarking #120