go-text / typesetting

High quality text shaping in pure Go.
Other
88 stars 11 forks source link

Use an iterator-style API to consume shaped text in line wrapper #68

Closed whereswaldon closed 1 year ago

whereswaldon commented 1 year ago

This PR is part of my ongoing effort to refactor the line wrapper with the goal of making it both easier to understand and able to support additional features.

We've long discussed wanting to be able to make text shaping an incremental process instead of having to do it all up front. Having and API that works in terms of iterators makes that much easier. As such, I've restructured the line wrapper to consume an iterator API that is expected to provide shaped text. I've only implemented an iterator backed by a pre-shaped slice for now, but it shouldn't be terribly hard to implement a lazy-shaping iterator with a compatible API.

However, introducing an iterator complicated some shaping operations and triggered some extra allocations (we couldn't just reach into the slice storage of the shaped text arbitrarily anymore), which caused a nontrivial performance decrease.


This section is obsolete, but left for reference; skip to the next horizontal rule.

To make up for this, I've introduced WrapBuffer, an optional buffer types that callers can provide to the line wrapper to allow it to avoid allocating. Everything will work as expected if callers provide the zero value, but providing a non-zero buffer will potentially eliminate allocations altogether if the wrapping process is simple enough.

It has a rather dramatic impact on performance across the board, in many places improving on the pre-iterator implementation's performance and memory use.

I know that this PR makes quite a few API changes, but:

Below are my benchmarks:

Benchmark results
goos: linux
goarch: amd64
pkg: github.com/go-text/typesetting/shaping
cpu: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
                                                 │ before-iterator.txt │             iterator.txt             │        after-buffering.txt         │
                                                 │       sec/op        │    sec/op     vs base                │   sec/op     vs base               │
Wrapping/10runes-arabic-1parts-8                           39.01n ± 4%   157.00n ± 4%  +302.51% (p=0.002 n=6)   45.41n ± 4%  +16.43% (p=0.002 n=6)
Wrapping/10runes-arabic-10parts-8                          4.005µ ± 2%    4.619µ ± 4%   +15.35% (p=0.002 n=6)   2.648µ ± 2%  -33.87% (p=0.002 n=6)
Wrapping/100runes-arabic-1parts-8                          18.47µ ± 2%    19.62µ ± 3%    +6.22% (p=0.002 n=6)   17.51µ ± 1%   -5.22% (p=0.002 n=6)
Wrapping/100runes-arabic-10parts-8                         20.81µ ± 2%    22.09µ ± 4%    +6.15% (p=0.002 n=6)   18.54µ ± 1%  -10.93% (p=0.002 n=6)
Wrapping/100runes-arabic-100parts-8                        36.37µ ± 3%    43.20µ ± 3%   +18.76% (p=0.002 n=6)   23.13µ ± 4%  -36.41% (p=0.002 n=6)
Wrapping/1000runes-arabic-1parts-8                         174.7µ ± 2%    186.2µ ± 3%    +6.59% (p=0.002 n=6)   169.4µ ± 3%   -2.99% (p=0.002 n=6)
Wrapping/1000runes-arabic-10parts-8                        180.7µ ± 3%    191.9µ ± 3%    +6.22% (p=0.004 n=6)   170.2µ ± 1%   -5.82% (p=0.002 n=6)
Wrapping/1000runes-arabic-100parts-8                       204.0µ ± 1%    219.4µ ± 3%    +7.53% (p=0.002 n=6)   176.9µ ± 1%  -13.32% (p=0.002 n=6)
Wrapping/1000runes-arabic-1000parts-8                      362.9µ ± 3%    427.0µ ± 3%   +17.65% (p=0.002 n=6)   223.1µ ± 2%  -38.53% (p=0.002 n=6)
Wrapping/10runes-latin-1parts-8                            38.40n ± 4%   151.20n ± 5%  +293.70% (p=0.002 n=6)   45.29n ± 2%  +17.91% (p=0.002 n=6)
Wrapping/10runes-latin-10parts-8                           3.452µ ± 3%    4.003µ ± 2%   +15.96% (p=0.002 n=6)   2.250µ ± 2%  -34.82% (p=0.002 n=6)
Wrapping/100runes-latin-1parts-8                           15.37µ ± 4%    16.31µ ± 4%    +6.08% (p=0.004 n=6)   14.16µ ± 1%   -7.87% (p=0.002 n=6)
Wrapping/100runes-latin-10parts-8                          17.32µ ± 5%    18.97µ ± 2%    +9.50% (p=0.002 n=6)   15.34µ ± 5%  -11.45% (p=0.002 n=6)
Wrapping/100runes-latin-100parts-8                         32.70µ ± 2%    40.58µ ± 3%   +24.10% (p=0.002 n=6)   20.86µ ± 4%  -36.19% (p=0.002 n=6)
Wrapping/1000runes-latin-1parts-8                          143.7µ ± 7%    155.0µ ± 3%    +7.87% (p=0.009 n=6)   137.1µ ± 6%   -4.61% (p=0.041 n=6)
Wrapping/1000runes-latin-10parts-8                         156.4µ ± 3%    164.9µ ± 3%    +5.43% (p=0.002 n=6)   145.1µ ± 1%   -7.22% (p=0.002 n=6)
Wrapping/1000runes-latin-100parts-8                        176.0µ ± 3%    190.9µ ± 7%    +8.49% (p=0.002 n=6)   153.2µ ± 3%  -12.93% (p=0.002 n=6)
Wrapping/1000runes-latin-1000parts-8                       323.5µ ± 4%    404.9µ ± 3%   +25.13% (p=0.002 n=6)   208.4µ ± 3%  -35.60% (p=0.002 n=6)
WrappingHappyPath-8                                        90.54n ± 4%   203.20n ± 4%  +124.43% (p=0.002 n=6)   44.73n ± 3%  -50.59% (p=0.002 n=6)
Wrapping/10runes-arabic-1parts-unbuffered-8                                                                     2.373µ ± 5%
Wrapping/10runes-arabic-10parts-unbuffered-8                                                                    5.208µ ± 2%
Wrapping/100runes-arabic-1parts-unbuffered-8                                                                    20.50µ ± 1%
Wrapping/100runes-arabic-10parts-unbuffered-8                                                                   21.24µ ± 1%
Wrapping/100runes-arabic-100parts-unbuffered-8                                                                  26.57µ ± 1%
Wrapping/1000runes-arabic-1parts-unbuffered-8                                                                   172.6µ ± 2%
Wrapping/1000runes-arabic-10parts-unbuffered-8                                                                  176.0µ ± 2%
Wrapping/1000runes-arabic-100parts-unbuffered-8                                                                 192.3µ ± 1%
Wrapping/1000runes-arabic-1000parts-unbuffered-8                                                                283.2µ ± 3%
Wrapping/10runes-latin-1parts-unbuffered-8                                                                      2.348µ ± 3%
Wrapping/10runes-latin-10parts-unbuffered-8                                                                     4.849µ ± 2%
Wrapping/100runes-latin-1parts-unbuffered-8                                                                     16.79µ ± 3%
Wrapping/100runes-latin-10parts-unbuffered-8                                                                    18.00µ ± 1%
Wrapping/100runes-latin-100parts-unbuffered-8                                                                   25.64µ ± 1%
Wrapping/1000runes-latin-1parts-unbuffered-8                                                                    145.8µ ± 1%
Wrapping/1000runes-latin-10parts-unbuffered-8                                                                   152.5µ ± 2%
Wrapping/1000runes-latin-100parts-unbuffered-8                                                                  166.7µ ± 2%
Wrapping/1000runes-latin-1000parts-unbuffered-8                                                                 250.2µ ± 2%
geomean                                                    17.97µ         23.76µ        +32.27%                 23.05µ       -18.66%

                                                 │ before-iterator.txt │             iterator.txt              │           after-buffering.txt            │
                                                 │        B/op         │     B/op       vs base                │     B/op      vs base                    │
Wrapping/10runes-arabic-1parts-8                            24.00 ± 0%     184.00 ± 0%  +666.67% (p=0.002 n=6)      0.00 ± 0%  -100.00% (p=0.002 n=6)
Wrapping/10runes-arabic-10parts-8                          4680.0 ± 0%     4744.0 ± 0%    +1.37% (p=0.002 n=6)     144.0 ± 0%   -96.92% (p=0.002 n=6)
Wrapping/100runes-arabic-1parts-8                          1272.0 ± 0%     1336.0 ± 0%    +5.03% (p=0.002 n=6)     144.0 ± 0%   -88.68% (p=0.002 n=6)
Wrapping/100runes-arabic-10parts-8                         4264.0 ± 0%     4328.0 ± 0%    +1.50% (p=0.002 n=6)     144.0 ± 0%   -96.62% (p=0.002 n=6)
Wrapping/100runes-arabic-100parts-8                       49096.0 ± 0%    49160.0 ± 0%    +0.13% (p=0.002 n=6)     144.0 ± 0%   -99.71% (p=0.002 n=6)
Wrapping/1000runes-arabic-1parts-8                        13657.0 ± 0%    13722.0 ± 0%    +0.48% (p=0.002 n=6)     146.0 ± 0%   -98.93% (p=0.002 n=6)
Wrapping/1000runes-arabic-10parts-8                       15928.0 ± 0%    15993.0 ± 0%    +0.41% (p=0.002 n=6)     145.0 ± 0%   -99.09% (p=0.002 n=6)
Wrapping/1000runes-arabic-100parts-8                      45513.0 ± 0%    45577.0 ± 0%    +0.14% (p=0.002 n=6)     149.0 ± 0%   -99.67% (p=0.002 n=6)
Wrapping/1000runes-arabic-1000parts-8                    537242.5 ± 0%   537308.5 ± 0%    +0.01% (p=0.002 n=6)     263.5 ± 1%   -99.95% (p=0.002 n=6)
Wrapping/10runes-latin-1parts-8                             24.00 ± 0%     184.00 ± 0%  +666.67% (p=0.002 n=6)      0.00 ± 0%  -100.00% (p=0.002 n=6)
Wrapping/10runes-latin-10parts-8                           4376.0 ± 0%     4440.0 ± 0%    +1.46% (p=0.002 n=6)     144.0 ± 0%   -96.71% (p=0.002 n=6)
Wrapping/100runes-latin-1parts-8                           1848.0 ± 0%     1912.0 ± 0%    +3.46% (p=0.002 n=6)     144.0 ± 0%   -92.21% (p=0.002 n=6)
Wrapping/100runes-latin-10parts-8                          4600.0 ± 0%     4664.0 ± 0%    +1.39% (p=0.002 n=6)     144.0 ± 0%   -96.87% (p=0.002 n=6)
Wrapping/100runes-latin-100parts-8                        52440.0 ± 0%    52504.0 ± 0%    +0.12% (p=0.002 n=6)     144.0 ± 0%   -99.73% (p=0.002 n=6)
Wrapping/1000runes-latin-1parts-8                         16825.0 ± 0%    16889.0 ± 0%    +0.38% (p=0.002 n=6)     148.0 ± 0%   -99.12% (p=0.002 n=6)
Wrapping/1000runes-latin-10parts-8                        19096.0 ± 0%    19161.0 ± 0%    +0.34% (p=0.002 n=6)     148.0 ± 0%   -99.22% (p=0.002 n=6)
Wrapping/1000runes-latin-100parts-8                       46456.5 ± 0%    46521.0 ± 0%    +0.14% (p=0.002 n=6)     149.0 ± 0%   -99.68% (p=0.002 n=6)
Wrapping/1000runes-latin-1000parts-8                     526427.5 ± 0%   526492.0 ± 0%    +0.01% (p=0.002 n=6)     235.0 ± 6%   -99.96% (p=0.002 n=6)
WrappingHappyPath-8                                         120.0 ± 0%      280.0 ± 0%  +133.33% (p=0.002 n=6)       0.0 ± 0%  -100.00% (p=0.002 n=6)
Wrapping/10runes-arabic-1parts-unbuffered-8                                                                      10.48Ki ± 0%
Wrapping/10runes-arabic-10parts-unbuffered-8                                                                     10.62Ki ± 0%
Wrapping/100runes-arabic-1parts-unbuffered-8                                                                     10.62Ki ± 0%
Wrapping/100runes-arabic-10parts-unbuffered-8                                                                    10.62Ki ± 0%
Wrapping/100runes-arabic-100parts-unbuffered-8                                                                   12.38Ki ± 0%
Wrapping/1000runes-arabic-1parts-unbuffered-8                                                                    14.10Ki ± 0%
Wrapping/1000runes-arabic-10parts-unbuffered-8                                                                   14.09Ki ± 0%
Wrapping/1000runes-arabic-100parts-unbuffered-8                                                                  26.25Ki ± 0%
Wrapping/1000runes-arabic-1000parts-unbuffered-8                                                                 159.5Ki ± 0%
Wrapping/10runes-latin-1parts-unbuffered-8                                                                       10.48Ki ± 0%
Wrapping/10runes-latin-10parts-unbuffered-8                                                                      10.62Ki ± 0%
Wrapping/100runes-latin-1parts-unbuffered-8                                                                      10.62Ki ± 0%
Wrapping/100runes-latin-10parts-unbuffered-8                                                                     10.62Ki ± 0%
Wrapping/100runes-latin-100parts-unbuffered-8                                                                    16.38Ki ± 0%
Wrapping/1000runes-latin-1parts-unbuffered-8                                                                     19.31Ki ± 0%
Wrapping/1000runes-latin-10parts-unbuffered-8                                                                    20.52Ki ± 0%
Wrapping/1000runes-latin-100parts-unbuffered-8                                                                   28.98Ki ± 0%
Wrapping/1000runes-latin-1000parts-unbuffered-8                                                                  123.6Ki ± 0%
geomean                                                   6.664Ki         8.708Ki        +30.67%                               ?                      ¹ ²
¹ summaries must be >0 to compute geomean
² ratios must be >0 to compute geomean

                                                 │ before-iterator.txt │             iterator.txt             │          after-buffering.txt           │
                                                 │      allocs/op      │  allocs/op    vs base                │ allocs/op   vs base                    │
Wrapping/10runes-arabic-1parts-8                            1.000 ± 0%     3.000 ± 0%  +200.00% (p=0.002 n=6)   0.000 ± 0%  -100.00% (p=0.002 n=6)
Wrapping/10runes-arabic-10parts-8                          12.000 ± 0%    13.000 ± 0%    +8.33% (p=0.002 n=6)   3.000 ± 0%   -75.00% (p=0.002 n=6)
Wrapping/100runes-arabic-1parts-8                          15.000 ± 0%    16.000 ± 0%    +6.67% (p=0.002 n=6)   3.000 ± 0%   -80.00% (p=0.002 n=6)
Wrapping/100runes-arabic-10parts-8                         33.000 ± 0%    34.000 ± 0%    +3.03% (p=0.002 n=6)   3.000 ± 0%   -90.91% (p=0.002 n=6)
Wrapping/100runes-arabic-100parts-8                        73.000 ± 0%    74.000 ± 0%    +1.37% (p=0.002 n=6)   3.000 ± 0%   -95.89% (p=0.002 n=6)
Wrapping/1000runes-arabic-1parts-8                         88.000 ± 0%    89.000 ± 0%    +1.14% (p=0.002 n=6)   3.000 ± 0%   -96.59% (p=0.002 n=6)
Wrapping/1000runes-arabic-10parts-8                       105.000 ± 0%   106.000 ± 0%    +0.95% (p=0.002 n=6)   3.000 ± 0%   -97.14% (p=0.002 n=6)
Wrapping/1000runes-arabic-100parts-8                      277.000 ± 0%   278.000 ± 0%    +0.36% (p=0.002 n=6)   3.000 ± 0%   -98.92% (p=0.002 n=6)
Wrapping/1000runes-arabic-1000parts-8                     633.000 ± 0%   634.000 ± 0%    +0.16% (p=0.002 n=6)   3.000 ± 0%   -99.53% (p=0.002 n=6)
Wrapping/10runes-latin-1parts-8                             1.000 ± 0%     3.000 ± 0%  +200.00% (p=0.002 n=6)   0.000 ± 0%  -100.00% (p=0.002 n=6)
Wrapping/10runes-latin-10parts-8                           11.000 ± 0%    12.000 ± 0%    +9.09% (p=0.002 n=6)   3.000 ± 0%   -72.73% (p=0.002 n=6)
Wrapping/100runes-latin-1parts-8                           18.000 ± 0%    19.000 ± 0%    +5.56% (p=0.002 n=6)   3.000 ± 0%   -83.33% (p=0.002 n=6)
Wrapping/100runes-latin-10parts-8                          34.000 ± 0%    35.000 ± 0%    +2.94% (p=0.002 n=6)   3.000 ± 0%   -91.18% (p=0.002 n=6)
Wrapping/100runes-latin-100parts-8                         77.000 ± 0%    78.000 ± 0%    +1.30% (p=0.002 n=6)   3.000 ± 0%   -96.10% (p=0.002 n=6)
Wrapping/1000runes-latin-1parts-8                         121.000 ± 0%   122.000 ± 0%    +0.83% (p=0.002 n=6)   3.000 ± 0%   -97.52% (p=0.002 n=6)
Wrapping/1000runes-latin-10parts-8                        138.000 ± 0%   139.000 ± 0%    +0.72% (p=0.002 n=6)   3.000 ± 0%   -97.83% (p=0.002 n=6)
Wrapping/1000runes-latin-100parts-8                       305.000 ± 0%   306.000 ± 0%    +0.33% (p=0.002 n=6)   3.000 ± 0%   -99.02% (p=0.002 n=6)
Wrapping/1000runes-latin-1000parts-8                      743.000 ± 0%   744.000 ± 0%    +0.13% (p=0.002 n=6)   3.000 ± 0%   -99.60% (p=0.002 n=6)
WrappingHappyPath-8                                         2.000 ± 0%     4.000 ± 0%  +100.00% (p=0.002 n=6)   0.000 ± 0%  -100.00% (p=0.002 n=6)
Wrapping/10runes-arabic-1parts-unbuffered-8                                                                     4.000 ± 0%
Wrapping/10runes-arabic-10parts-unbuffered-8                                                                    7.000 ± 0%
Wrapping/100runes-arabic-1parts-unbuffered-8                                                                    7.000 ± 0%
Wrapping/100runes-arabic-10parts-unbuffered-8                                                                   7.000 ± 0%
Wrapping/100runes-arabic-100parts-unbuffered-8                                                                  8.000 ± 0%
Wrapping/1000runes-arabic-1parts-unbuffered-8                                                                   10.00 ± 0%
Wrapping/1000runes-arabic-10parts-unbuffered-8                                                                  10.00 ± 0%
Wrapping/1000runes-arabic-100parts-unbuffered-8                                                                 85.00 ± 0%
Wrapping/1000runes-arabic-1000parts-unbuffered-8                                                                172.0 ± 0%
Wrapping/10runes-latin-1parts-unbuffered-8                                                                      4.000 ± 0%
Wrapping/10runes-latin-10parts-unbuffered-8                                                                     7.000 ± 0%
Wrapping/100runes-latin-1parts-unbuffered-8                                                                     7.000 ± 0%
Wrapping/100runes-latin-10parts-unbuffered-8                                                                    7.000 ± 0%
Wrapping/100runes-latin-100parts-unbuffered-8                                                                   9.000 ± 0%
Wrapping/1000runes-latin-1parts-unbuffered-8                                                                    24.00 ± 0%
Wrapping/1000runes-latin-10parts-unbuffered-8                                                                   36.00 ± 0%
Wrapping/1000runes-latin-100parts-unbuffered-8                                                                  83.00 ± 0%
Wrapping/1000runes-latin-1000parts-unbuffered-8                                                                 146.0 ± 0%
geomean                                                     40.01          47.61        +19.01%                             ?                      ¹ ²
¹ summaries must be >0 to compute geomean
² ratios must be >0 to compute geomean

As you can see, on the whole CPU performance is improved and allocations are dramatically improved. We did lose some performance when wrapping tiny single runs because the iterator API makes it more expensive to check whether we're looking at a tiny single run. I hope to revisit this in the future to make this hot path more performant.


I've internalized the existence of WrapBuffer so that the only API change is behavioral: it is no longer valid to keep using returned wrapped lines after the next call to Prepare() or WrapParagraph().

Here are updated benchmark results:

benchmarks
goos: linux
goarch: amd64
pkg: github.com/go-text/typesetting/shaping
cpu: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
                                      │ ./shaping/before-iterator.txt │         shaping/iterator.txt         │             final.txt              │
                                      │            sec/op             │    sec/op     vs base                │   sec/op     vs base               │
Wrapping/10runes-arabic-1parts-8                          39.01n ± 4%   157.00n ± 4%  +302.51% (p=0.002 n=6)   45.61n ± 4%  +16.93% (p=0.002 n=6)
Wrapping/10runes-arabic-10parts-8                         4.005µ ± 2%    4.619µ ± 4%   +15.35% (p=0.002 n=6)   2.640µ ± 1%  -34.07% (p=0.002 n=6)
Wrapping/100runes-arabic-1parts-8                         18.47µ ± 2%    19.62µ ± 3%    +6.22% (p=0.002 n=6)   17.40µ ± 1%   -5.81% (p=0.002 n=6)
Wrapping/100runes-arabic-10parts-8                        20.81µ ± 2%    22.09µ ± 4%    +6.15% (p=0.002 n=6)   18.11µ ± 1%  -12.97% (p=0.002 n=6)
Wrapping/100runes-arabic-100parts-8                       36.37µ ± 3%    43.20µ ± 3%   +18.76% (p=0.002 n=6)   22.54µ ± 6%  -38.04% (p=0.002 n=6)
Wrapping/1000runes-arabic-1parts-8                        174.7µ ± 2%    186.2µ ± 3%    +6.59% (p=0.002 n=6)   165.6µ ± 0%   -5.16% (p=0.002 n=6)
Wrapping/1000runes-arabic-10parts-8                       180.7µ ± 3%    191.9µ ± 3%    +6.22% (p=0.004 n=6)   169.5µ ± 6%   -6.19% (p=0.009 n=6)
Wrapping/1000runes-arabic-100parts-8                      204.0µ ± 1%    219.4µ ± 3%    +7.53% (p=0.002 n=6)   175.9µ ± 4%  -13.81% (p=0.002 n=6)
Wrapping/1000runes-arabic-1000parts-8                     362.9µ ± 3%    427.0µ ± 3%   +17.65% (p=0.002 n=6)   220.4µ ± 2%  -39.27% (p=0.002 n=6)
Wrapping/10runes-latin-1parts-8                           38.40n ± 4%   151.20n ± 5%  +293.70% (p=0.002 n=6)   45.32n ± 5%  +17.99% (p=0.002 n=6)
Wrapping/10runes-latin-10parts-8                          3.452µ ± 3%    4.003µ ± 2%   +15.96% (p=0.002 n=6)   2.212µ ± 1%  -35.92% (p=0.002 n=6)
Wrapping/100runes-latin-1parts-8                          15.37µ ± 4%    16.31µ ± 4%    +6.08% (p=0.004 n=6)   13.67µ ± 1%  -11.08% (p=0.002 n=6)
Wrapping/100runes-latin-10parts-8                         17.32µ ± 5%    18.97µ ± 2%    +9.50% (p=0.002 n=6)   14.79µ ± 1%  -14.62% (p=0.002 n=6)
Wrapping/100runes-latin-100parts-8                        32.70µ ± 2%    40.58µ ± 3%   +24.10% (p=0.002 n=6)   19.84µ ± 0%  -39.33% (p=0.002 n=6)
Wrapping/1000runes-latin-1parts-8                         143.7µ ± 7%    155.0µ ± 3%    +7.87% (p=0.009 n=6)   134.6µ ± 0%   -6.38% (p=0.002 n=6)
Wrapping/1000runes-latin-10parts-8                        156.4µ ± 3%    164.9µ ± 3%    +5.43% (p=0.002 n=6)   139.9µ ± 1%  -10.54% (p=0.002 n=6)
Wrapping/1000runes-latin-100parts-8                       176.0µ ± 3%    190.9µ ± 7%    +8.49% (p=0.002 n=6)   147.6µ ± 1%  -16.12% (p=0.002 n=6)
Wrapping/1000runes-latin-1000parts-8                      323.5µ ± 4%    404.9µ ± 3%   +25.13% (p=0.002 n=6)   199.5µ ± 1%  -38.33% (p=0.002 n=6)
WrappingHappyPath-8                                       90.54n ± 4%   203.20n ± 4%  +124.43% (p=0.002 n=6)   47.42n ± 7%  -47.63% (p=0.002 n=6)
geomean                                                   17.97µ         23.76µ        +32.27%                 14.37µ       -20.00%

                                      │ ./shaping/before-iterator.txt │         shaping/iterator.txt          │               final.txt                │
                                      │             B/op              │     B/op       vs base                │    B/op     vs base                    │
Wrapping/10runes-arabic-1parts-8                           24.00 ± 0%     184.00 ± 0%  +666.67% (p=0.002 n=6)    0.00 ± 0%  -100.00% (p=0.002 n=6)
Wrapping/10runes-arabic-10parts-8                         4680.0 ± 0%     4744.0 ± 0%    +1.37% (p=0.002 n=6)   144.0 ± 0%   -96.92% (p=0.002 n=6)
Wrapping/100runes-arabic-1parts-8                         1272.0 ± 0%     1336.0 ± 0%    +5.03% (p=0.002 n=6)   144.0 ± 0%   -88.68% (p=0.002 n=6)
Wrapping/100runes-arabic-10parts-8                        4264.0 ± 0%     4328.0 ± 0%    +1.50% (p=0.002 n=6)   144.0 ± 0%   -96.62% (p=0.002 n=6)
Wrapping/100runes-arabic-100parts-8                      49096.0 ± 0%    49160.0 ± 0%    +0.13% (p=0.002 n=6)   144.0 ± 0%   -99.71% (p=0.002 n=6)
Wrapping/1000runes-arabic-1parts-8                       13657.0 ± 0%    13722.0 ± 0%    +0.48% (p=0.002 n=6)   147.0 ± 0%   -98.92% (p=0.002 n=6)
Wrapping/1000runes-arabic-10parts-8                      15928.0 ± 0%    15993.0 ± 0%    +0.41% (p=0.002 n=6)   146.0 ± 0%   -99.08% (p=0.002 n=6)
Wrapping/1000runes-arabic-100parts-8                     45513.0 ± 0%    45577.0 ± 0%    +0.14% (p=0.002 n=6)   151.0 ± 0%   -99.67% (p=0.002 n=6)
Wrapping/1000runes-arabic-1000parts-8                   537242.5 ± 0%   537308.5 ± 0%    +0.01% (p=0.002 n=6)   264.0 ± 4%   -99.95% (p=0.002 n=6)
Wrapping/10runes-latin-1parts-8                            24.00 ± 0%     184.00 ± 0%  +666.67% (p=0.002 n=6)    0.00 ± 0%  -100.00% (p=0.002 n=6)
Wrapping/10runes-latin-10parts-8                          4376.0 ± 0%     4440.0 ± 0%    +1.46% (p=0.002 n=6)   144.0 ± 0%   -96.71% (p=0.002 n=6)
Wrapping/100runes-latin-1parts-8                          1848.0 ± 0%     1912.0 ± 0%    +3.46% (p=0.002 n=6)   144.0 ± 0%   -92.21% (p=0.002 n=6)
Wrapping/100runes-latin-10parts-8                         4600.0 ± 0%     4664.0 ± 0%    +1.39% (p=0.002 n=6)   144.0 ± 0%   -96.87% (p=0.002 n=6)
Wrapping/100runes-latin-100parts-8                       52440.0 ± 0%    52504.0 ± 0%    +0.12% (p=0.002 n=6)   144.0 ± 0%   -99.73% (p=0.002 n=6)
Wrapping/1000runes-latin-1parts-8                        16825.0 ± 0%    16889.0 ± 0%    +0.38% (p=0.002 n=6)   149.0 ± 0%   -99.11% (p=0.002 n=6)
Wrapping/1000runes-latin-10parts-8                       19096.0 ± 0%    19161.0 ± 0%    +0.34% (p=0.002 n=6)   149.0 ± 0%   -99.22% (p=0.002 n=6)
Wrapping/1000runes-latin-100parts-8                      46456.5 ± 0%    46521.0 ± 0%    +0.14% (p=0.002 n=6)   150.0 ± 0%   -99.68% (p=0.002 n=6)
Wrapping/1000runes-latin-1000parts-8                    526427.5 ± 0%   526492.0 ± 0%    +0.01% (p=0.002 n=6)   235.5 ± 3%   -99.96% (p=0.002 n=6)
WrappingHappyPath-8                                        120.0 ± 0%      280.0 ± 0%  +133.33% (p=0.002 n=6)     0.0 ± 0%  -100.00% (p=0.002 n=6)
geomean                                                  6.664Ki         8.708Ki        +30.67%                             ?                      ¹ ²
¹ summaries must be >0 to compute geomean
² ratios must be >0 to compute geomean

                                      │ ./shaping/before-iterator.txt │         shaping/iterator.txt         │               final.txt                │
                                      │           allocs/op           │  allocs/op    vs base                │ allocs/op   vs base                    │
Wrapping/10runes-arabic-1parts-8                           1.000 ± 0%     3.000 ± 0%  +200.00% (p=0.002 n=6)   0.000 ± 0%  -100.00% (p=0.002 n=6)
Wrapping/10runes-arabic-10parts-8                         12.000 ± 0%    13.000 ± 0%    +8.33% (p=0.002 n=6)   3.000 ± 0%   -75.00% (p=0.002 n=6)
Wrapping/100runes-arabic-1parts-8                         15.000 ± 0%    16.000 ± 0%    +6.67% (p=0.002 n=6)   3.000 ± 0%   -80.00% (p=0.002 n=6)
Wrapping/100runes-arabic-10parts-8                        33.000 ± 0%    34.000 ± 0%    +3.03% (p=0.002 n=6)   3.000 ± 0%   -90.91% (p=0.002 n=6)
Wrapping/100runes-arabic-100parts-8                       73.000 ± 0%    74.000 ± 0%    +1.37% (p=0.002 n=6)   3.000 ± 0%   -95.89% (p=0.002 n=6)
Wrapping/1000runes-arabic-1parts-8                        88.000 ± 0%    89.000 ± 0%    +1.14% (p=0.002 n=6)   3.000 ± 0%   -96.59% (p=0.002 n=6)
Wrapping/1000runes-arabic-10parts-8                      105.000 ± 0%   106.000 ± 0%    +0.95% (p=0.002 n=6)   3.000 ± 0%   -97.14% (p=0.002 n=6)
Wrapping/1000runes-arabic-100parts-8                     277.000 ± 0%   278.000 ± 0%    +0.36% (p=0.002 n=6)   3.000 ± 0%   -98.92% (p=0.002 n=6)
Wrapping/1000runes-arabic-1000parts-8                    633.000 ± 0%   634.000 ± 0%    +0.16% (p=0.002 n=6)   3.000 ± 0%   -99.53% (p=0.002 n=6)
Wrapping/10runes-latin-1parts-8                            1.000 ± 0%     3.000 ± 0%  +200.00% (p=0.002 n=6)   0.000 ± 0%  -100.00% (p=0.002 n=6)
Wrapping/10runes-latin-10parts-8                          11.000 ± 0%    12.000 ± 0%    +9.09% (p=0.002 n=6)   3.000 ± 0%   -72.73% (p=0.002 n=6)
Wrapping/100runes-latin-1parts-8                          18.000 ± 0%    19.000 ± 0%    +5.56% (p=0.002 n=6)   3.000 ± 0%   -83.33% (p=0.002 n=6)
Wrapping/100runes-latin-10parts-8                         34.000 ± 0%    35.000 ± 0%    +2.94% (p=0.002 n=6)   3.000 ± 0%   -91.18% (p=0.002 n=6)
Wrapping/100runes-latin-100parts-8                        77.000 ± 0%    78.000 ± 0%    +1.30% (p=0.002 n=6)   3.000 ± 0%   -96.10% (p=0.002 n=6)
Wrapping/1000runes-latin-1parts-8                        121.000 ± 0%   122.000 ± 0%    +0.83% (p=0.002 n=6)   3.000 ± 0%   -97.52% (p=0.002 n=6)
Wrapping/1000runes-latin-10parts-8                       138.000 ± 0%   139.000 ± 0%    +0.72% (p=0.002 n=6)   3.000 ± 0%   -97.83% (p=0.002 n=6)
Wrapping/1000runes-latin-100parts-8                      305.000 ± 0%   306.000 ± 0%    +0.33% (p=0.002 n=6)   3.000 ± 0%   -99.02% (p=0.002 n=6)
Wrapping/1000runes-latin-1000parts-8                     743.000 ± 0%   744.000 ± 0%    +0.13% (p=0.002 n=6)   3.000 ± 0%   -99.60% (p=0.002 n=6)
WrappingHappyPath-8                                        2.000 ± 0%     4.000 ± 0%  +100.00% (p=0.002 n=6)   0.000 ± 0%  -100.00% (p=0.002 n=6)
geomean                                                    40.01          47.61        +19.01%                             ?                      ¹ ²
¹ summaries must be >0 to compute geomean
² ratios must be >0 to compute geomean

Note that we're still losing a few nanoseconds on the trivial case due to the overhead of having an iterator interface, but we gain significant performance boosts on every other benchmark.

whereswaldon commented 1 year ago

Please review the iterator portion of this changeset, but hold off on reviewing the allocation optimization. I couldn't sleep tonight because I thought of a much better way to structure that, and it's super promising. I need a little more time to get it right though. I will update the PR and benchmarks once I'm happy with it.

At a high level, the current optimization exposes too many details to the caller (and asks them to understand them) and isn't able to adaptively resize the pre-allocated buffers depending upon workload. I'm internalizing everything but the existence of a buffer type and making it grow adaptively when the wrapping workload consumes all resources. We can tune the growth strategy without changing the public API, which seems like a much better place to be. This strategy is able to almost completely eliminate allocations when wrapping multiple large texts consecutively, as the buffer growth for wrapping the first one will generally be sufficient to handle the second.

whereswaldon commented 1 year ago

Okay, I've updated this branch with my latest work. I decided that separating out #69 didn't really add as much value as I hoped, so those changes are incorporated here as well. I hope that the result is much easier to understand. I've updated the PR description with latest benchmark results. On the whole, this is a perf win and a feature win, but we do lose some performance in places.

andydotxyz commented 1 year ago

Is there any sensible default we can use for buffering (without hitting the "stop until buffer copied" issue you raised above)? It feels like having an API that is less performant by default is a step backwards. I don't disagree that these are all desirable improvements - but I wonder if we can retain the default performance from before in some way?

whereswaldon commented 1 year ago

A couple of comments on naming. It all looks sensible, but I cannot figure out how the wrap buffer is worked with - there is no public API for accessing, copying, or resetting its data. Are we just to use the objects as a black box and manage as many of them as we need?

Precisely. They are opaque to users of the package, but exposed in order to allow callers to control whether they invalidate the last lines returned from the wrapper by reusing the buffer. If we don't expose this concept, callers cannot wrap a new paragraph until they are done with the previous (is that an okay constraint? I thought not). If we don't allow this kind of buffer reuse then we allocate quite a lot in wrapping each time.

whereswaldon commented 1 year ago

Is there any sensible default we can use for buffering (without hitting the "stop until buffer copied" issue you raised above)? It feels like having an API that is less performant by default is a step backwards. I don't disagree that these are all desirable improvements - but I wonder if we can retain the default performance from before in some way?

I'm not sure what "stop until buffer copied" issue you mean. I just reread the conversation above and I didn't see anything that seemed to match.

We can retain more performance by complicating the iterator API, but I opted to keep it simple in this iteration. Allowing the wrapper to ask "is there only one run?" allows us to keep performance on par.

benoitkugler commented 1 year ago

Regarding the buffer, could we merge it with the LineWrapper type, and document that the lines returned by Wrap are owned by the line wrapper ? Users who need multiple wrapping would create multiple LineWrappers. And we could completely hide the scratch type from the API, and have buffered, performant behavior by default.

(Actually, it is basically the alternative @whereswaldon has initially proposed)

whereswaldon commented 1 year ago

I've internalized the existence of WrapBuffer as @benoitkugler suggested and tried to address all of @andydotxyz's recommendations regarding naming and style. Performance is essentially unchanged, though there is no longer any need to benchmark unbuffered use of the API because the buffering is internal and automatic.

benoitkugler commented 1 year ago

Thank you for the changes ! Seems great to me.

benoitkugler commented 1 year ago

Late remark : since wrapBuffer is now hidden, we could probably remove the pointer indirection (in the LineWrapper.scratch field), and also simplify ensureScratch that could just be replaced by scratch.reset.

@whereswaldon I can commit such changes if you want.

whereswaldon commented 1 year ago

I've implemented that change, with some extra logic to make sure that the first time we reset the scratch buffer it allocates some space so that the first use of the line wrapper isn't extra-slow for no reason.

andydotxyz commented 1 year ago

I'm not sure what "stop until buffer copied" issue you mean. I just reread the conversation above and I didn't see anything that seemed to match.

Sorry I was referring to the same concept that you better described as "callers cannot wrap a new paragraph until they are done with the previous". I think I misunderstood what the default behaviour is so it should be fine.