Line wrapping multi - Githubissues

whereswaldon commented 1 year ago

Closes #19

This PR is an updated version of #19 that can wrap multiple runs of shaped text ([]Output) into lines of (optionally varying) width.

I've folded @benoitkugler 's excellent unicode segmenter into here, as it is a vast improvement over what I used in the previous PR.

I have also made some API changes. Notably:

I've updated the shaping.Shaper interface to indicate that it can return an error, since the implementation can return an error
I've created a concrete type that performs shaping (shaping.HarfbuzzShaper). This type holds a text shaping buffer as an unexported field, and reuses it across invocations of Shape. This is a pretty significant CPU/memory savings over creating a new buffer on every shaping operation.
I've reworked the line wrapping API in several ways:
- It now returns []shaping.Line, where a Line is equivalent to []Output
- It is now performed by the concrete shaping.LineWrapper type. This type holds a buffer which can be reused across wrapping operations for additional resource savings, similar to the shaping.HarfbuzzShaper.
- You now have two options for wrapping a paragraph of text. You can wrap with LineWrapper.WrapParagraph, which will perform a fixed-width wrapping of []Output, or you can iteratively invoke LineWrapper.WrapNextLine with varying widths. This enables both ending the line-wrapping process early if you didn't need the whole paragraph, and it also allows insetting content like images within a paragraph.
I've added a Face field to shaping.Output. Now that it's possible to shape many runs of text and then wrap them into a paragraph, it's important to track which font face should be used for which run of text.

I've added numerous tests and benchmarks along the way, and I think they should prove useful moving forward.

As for how the line wrapping works, the key things to understand are:

mapRunesToClusterIndices: this function creates a mapping between runes of input text and glyphs clusters of output font glyphs for a given shaping.Output. This mapping is useful because it enables the next operation.
inclusiveGlyphRange: this function uses a mapping provided by the previous function and an inclusive range of runes to produce an inclusive range of glyphs within a shaping.Output containing those runes. This function works correctly regardless of the RTL/LTR state of the text, effectively normalizing operations across the two possible text directions. We use this operation to implement the next one.
cutRun: this function cuts a shaping.Output to only contain glyph data for an inclusive range of runes. Because it works using the above, it behaves correctly for RTL/LTR data, and allows the programmer to think purely in terms of which runes are covered by a shaping.Output.

With the above operations available, wrapping lines is implemented as a greedy algorithm:

initialize a breaker to tell us where we can legally break lines (in terms of rune offsets)
iterate the []Outputs looking for the one containing the current legal break location
cut the Output containing the legal break location to only contain runes before the legal break
compare the length of all []Output traversed to see if they fit in the available space
- if so, commit this as the best known wrapping and try again with the next break candidate
- if not:
  - if there are no known wrappings that fit, use this anyway
  - otherwise, use the best known wrapping for this line

The algorithm doesn't need to worry about RTL/LTR because that property is hidden with the operation of cutting an Output.

We should be able to easily make other line wrapping algorithms that take advantage of the same cutRun primitive, but use different strategies than the basic greedy approach.

Similarly, we may be able to find more efficient implementations of the cutRun operation, which would then accelerate all line wrappers.

I'm open to renaming basically any symbol (I don't feel strongly about them). However, I would really like to actually land one of these PRs soon. Line wrapping has been available in a PR for almost a full year now. This PR's changeset is huge precisely because of the sheer amount of accumulated changes during that interval. Please carefully weigh what is worth blocking the PR for against what can easily be addressed by a small change later.

benoitkugler commented 1 year ago

Great news !

I'm very happy to see the two optimisations via buffers !

I will look into the details later, but I have a preliminary suggestion : can we avoid to depend on github.com/benoitkugler/textlayout-testdata ? The fonts were split in this module to avoid normal user having to download it (since it's huge). If I'm not mistaken, only copying one font used in wrapping_test.go should be enough.

(I can send a commit if you want.)

whereswaldon commented 1 year ago

I will look into the details later, but I have a preliminary suggestion : can we avoid to depend on github.com/benoitkugler/textlayout-testdata ? The fonts were split in this module to avoid normal user having to download it (since it's huge). If I'm not mistaken, only copying one font used in wrapping_test.go should be enough.

Done! Very good point. I forgot that we already had an arabic font in the repo.

benoitkugler commented 1 year ago

Is there a particular reason to expose the BreakState type ? If not, I think it would be cleaner to make it a private field of LineWrapper (updated after each call to LineWrap). (It would also be probably easier to reuse its resources across several paragraphs.)

benoitkugler commented 1 year ago

Side note : I've profiled a simplified version of BenchmarkWrappingArabic, and I'm puzzled to see that the bottleneck seems to be mapRunesToClusterIndices. I've always thought the Harfbuzz shaping would be the most expensive operation. This is definitively not a major concern for now, but have you an idea of why is this operation so heavy ?

whereswaldon commented 1 year ago

Side note : I've profiled a simplified version of BenchmarkWrappingArabic, and I'm puzzled to see that the bottleneck seems to be mapRunesToClusterIndices. I've always thought the Harfbuzz shaping would be the most expensive operation. This is definitively not a major concern for now, but have you an idea of why is this operation so heavy ?

Well, the short answer is that I wrote the algorithm a year ago when I understood the domain of text shaping less well, and I appear to have made some suboptimal decisions. It resolves the index per rune, which redoes work when adjacent runes are part of the same cluster. I think I can rewrite it to be more efficient. Thanks for raising the point. I had also seen it jump out in benchmarks, but instead focused on doing it less often rather than its implementation. I'll try to do that soon.

benoitkugler commented 1 year ago

Side note : I've profiled a simplified version of BenchmarkWrappingArabic, and I'm puzzled to see that the bottleneck seems to be mapRunesToClusterIndices. I've always thought the Harfbuzz shaping would be the most expensive operation. This is definitively not a major concern for now, but have you an idea of why is this operation so heavy ?

Well, the short answer is that I wrote the algorithm a year ago when I understood the domain of text shaping less well, and I appear to have made some suboptimal decisions. It resolves the index per rune, which redoes work when adjacent runes are part of the same cluster. I think I can rewrite it to be more efficient. Thanks for raising the point. I had also seen it jump out in benchmarks, but instead focused on doing it less often rather than its implementation. I'll try to do that soon.

Alright, thanks for the explanation. I'm looking at it on my own, this is a good "exercice" to understand clusters... Maybe you could wait some days to see if I reach a decent enough rewrite ?

whereswaldon commented 1 year ago

Side note : I've profiled a simplified version of BenchmarkWrappingArabic, and I'm puzzled to see that the bottleneck seems to be mapRunesToClusterIndices. I've always thought the Harfbuzz shaping would be the most expensive operation. This is definitively not a major concern for now, but have you an idea of why is this operation so heavy ?

Well, the short answer is that I wrote the algorithm a year ago when I understood the domain of text shaping less well, and I appear to have made some suboptimal decisions. It resolves the index per rune, which redoes work when adjacent runes are part of the same cluster. I think I can rewrite it to be more efficient. Thanks for raising the point. I had also seen it jump out in benchmarks, but instead focused on doing it less often rather than its implementation. I'll try to do that soon.

Alright, thanks for the explanation. I'm looking at it on my own, this is a good "exercice" to understand clusters... Maybe you could wait some days to see if I reach a decent enough rewrite ?

Well... I rewrote it to be as efficient as I think is reasonably possible (linear performance with the number of glyphs), and it's only a tiny bit better. I'll push that version soon. Got pulled away from my computer by an angry baby before I could finalize it.

I have another approach that does the mapping as-needed on a rune-by-rune basis. I previously abandoned it because it seemed doomed to be more expensive, but I've had some ideas today that might make it viable.

whereswaldon commented 1 year ago

I think I just realized what is going wrong. It's not that the mapping is crazy expensive, it's that the current approach can only reuse the generated mapping within a line, so we're regenerating it for each line of wrapped text. That should be fixable. I'll see how that goes later today if I can.

benoitkugler commented 1 year ago

Well... I rewrote it to be as efficient as I think is reasonably possible (linear performance with the number of glyphs), and it's only a tiny bit better. I'll push that version soon. Got pulled away from my computer by an angry baby before I could finalize it.

Ah ! Same here, very disappointing performance boost even with a code which seems really straightforward

I think I just realized what is going wrong. It's not that the mapping is crazy expensive, it's that the current approach can only reuse the generated mapping within a line, so we're regenerating it for each line of wrapped text. That should be fixable. I'll see how that goes later today if I can.

Great, thank you ! No hurry of course...

whereswaldon commented 1 year ago

@benoitkugler I've pushed updates for several things:

I've added my updated mapping function along with a benchmark. Would you plug the one you wrote in and see if it's even better? I'd like to use the best implementation we have, provided that it passes the tests.
I've updated the API to eliminate BreakState. The result is better and cleaner. Thank you for suggesting it!
Changing the API enabled me to reuse rune mappings across lines, which made the performance issue for Arabic wrapping disappear. Now the bottleneck is in the segmenter.

Overall, I really want to thank you for the feedback you've given. It has helped improve the code tremendously.

benoitkugler commented 1 year ago

I've added my updated mapping function along with a benchmark. Would you plug the one you wrote in and see if it's even better? I'd like to use the best implementation we have, provided that it passes the tests.

Done ! It seems to perform slightly better, and is a bit simpler as well : it reuses the (very precious) information in Glyph.RunesCount and Glyph.GlyphsCount

whereswaldon commented 1 year ago

I've added my updated mapping function along with a benchmark. Would you plug the one you wrote in and see if it's even better? I'd like to use the best implementation we have, provided that it passes the tests.

Done ! It seems to perform slightly better, and is a bit simpler as well : it reuses the (very precious) information in Glyph.RunesCount and Glyph.GlyphsCount

Excellent work! I wish I'd thought to do that. I've also pushed a single function that can perform the required mapping on-the-fly using a binary search strategy. When I tried plugging it in to the line wrapper (and dropping the step that generates the mappings), things got slightly slower, but I wonder whether there isn't a combination of caching and optimizing that might make that approach viable. If we could do the mapping on-the-fly, we could stop allocating the mapping slice entirely.

My mapping function doesn't currently take advantage of the Glyph.GlyphCount and Glyph.RuneCount fields, so it's possible that making it smarter with those would make it cheap enough to be viable. I'll experiment later.

benoitkugler commented 1 year ago

I've proposed some minor changes in #26.

whereswaldon commented 1 year ago

I just restructured the benchmark code to be easier to extend, and I also added a parameter to the wrapping benchmark that controls how many individual shaping.Outputs the text is split into prior to wrapping. As I suspected, that has a performance impact. We can likely optimize for those cases more. Anyway, here's what I get on a 10 second benchmark:

goarch: amd64
pkg: github.com/go-text/typesetting/shaping
cpu: Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz
BenchmarkShaping/10runes-arabic-4                  52394            234536 ns/op           83048 B/op       1363 allocs/op
BenchmarkShaping/100runes-arabic-4                 17991            666149 ns/op          152105 B/op       3339 allocs/op
BenchmarkShaping/1000runes-arabic-4                 2407           4831744 ns/op          890538 B/op      24568 allocs/op
BenchmarkShaping/10runes-latin-4                 2661010              4582 ns/op            2736 B/op          9 allocs/op
BenchmarkShaping/100runes-latin-4                 439203             26578 ns/op            8624 B/op          9 allocs/op
BenchmarkShaping/1000runes-latin-4                 49524            239683 ns/op           67636 B/op          9 allocs/op
BenchmarkMappingRunes/10runes-arabic-v1-4       151451402               79.19 ns/op            0 B/op          0 allocs/op
BenchmarkMappingRunes/10runes-arabic-v2-4       179652518               66.86 ns/op            0 B/op          0 allocs/op
BenchmarkMappingRunes/10runes-arabic-v3-4       265295043               45.42 ns/op            0 B/op          0 allocs/op
BenchmarkMappingRunes/100runes-arabic-v1-4              15264229               784.0 ns/op             0 B/op          0 allocs/op
BenchmarkMappingRunes/100runes-arabic-v2-4              19197888               624.8 ns/op             0 B/op          0 allocs/op
BenchmarkMappingRunes/100runes-arabic-v3-4              28057161               427.6 ns/op             0 B/op          0 allocs/op
BenchmarkMappingRunes/1000runes-arabic-v1-4              1575374              7609 ns/op               0 B/op          0 allocs/op
BenchmarkMappingRunes/1000runes-arabic-v2-4              1973664              6081 ns/op               0 B/op          0 allocs/op
BenchmarkMappingRunes/1000runes-arabic-v3-4              2161999              5545 ns/op               0 B/op          0 allocs/op
BenchmarkMappingRunes/10runes-latin-v1-4                135112519               89.05 ns/op            0 B/op          0 allocs/op
BenchmarkMappingRunes/10runes-latin-v2-4                168851094               70.80 ns/op            0 B/op          0 allocs/op
BenchmarkMappingRunes/10runes-latin-v3-4                282348432               42.32 ns/op            0 B/op          0 allocs/op
BenchmarkMappingRunes/100runes-latin-v2-4               18252578               656.9 ns/op             0 B/op          0 allocs/op
BenchmarkMappingRunes/100runes-latin-v3-4               25539951               510.0 ns/op             0 B/op          0 allocs/op
BenchmarkMappingRunes/100runes-latin-v1-4               16177051               745.9 ns/op             0 B/op          0 allocs/op
BenchmarkMappingRunes/1000runes-latin-v1-4               1647464              7210 ns/op               0 B/op          0 allocs/op
BenchmarkMappingRunes/1000runes-latin-v2-4               1873270              6410 ns/op               0 B/op          0 allocs/op
BenchmarkMappingRunes/1000runes-latin-v3-4               2040956              5959 ns/op               0 B/op          0 allocs/op
BenchmarkWrapping/10runes-arabic-1parts-4               262567941               45.93 ns/op           24 B/op          1 allocs/op
BenchmarkWrapping/10runes-arabic-10parts-4               2424694              4948 ns/op            3144 B/op          9 allocs/op
BenchmarkWrapping/100runes-arabic-1parts-4                446210             25572 ns/op            1272 B/op         15 allocs/op
BenchmarkWrapping/100runes-arabic-10parts-4               497608             23853 ns/op            3240 B/op         10 allocs/op
BenchmarkWrapping/100runes-arabic-100parts-4              343171             35476 ns/op           24648 B/op         12 allocs/op
BenchmarkWrapping/1000runes-arabic-1parts-4                48951            245608 ns/op           13656 B/op         88 allocs/op
BenchmarkWrapping/1000runes-arabic-10parts-4               56056            214261 ns/op            4248 B/op         20 allocs/op
BenchmarkWrapping/1000runes-arabic-100parts-4              55044            227339 ns/op           24744 B/op         13 allocs/op
BenchmarkWrapping/1000runes-arabic-1000parts-4             33136            364069 ns/op          303176 B/op         16 allocs/op
BenchmarkWrapping/10runes-latin-1parts-4                262214733               46.27 ns/op           24 B/op          1 allocs/op
BenchmarkWrapping/10runes-latin-10parts-4                2689950              4688 ns/op            3144 B/op          9 allocs/op
BenchmarkWrapping/100runes-latin-1parts-4                 556855             21144 ns/op            1848 B/op         18 allocs/op
BenchmarkWrapping/100runes-latin-10parts-4                566442             19102 ns/op            3240 B/op         10 allocs/op
BenchmarkWrapping/100runes-latin-100parts-4               377547             31246 ns/op           24648 B/op         12 allocs/op
BenchmarkWrapping/1000runes-latin-1parts-4                 58994            204511 ns/op           16824 B/op        121 allocs/op
BenchmarkWrapping/1000runes-latin-10parts-4                69567            172653 ns/op            4824 B/op         23 allocs/op
BenchmarkWrapping/1000runes-latin-100parts-4               67520            183835 ns/op           24744 B/op         13 allocs/op
BenchmarkWrapping/1000runes-latin-1000parts-4              36948            327560 ns/op          303176 B/op         16 allocs/op
BenchmarkWrappingHappyPath-4                            93334657               126.0 ns/op           120 B/op          2 allocs/op

whereswaldon commented 1 year ago

I noticed that the benchmarks for wrapping multiple parts were operating on incorrect data. The loop I used to cut the output into parts had a bug which always yeilded the first part. Cutting 100 runes worth of output into 10 parts would give you ten copies of the first 10 runes (shaped). I've fixed that and added a test for that operation. The correct performance numbers are here:

goos: linux
goarch: amd64
pkg: github.com/go-text/typesetting/shaping
cpu: Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz
BenchmarkShaping/10runes-arabic-4                          54723            220785 ns/op           83048 B/op       1363 allocs/op
BenchmarkShaping/100runes-arabic-4                         18742            641292 ns/op          152105 B/op       3339 allocs/op
BenchmarkShaping/1000runes-arabic-4                         2517           4679373 ns/op          890534 B/op      24568 allocs/op
BenchmarkShaping/10runes-latin-4                         2736532              4379 ns/op            2736 B/op          9 allocs/op
BenchmarkShaping/100runes-latin-4                         465279             25539 ns/op            8624 B/op          9 allocs/op
BenchmarkShaping/1000runes-latin-4                         51714            232495 ns/op           67635 B/op          9 allocs/op
BenchmarkMapping/10runes-arabic-v3-4                   266173123                45.29 ns/op            0 B/op          0 allocs/op
BenchmarkMapping/100runes-arabic-v3-4                   28156170               426.4 ns/op             0 B/op          0 allocs/op
BenchmarkMapping/1000runes-arabic-v3-4                   2169879              5529 ns/op               0 B/op          0 allocs/op
BenchmarkMapping/10runes-latin-v3-4                    288329582                41.61 ns/op            0 B/op          0 allocs/op
BenchmarkMapping/100runes-latin-v3-4                    25580140               467.8 ns/op             0 B/op          0 allocs/op
BenchmarkMapping/1000runes-latin-v3-4                    2034584              5865 ns/op               0 B/op          0 allocs/op
BenchmarkWrapping/10runes-arabic-1parts-4              269953807                44.31 ns/op           24 B/op          1 allocs/op
BenchmarkWrapping/10runes-arabic-10parts-4               1918556              6254 ns/op            5128 B/op         12 allocs/op
BenchmarkWrapping/100runes-arabic-1parts-4                476154             25642 ns/op            1272 B/op         15 allocs/op
BenchmarkWrapping/100runes-arabic-10parts-4               422366             28449 ns/op            4440 B/op         33 allocs/op
BenchmarkWrapping/100runes-arabic-100parts-4              208006             57385 ns/op           52536 B/op         73 allocs/op
BenchmarkWrapping/1000runes-arabic-1parts-4                49730            241569 ns/op           13656 B/op         88 allocs/op
BenchmarkWrapping/1000runes-arabic-10parts-4               48684            246473 ns/op           16056 B/op        105 allocs/op
BenchmarkWrapping/1000runes-arabic-100parts-4              42943            278939 ns/op           47544 B/op        277 allocs/op
BenchmarkWrapping/1000runes-arabic-1000parts-4             20566            581722 ns/op          568568 B/op        633 allocs/op
BenchmarkWrapping/10runes-latin-1parts-4               265786851                45.81 ns/op           24 B/op          1 allocs/op
BenchmarkWrapping/10runes-latin-10parts-4                2179726              5715 ns/op            4744 B/op         11 allocs/op
BenchmarkWrapping/100runes-latin-1parts-4                 578455             20718 ns/op            1848 B/op         18 allocs/op
BenchmarkWrapping/100runes-latin-10parts-4                496795             24223 ns/op            4824 B/op         34 allocs/op
BenchmarkWrapping/100runes-latin-100parts-4               221144             54522 ns/op           55768 B/op         77 allocs/op
BenchmarkWrapping/1000runes-latin-1parts-4                 60574            199624 ns/op           16824 B/op        121 allocs/op
BenchmarkWrapping/1000runes-latin-10parts-4                57458            208227 ns/op           19224 B/op        138 allocs/op
BenchmarkWrapping/1000runes-latin-100parts-4               50022            242387 ns/op           48792 B/op        305 allocs/op
BenchmarkWrapping/1000runes-latin-1000parts-4              22030            544804 ns/op          562040 B/op        743 allocs/op
BenchmarkWrappingHappyPath-4                            90137808               124.5 ns/op           120 B/op          2 allocs/op

As you can see, the overhead of cutting the text into parts becomes extreme as we approach the case in which every single rune has been cut into its own part. I think this can be made more efficient, but is probably inevitable.

whereswaldon commented 1 year ago

Pushed a quick update to one of the tests to catch a rune accounting bug, as well as a fix for said bug. I couldn't easily make it a separate PR because the test case I changed was introduced in this PR.

whereswaldon commented 1 year ago

@benoitkugler @andydotxyz I've removed the error return from the shaper signature. Doing so required implementing vertical text support, as we previously returned an error when you tried to shape or wrap an unimplemented text direction. It's a slightly larger change than I expected, but the changes are mostly cosmetic. Our concept of text "progression" being independent of axis means that none of the wrapping code needed to change, just the code to calculate the bounds of a line.

whereswaldon commented 1 year ago

And a thunderclap echoed across GitHub, as nearly 12 thousand lines of change entered the main branch, enabling gophers everywhere to shape and wrap complex scripts to their hearts' content. Thanks for the great work everyone!

benoitkugler commented 1 year ago

Really awesome !

go-text / typesetting

Line wrapping multi #25