Closed whereswaldon closed 1 year ago
Great news !
I'm very happy to see the two optimisations via buffers !
I will look into the details later, but I have a preliminary suggestion : can we avoid to depend on github.com/benoitkugler/textlayout-testdata
? The fonts were split in this module to avoid normal user having to download it (since it's huge).
If I'm not mistaken, only copying one font used in wrapping_test.go
should be enough.
(I can send a commit if you want.)
I will look into the details later, but I have a preliminary suggestion : can we avoid to depend on
github.com/benoitkugler/textlayout-testdata
? The fonts were split in this module to avoid normal user having to download it (since it's huge). If I'm not mistaken, only copying one font used inwrapping_test.go
should be enough.
Done! Very good point. I forgot that we already had an arabic font in the repo.
Is there a particular reason to expose the BreakState
type ?
If not, I think it would be cleaner to make it a private field of LineWrapper
(updated after each call to LineWrap
).
(It would also be probably easier to reuse its resources across several paragraphs.)
Side note : I've profiled a simplified version of BenchmarkWrappingArabic, and I'm puzzled to see that the bottleneck seems to be mapRunesToClusterIndices
. I've always thought the Harfbuzz shaping would be the most expensive operation.
This is definitively not a major concern for now, but have you an idea of why is this operation so heavy ?
Side note : I've profiled a simplified version of BenchmarkWrappingArabic, and I'm puzzled to see that the bottleneck seems to be
mapRunesToClusterIndices
. I've always thought the Harfbuzz shaping would be the most expensive operation. This is definitively not a major concern for now, but have you an idea of why is this operation so heavy ?
Well, the short answer is that I wrote the algorithm a year ago when I understood the domain of text shaping less well, and I appear to have made some suboptimal decisions. It resolves the index per rune, which redoes work when adjacent runes are part of the same cluster. I think I can rewrite it to be more efficient. Thanks for raising the point. I had also seen it jump out in benchmarks, but instead focused on doing it less often rather than its implementation. I'll try to do that soon.
Side note : I've profiled a simplified version of BenchmarkWrappingArabic, and I'm puzzled to see that the bottleneck seems to be
mapRunesToClusterIndices
. I've always thought the Harfbuzz shaping would be the most expensive operation. This is definitively not a major concern for now, but have you an idea of why is this operation so heavy ?Well, the short answer is that I wrote the algorithm a year ago when I understood the domain of text shaping less well, and I appear to have made some suboptimal decisions. It resolves the index per rune, which redoes work when adjacent runes are part of the same cluster. I think I can rewrite it to be more efficient. Thanks for raising the point. I had also seen it jump out in benchmarks, but instead focused on doing it less often rather than its implementation. I'll try to do that soon.
Alright, thanks for the explanation. I'm looking at it on my own, this is a good "exercice" to understand clusters... Maybe you could wait some days to see if I reach a decent enough rewrite ?
Side note : I've profiled a simplified version of BenchmarkWrappingArabic, and I'm puzzled to see that the bottleneck seems to be
mapRunesToClusterIndices
. I've always thought the Harfbuzz shaping would be the most expensive operation. This is definitively not a major concern for now, but have you an idea of why is this operation so heavy ?Well, the short answer is that I wrote the algorithm a year ago when I understood the domain of text shaping less well, and I appear to have made some suboptimal decisions. It resolves the index per rune, which redoes work when adjacent runes are part of the same cluster. I think I can rewrite it to be more efficient. Thanks for raising the point. I had also seen it jump out in benchmarks, but instead focused on doing it less often rather than its implementation. I'll try to do that soon.
Alright, thanks for the explanation. I'm looking at it on my own, this is a good "exercice" to understand clusters... Maybe you could wait some days to see if I reach a decent enough rewrite ?
Well... I rewrote it to be as efficient as I think is reasonably possible (linear performance with the number of glyphs), and it's only a tiny bit better. I'll push that version soon. Got pulled away from my computer by an angry baby before I could finalize it.
I have another approach that does the mapping as-needed on a rune-by-rune basis. I previously abandoned it because it seemed doomed to be more expensive, but I've had some ideas today that might make it viable.
I think I just realized what is going wrong. It's not that the mapping is crazy expensive, it's that the current approach can only reuse the generated mapping within a line, so we're regenerating it for each line of wrapped text. That should be fixable. I'll see how that goes later today if I can.
Well... I rewrote it to be as efficient as I think is reasonably possible (linear performance with the number of glyphs), and it's only a tiny bit better. I'll push that version soon. Got pulled away from my computer by an angry baby before I could finalize it.
Ah ! Same here, very disappointing performance boost even with a code which seems really straightforward
I think I just realized what is going wrong. It's not that the mapping is crazy expensive, it's that the current approach can only reuse the generated mapping within a line, so we're regenerating it for each line of wrapped text. That should be fixable. I'll see how that goes later today if I can.
Great, thank you ! No hurry of course...
@benoitkugler I've pushed updates for several things:
Overall, I really want to thank you for the feedback you've given. It has helped improve the code tremendously.
- I've added my updated mapping function along with a benchmark. Would you plug the one you wrote in and see if it's even better? I'd like to use the best implementation we have, provided that it passes the tests.
Done ! It seems to perform slightly better, and is a bit simpler as well : it reuses the (very precious) information in Glyph.RunesCount
and Glyph.GlyphsCount
- I've added my updated mapping function along with a benchmark. Would you plug the one you wrote in and see if it's even better? I'd like to use the best implementation we have, provided that it passes the tests.
Done ! It seems to perform slightly better, and is a bit simpler as well : it reuses the (very precious) information in
Glyph.RunesCount
andGlyph.GlyphsCount
Excellent work! I wish I'd thought to do that. I've also pushed a single function that can perform the required mapping on-the-fly using a binary search strategy. When I tried plugging it in to the line wrapper (and dropping the step that generates the mappings), things got slightly slower, but I wonder whether there isn't a combination of caching and optimizing that might make that approach viable. If we could do the mapping on-the-fly, we could stop allocating the mapping slice entirely.
My mapping function doesn't currently take advantage of the Glyph.GlyphCount
and Glyph.RuneCount
fields, so it's possible that making it smarter with those would make it cheap enough to be viable. I'll experiment later.
I've proposed some minor changes in #26.
I just restructured the benchmark code to be easier to extend, and I also added a parameter to the wrapping benchmark that controls how many individual shaping.Output
s the text is split into prior to wrapping. As I suspected, that has a performance impact. We can likely optimize for those cases more. Anyway, here's what I get on a 10 second benchmark:
goarch: amd64
pkg: github.com/go-text/typesetting/shaping
cpu: Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz
BenchmarkShaping/10runes-arabic-4 52394 234536 ns/op 83048 B/op 1363 allocs/op
BenchmarkShaping/100runes-arabic-4 17991 666149 ns/op 152105 B/op 3339 allocs/op
BenchmarkShaping/1000runes-arabic-4 2407 4831744 ns/op 890538 B/op 24568 allocs/op
BenchmarkShaping/10runes-latin-4 2661010 4582 ns/op 2736 B/op 9 allocs/op
BenchmarkShaping/100runes-latin-4 439203 26578 ns/op 8624 B/op 9 allocs/op
BenchmarkShaping/1000runes-latin-4 49524 239683 ns/op 67636 B/op 9 allocs/op
BenchmarkMappingRunes/10runes-arabic-v1-4 151451402 79.19 ns/op 0 B/op 0 allocs/op
BenchmarkMappingRunes/10runes-arabic-v2-4 179652518 66.86 ns/op 0 B/op 0 allocs/op
BenchmarkMappingRunes/10runes-arabic-v3-4 265295043 45.42 ns/op 0 B/op 0 allocs/op
BenchmarkMappingRunes/100runes-arabic-v1-4 15264229 784.0 ns/op 0 B/op 0 allocs/op
BenchmarkMappingRunes/100runes-arabic-v2-4 19197888 624.8 ns/op 0 B/op 0 allocs/op
BenchmarkMappingRunes/100runes-arabic-v3-4 28057161 427.6 ns/op 0 B/op 0 allocs/op
BenchmarkMappingRunes/1000runes-arabic-v1-4 1575374 7609 ns/op 0 B/op 0 allocs/op
BenchmarkMappingRunes/1000runes-arabic-v2-4 1973664 6081 ns/op 0 B/op 0 allocs/op
BenchmarkMappingRunes/1000runes-arabic-v3-4 2161999 5545 ns/op 0 B/op 0 allocs/op
BenchmarkMappingRunes/10runes-latin-v1-4 135112519 89.05 ns/op 0 B/op 0 allocs/op
BenchmarkMappingRunes/10runes-latin-v2-4 168851094 70.80 ns/op 0 B/op 0 allocs/op
BenchmarkMappingRunes/10runes-latin-v3-4 282348432 42.32 ns/op 0 B/op 0 allocs/op
BenchmarkMappingRunes/100runes-latin-v2-4 18252578 656.9 ns/op 0 B/op 0 allocs/op
BenchmarkMappingRunes/100runes-latin-v3-4 25539951 510.0 ns/op 0 B/op 0 allocs/op
BenchmarkMappingRunes/100runes-latin-v1-4 16177051 745.9 ns/op 0 B/op 0 allocs/op
BenchmarkMappingRunes/1000runes-latin-v1-4 1647464 7210 ns/op 0 B/op 0 allocs/op
BenchmarkMappingRunes/1000runes-latin-v2-4 1873270 6410 ns/op 0 B/op 0 allocs/op
BenchmarkMappingRunes/1000runes-latin-v3-4 2040956 5959 ns/op 0 B/op 0 allocs/op
BenchmarkWrapping/10runes-arabic-1parts-4 262567941 45.93 ns/op 24 B/op 1 allocs/op
BenchmarkWrapping/10runes-arabic-10parts-4 2424694 4948 ns/op 3144 B/op 9 allocs/op
BenchmarkWrapping/100runes-arabic-1parts-4 446210 25572 ns/op 1272 B/op 15 allocs/op
BenchmarkWrapping/100runes-arabic-10parts-4 497608 23853 ns/op 3240 B/op 10 allocs/op
BenchmarkWrapping/100runes-arabic-100parts-4 343171 35476 ns/op 24648 B/op 12 allocs/op
BenchmarkWrapping/1000runes-arabic-1parts-4 48951 245608 ns/op 13656 B/op 88 allocs/op
BenchmarkWrapping/1000runes-arabic-10parts-4 56056 214261 ns/op 4248 B/op 20 allocs/op
BenchmarkWrapping/1000runes-arabic-100parts-4 55044 227339 ns/op 24744 B/op 13 allocs/op
BenchmarkWrapping/1000runes-arabic-1000parts-4 33136 364069 ns/op 303176 B/op 16 allocs/op
BenchmarkWrapping/10runes-latin-1parts-4 262214733 46.27 ns/op 24 B/op 1 allocs/op
BenchmarkWrapping/10runes-latin-10parts-4 2689950 4688 ns/op 3144 B/op 9 allocs/op
BenchmarkWrapping/100runes-latin-1parts-4 556855 21144 ns/op 1848 B/op 18 allocs/op
BenchmarkWrapping/100runes-latin-10parts-4 566442 19102 ns/op 3240 B/op 10 allocs/op
BenchmarkWrapping/100runes-latin-100parts-4 377547 31246 ns/op 24648 B/op 12 allocs/op
BenchmarkWrapping/1000runes-latin-1parts-4 58994 204511 ns/op 16824 B/op 121 allocs/op
BenchmarkWrapping/1000runes-latin-10parts-4 69567 172653 ns/op 4824 B/op 23 allocs/op
BenchmarkWrapping/1000runes-latin-100parts-4 67520 183835 ns/op 24744 B/op 13 allocs/op
BenchmarkWrapping/1000runes-latin-1000parts-4 36948 327560 ns/op 303176 B/op 16 allocs/op
BenchmarkWrappingHappyPath-4 93334657 126.0 ns/op 120 B/op 2 allocs/op
I noticed that the benchmarks for wrapping multiple parts were operating on incorrect data. The loop I used to cut the output into parts had a bug which always yeilded the first part. Cutting 100 runes worth of output into 10 parts would give you ten copies of the first 10 runes (shaped). I've fixed that and added a test for that operation. The correct performance numbers are here:
goos: linux
goarch: amd64
pkg: github.com/go-text/typesetting/shaping
cpu: Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz
BenchmarkShaping/10runes-arabic-4 54723 220785 ns/op 83048 B/op 1363 allocs/op
BenchmarkShaping/100runes-arabic-4 18742 641292 ns/op 152105 B/op 3339 allocs/op
BenchmarkShaping/1000runes-arabic-4 2517 4679373 ns/op 890534 B/op 24568 allocs/op
BenchmarkShaping/10runes-latin-4 2736532 4379 ns/op 2736 B/op 9 allocs/op
BenchmarkShaping/100runes-latin-4 465279 25539 ns/op 8624 B/op 9 allocs/op
BenchmarkShaping/1000runes-latin-4 51714 232495 ns/op 67635 B/op 9 allocs/op
BenchmarkMapping/10runes-arabic-v3-4 266173123 45.29 ns/op 0 B/op 0 allocs/op
BenchmarkMapping/100runes-arabic-v3-4 28156170 426.4 ns/op 0 B/op 0 allocs/op
BenchmarkMapping/1000runes-arabic-v3-4 2169879 5529 ns/op 0 B/op 0 allocs/op
BenchmarkMapping/10runes-latin-v3-4 288329582 41.61 ns/op 0 B/op 0 allocs/op
BenchmarkMapping/100runes-latin-v3-4 25580140 467.8 ns/op 0 B/op 0 allocs/op
BenchmarkMapping/1000runes-latin-v3-4 2034584 5865 ns/op 0 B/op 0 allocs/op
BenchmarkWrapping/10runes-arabic-1parts-4 269953807 44.31 ns/op 24 B/op 1 allocs/op
BenchmarkWrapping/10runes-arabic-10parts-4 1918556 6254 ns/op 5128 B/op 12 allocs/op
BenchmarkWrapping/100runes-arabic-1parts-4 476154 25642 ns/op 1272 B/op 15 allocs/op
BenchmarkWrapping/100runes-arabic-10parts-4 422366 28449 ns/op 4440 B/op 33 allocs/op
BenchmarkWrapping/100runes-arabic-100parts-4 208006 57385 ns/op 52536 B/op 73 allocs/op
BenchmarkWrapping/1000runes-arabic-1parts-4 49730 241569 ns/op 13656 B/op 88 allocs/op
BenchmarkWrapping/1000runes-arabic-10parts-4 48684 246473 ns/op 16056 B/op 105 allocs/op
BenchmarkWrapping/1000runes-arabic-100parts-4 42943 278939 ns/op 47544 B/op 277 allocs/op
BenchmarkWrapping/1000runes-arabic-1000parts-4 20566 581722 ns/op 568568 B/op 633 allocs/op
BenchmarkWrapping/10runes-latin-1parts-4 265786851 45.81 ns/op 24 B/op 1 allocs/op
BenchmarkWrapping/10runes-latin-10parts-4 2179726 5715 ns/op 4744 B/op 11 allocs/op
BenchmarkWrapping/100runes-latin-1parts-4 578455 20718 ns/op 1848 B/op 18 allocs/op
BenchmarkWrapping/100runes-latin-10parts-4 496795 24223 ns/op 4824 B/op 34 allocs/op
BenchmarkWrapping/100runes-latin-100parts-4 221144 54522 ns/op 55768 B/op 77 allocs/op
BenchmarkWrapping/1000runes-latin-1parts-4 60574 199624 ns/op 16824 B/op 121 allocs/op
BenchmarkWrapping/1000runes-latin-10parts-4 57458 208227 ns/op 19224 B/op 138 allocs/op
BenchmarkWrapping/1000runes-latin-100parts-4 50022 242387 ns/op 48792 B/op 305 allocs/op
BenchmarkWrapping/1000runes-latin-1000parts-4 22030 544804 ns/op 562040 B/op 743 allocs/op
BenchmarkWrappingHappyPath-4 90137808 124.5 ns/op 120 B/op 2 allocs/op
As you can see, the overhead of cutting the text into parts becomes extreme as we approach the case in which every single rune has been cut into its own part. I think this can be made more efficient, but is probably inevitable.
Pushed a quick update to one of the tests to catch a rune accounting bug, as well as a fix for said bug. I couldn't easily make it a separate PR because the test case I changed was introduced in this PR.
@benoitkugler @andydotxyz I've removed the error return from the shaper signature. Doing so required implementing vertical text support, as we previously returned an error when you tried to shape or wrap an unimplemented text direction. It's a slightly larger change than I expected, but the changes are mostly cosmetic. Our concept of text "progression" being independent of axis means that none of the wrapping code needed to change, just the code to calculate the bounds of a line.
And a thunderclap echoed across GitHub, as nearly 12 thousand lines of change entered the main branch, enabling gophers everywhere to shape and wrap complex scripts to their hearts' content. Thanks for the great work everyone!
Really awesome !
Closes #19
This PR is an updated version of #19 that can wrap multiple runs of shaped text (
[]Output
) into lines of (optionally varying) width.I've folded @benoitkugler 's excellent unicode segmenter into here, as it is a vast improvement over what I used in the previous PR.
I have also made some API changes. Notably:
shaping.Shaper
interface to indicate that it can return an error, since the implementation can return an errorshaping.HarfbuzzShaper
). This type holds a text shaping buffer as an unexported field, and reuses it across invocations ofShape
. This is a pretty significant CPU/memory savings over creating a new buffer on every shaping operation.[]shaping.Line
, where aLine
is equivalent to[]Output
shaping.LineWrapper
type. This type holds a buffer which can be reused across wrapping operations for additional resource savings, similar to theshaping.HarfbuzzShaper
.LineWrapper.WrapParagraph
, which will perform a fixed-width wrapping of[]Output
, or you can iteratively invokeLineWrapper.WrapNextLine
with varying widths. This enables both ending the line-wrapping process early if you didn't need the whole paragraph, and it also allows insetting content like images within a paragraph.Face
field toshaping.Output
. Now that it's possible to shape many runs of text and then wrap them into a paragraph, it's important to track which font face should be used for which run of text.I've added numerous tests and benchmarks along the way, and I think they should prove useful moving forward.
As for how the line wrapping works, the key things to understand are:
mapRunesToClusterIndices
: this function creates a mapping between runes of input text and glyphs clusters of output font glyphs for a givenshaping.Output
. This mapping is useful because it enables the next operation.inclusiveGlyphRange
: this function uses a mapping provided by the previous function and an inclusive range of runes to produce an inclusive range of glyphs within ashaping.Output
containing those runes. This function works correctly regardless of the RTL/LTR state of the text, effectively normalizing operations across the two possible text directions. We use this operation to implement the next one.cutRun
: this function cuts ashaping.Output
to only contain glyph data for an inclusive range of runes. Because it works using the above, it behaves correctly for RTL/LTR data, and allows the programmer to think purely in terms of which runes are covered by ashaping.Output
.With the above operations available, wrapping lines is implemented as a greedy algorithm:
[]Outputs
looking for the one containing the current legal break locationOutput
containing the legal break location to only contain runes before the legal break[]Output
traversed to see if they fit in the available spaceThe algorithm doesn't need to worry about RTL/LTR because that property is hidden with the operation of cutting an
Output
.We should be able to easily make other line wrapping algorithms that take advantage of the same
cutRun
primitive, but use different strategies than the basic greedy approach.Similarly, we may be able to find more efficient implementations of the
cutRun
operation, which would then accelerate all line wrappers.I'm open to renaming basically any symbol (I don't feel strongly about them). However, I would really like to actually land one of these PRs soon. Line wrapping has been available in a PR for almost a full year now. This PR's changeset is huge precisely because of the sheer amount of accumulated changes during that interval. Please carefully weigh what is worth blocking the PR for against what can easily be addressed by a small change later.