RazrFalcon / rustybuzz

A complete harfbuzz's shaping algorithm port to Rust
MIT License
498 stars 34 forks source link

Backport to 7.0.0 #112

Closed LaurenzV closed 1 week ago

LaurenzV commented 2 weeks ago
Status Commit message HB
🟡 [hb-view/hb-shape] Add --glyphs Link We don't need this API, right? We don't have anything similar to hb-view, so it probably doesn't make much sense
🟡 [util] Improve --glyphs Link
🟡 [util] Fix vertical positioning with --glyphs Link
🟢 [paint-extents] Start out Link
🟢 [paint-extents] Flesh out some more Link
🟢 [paint-extents] Flesh out more Link
🟢 [paint-extents] More Link
🟢 [paint-extend] More Link
🟢 [paint-extents] Finish off Link
🟢 Get outline extents manually Link
🟢 Skip empty outlines Link
🟢 [paint-extents] Better handle empty glyphs Link
🟢 [colr] Return true extents Link
🟢 [paint-extents] Clean up Link
🟢 [paint-extents] Streamline extents_t Link
🟢 [paint-extents] Streamline extents_t more Link
🟢 [paint-extents] Refactor code Link
🟢 [paint-extents] Minor refactor Link
🟢 [paint-extents] Comments Link
🟢 [paint-extents] Rename variable Link
🟡 [gsubgpos] Minor use ->clear() directly Link relevant? wasnt able to fint he corresponding code
🟡 [gsubgpos] Use swap instead of move Link
🟢 [paint-extents] Simplify transform_extents Link
🟢 Revert "Revert "[aat] Support feature ranges"" Link
🟢 [aat] Always generate a feature range Link
🟢 [aat] Support ranges in NonContextual subtable as well Link
🟢 [aat] Run subtable across ranges if flags match Link
🟢 [aat] Adjust last range Link
🟢 [aat] Always unsafe-to-concat in state machine Link
🟢 [aat] Add test for feature range Link
🟢 [aat] Optimize feature application Link
🟢 [aat] Comment Link
🟡 [aat] Optimize feature-range application Link No idea what this does.
🟢 [aat] Initialize values Link
🟢 [test] Remove non-free font and its test Link
🟢 [COLRv1] Don't return extents if glyph has no paint Link
🟢 [COLRv1] Handle void extents Link
🟢 [layout] Limit how far we skip when looking back Link
🟢 [gsubgpos] Refactor skippy_iter.match() Link
🟢 Revert "[layout] Limit how far we skip when looking back" Link
🟢 [GPOS] Avoid O(n^2) behavior in mark-attachment Link
🟢 [buffer] Optimize _infos_find_min_cluster for monotone clusters Link
🟢 [buffer] Optimize _infos_set_glyph_flags to avoid O(n^2) behavior Link
🟢 [buffer] Speed up merge_clusters_impl Link
🟢 [GPOS] Fix assert fail introduced recently Link
LaurenzV commented 1 week ago

@RazrFalcon The commits contain pretty big changes to AAT... I'm afraid we need to figure out a solution to the AAT testing problem before we continue with that. :( I don't think it makes sense to make AAT more broken than it (potentially) already is, and at least to me it seems that there are a few differences between rustybuzz and harfbuzz code-wise, which makes it even more complicated. :(

RazrFalcon commented 1 week ago

Ugh... AAT is pain. I know it works because I ran tests locally before, at least some of them. But any changes would require a proper CI support.

And yes, HB and RB implementations are pretty different, because HB uses insane C++ templates magic, which we cannot port to Rust or any other language.

One thing to note is that the ttf-parser side of the code should be fine. I don't think we would need to change it. But rustybuzz one...

Otherwise, it's the same drill as always: I wrote this code 4 years ago and I do not remember anything. We're on the same level here. All I remember is that even Rust code relies on generics, otherwise there will be a ton of duplication. This is because AAT State tables can have different payload, but most of the stuff is the same. I think it took me like 2 weeks or more to port AAT code...

Can you point me to HB commits related to AAT?

As for testing, last time I've checked, i.e. 4 year ago, HB was running different macOS versions on CI. Then it would check hash-sum of system fonts and if any of them match - runs tests. We have to do the same. We can start with macOS 14 to see how many tests would pass there. Aka how many fonts would match their hash-sums.

LaurenzV commented 1 week ago

Can you point me to HB commits related to AAT?

Those should encompass the main changes: https://github.com/harfbuzz/harfbuzz/compare/b2087132..8b17c6ca3

There are some smaller ones afterwards, but they don't seem really hard.

As for testing, last time I've checked, i.e. 4 year ago, HB was running different macOS versions on CI.

I wanted to look into that, too, but at least to me it seems that CI always runs on the latest MacOS: https://github.com/harfbuzz/harfbuzz/blob/main/.github/workflows/macos-ci.yml

We can start with macOS 14 to see how many tests would pass there. Aka how many fonts would match their hash-sums.

I have a Mac, so I'll give it a try!

RazrFalcon commented 1 week ago

Those should encompass the main changes

Doesn't looks that bad. Some stuff moved around, but it's not a rewrite of the logic I was worried about.

Here is our drive method.

And it seems NoncontextualSubtable::apply in HB is this in RB.

And you have to pass hb_aat_apply_context_t everywhere now, but we don't have it... The only place I see the ankr table is in Driver4. I guess due to lifetime rules or generics restrictions I've ended passing required data differently.

Hopefully this clarifies things a bit. Maybe I will take a look into it myself later this week.

I wanted to look into that, too, but at least to me it seems that CI always runs on the latest MacOS:

Interesting. Maybe something changed or I misremember stuff. But tests still check for hashes.

LaurenzV commented 1 week ago

Okay, thank you, I will see when I find the time and motivation to look into it in more detail.

LaurenzV commented 1 week ago

@RazrFalcon So, by the way, I feel kind of bad for asking about this considering that a lot of time and effort was put into ttf-parser (and it obviously works very well) and I hope you don't take it the wrong way! I'm mostly just asking because I'm interested in your opinion: What is your opinion on transitioning towards using read-fonts from fontations instead of ttf-parser for rustybuzz in the long term?

As mentioned, ttf-parser obviously works pretty well, but using fontations does seem pretty appealing, for one reason because it's backed by Google and there seem to be multiple people working on it actively, and on the other hand I think it's also much better-tested (using both, unit tests and fuzzing) and has other nice features like hinting, which is also a nice-to-have.

The only (potential) disadvantages I could see are binary size (I presume that it's a heavier dependency than ttf-parser, although I haven't tried it, and I don't know by how much) and maybe no-std could be a problem (although from what I can tell they do have an option to disable it, but one would have to try it out).

As I mentioned, I'm more curious to hear what you think, I'm not saying that we should definitely do this.

RazrFalcon commented 1 week ago

I'm aware of fontations, but last time I've checked (a year ago?) it was nowhere near ttf-parser, but looks like they come a long way. The API is way more low level though.

Honestly, I don't have much attachment to my projects. I don't have time working on them anymore. If I could drop ttf-parser or even rustybuzz - I will be more than happy. But the reason those two projects exist is because there are no alternatives. And I do hope they emerge in the future.

Yes, there is swash, which is sort of abandoned and completely untested. And seems like it uses fontations as well now.

Also note that we use ttf-parser not only in rustybuzz, but in fontdb (a very tiny part of it) and usvg (for actual glyphs access) as well.

While ttf-parser might not be perfect, it's still pretty damn good. It's mostly feature-complete. Some stuff like phantom points is missing, but it could be added relatively easy. As for hinting, I'm not sure how it works and how hard it would be to implement it. Anything else worth the switch?

I think it's also much better-tested

While ttf-parser test suite is objectively trash, our primary test harness is rustybuzz, which leads to nearly 100% code coverage.

In the end it all boils down to: what we would gain from switching to fontations?

Honestly, when it comes to throwing stuff out of the window - rustybuzz is the first thing that comes to mind. I hate it so much. I spend so much time working it. I still have no clue how it actually works. It's an endless rat race to keep it in sync with harfbuzz. It's just bad on every level. And yet, there are no alternatives. I really hope they appear as soon as possible.

PS: looks like people at fontations even have their own fontdb now (fontique), which uses roxmltree to parse fontconfig, just like we do. Same goal - different implementations.

LaurenzV commented 1 week ago

While ttf-parser might not be perfect, it's still pretty damn good. It's mostly feature-complete.

As I mentioned, I didn't mean to imply it's not, it obviously works really well! I don't think there is a pressing need to switch, I just was curious on your thoughts.

Some stuff like phantom points is missing, but it could be added relatively easy.

I'm actually working on this right now, I'm having some difficulties with integrating it with variations, but hopefully I can manage to get it to work. I'll try to create a PR soon-ish.

Also note that we use ttf-parser not only in rustybuzz, but in fontdb (a very tiny part of it) and usvg (for actual glyphs access) as well.

Yeah, that's true...

In the end it all boils down to: what we would gain from switching to fontations?

I guess right now, apart from the hinting thing, there might not be any immediate benefit. I was just thinking that in the long term, it could be nice to look into it since it seems to have a lot of backing. Probably the best thing is just to wait and see how things develop...

behdad commented 1 week ago

Honestly, I don't have much attachment to my projects. I don't have time working on them anymore. If I could drop ttf-parser or even rustybuzz - I will be more than happy. But the reason those two projects exist is because there are no alternatives. And I do hope they emerge in the future.

I've said this before: We'd be happy to move rustybuzz to the harfbuzz org and try to keep it up to date after the forward-porting by @LaurenzV is complete.

Yes, there is swash, which is sort of abandoned and completely untested. And seems like it uses fontations as well now.

Correct.

PS: looks like people at fontations even have their own fontdb now (fontique), which uses roxmltree to parse fontconfig, just like we do. Same goal - different implementations.

While I don't work on fontations (yet), I can share the vision. It started with the desire to build a unified foundation for font compilation and consumption. Chrome is already in the process of replacing FreeType with fontations. And fontc is on its way to replace fontmake. It makes a lot sense then to move to a HarfBuzz replacement built on the same ecosystem. And porting RustyBuzz sounds like the most viable option to get there.

RazrFalcon commented 1 week ago

We'd be happy to move rustybuzz to the harfbuzz org and try to keep it up to date

I'm skeptical how well it will work. Maintaining two codebases instead of one is not fun. Simply moving the repo around would not help much either.

The only way to keep rustybuzz alive is to make backporting as simple and as streamlined as possible. Which is impossible, because harfbuzz relies on C++ templates magic. A curse of every C++ project. And pointers magic. And ragel...

We cannot simply port/translate harfbuzz to Rust 1-to-1. We have to make changes to avoid unsafety, like having ttf-parser instead of casting pointers left and right. All of which leads to a more complicated backporting process.

The only real solution is to have harfbuzz or a brand new shaper to be written in Rust. Or even better, have a reference shaper implementation in a sane language with a great test suite. That would be a dream come true.

It started with the desire to build a unified foundation for font compilation and consumption.

Yeah, that's kinda my problem with it. I need a user friendly, high-level TrueType parser, which is the purpose of ttf-parser. I do not care about subsetting, font compilation and other stuff. ttf-parser is basically a better stb_truetype and not a Swiss army knife for wrangling fonts.

It makes a lot sense then to move to a HarfBuzz replacement built on the same ecosystem.

The problem is that harfbuzz does a lot of things. A lot. And rustybuzz does only shaping. So swapping ttf-parser with read-fonts will not change anything. A brand new shaper with Rust in mind is the way to go.

Once again, I will archive rustybuzz in a heartbeat as soon as a worthy alternative would come up. But we're not there yet and we have to maintain this Frankenstein monstrosity in the meantime.

RazrFalcon commented 1 week ago

I guess right now, apart from the hinting thing, there might not be any immediate benefit.

To my knowledge, hinting has no effect on shaping and glyphs rendering in resvg is trash to begin with, since we simply rasterizing bezier paths. I don't think hinting support will have any effect in our case.

LaurenzV commented 1 week ago

Yeah, that's kinda my problem with it. I need a user friendly, high-level TrueType parser, which is the purpose of ttf-parser. I do not care about subsetting, font compilation and other stuff.

That's exactly what read-fonts does, though, it's just a TrueType parser. The subsetting, font compilation etc. part are all parts of separate crates that build on top of read-fonts, so you obviously don't need to include those if you don't need them.

But yeah, I just noticed that fontations doesn't seem to support tables necessary for AAT, so using it instead of ttf-parser is out of question anyway, for now. Let's see what the situation is like in the next few months/years.

RazrFalcon commented 1 week ago

@LaurenzV how did you able to build hb-shape on macos? It seems like a simple meson builddir && ninja -Cbuilddir doesn't work anymore. Looks like this is because hb utils depend on cairo now and meson fails to detect it, despite the fact that I have cairo installed via brew. Typical C/C++ nonsense...

LaurenzV commented 1 week ago

What commit are you going from?

RazrFalcon commented 1 week ago

6.0.0 tag

LaurenzV commented 1 week ago

It seems to work fine for me... I always run this script when updating the tests:

cd ../harfbuzz
meson builddir # `--reconfigure` is needed only on the first run
ninja -Cbuilddir
cd ../rustybuzz/scripts
python3 gen-shaping-tests.py ../../harfbuzz
cargo fmt
RazrFalcon commented 1 week ago

That's exactly what read-fonts does, though, it's just a TrueType parser.

Yes, but a very low-level one, compared to ttf-parser. ttf-parser never exposes raw offsets and stuff. Everything is nicely wrapped. Instead, you have to use skrifa get a tff-parser-line API.

But yeah, will wait for now.

RazrFalcon commented 1 week ago

Can you try removing builddir and build it from scratch to make sure you aren't using old hb-shape?

LaurenzV commented 1 week ago

I did that and it still works fine. What exactly is the error you are getting?

RazrFalcon commented 1 week ago

I do not get an error per se, but I see:

  Dependencies used for command-line utilities
    Cairo            : NO
    Chafa            : NO

And builddir/util is simply empty for me...

LaurenzV commented 1 week ago

Check reconfigure/meson-logs/meson-log.txt, in case you have that.

Here is my log output in case it helps. Maybe you could post yours.

Determining dependency 'cairo' with pkg-config executable '/opt/homebrew/bin/pkg-config'
env[PKG_CONFIG_PATH]: 
env[PKG_CONFIG]: /opt/homebrew/bin/pkg-config
-----------
Called: `/opt/homebrew/bin/pkg-config --modversion cairo` -> 0
stdout:
1.18.0
-----------
env[PKG_CONFIG_PATH]: 
env[PKG_CONFIG]: /opt/homebrew/bin/pkg-config
-----------
Called: `/opt/homebrew/bin/pkg-config --cflags cairo` -> 0
stdout:
-I/opt/homebrew/Cellar/cairo/1.18.0/include/cairo -I/opt/homebrew/Cellar/fontconfig/2.15.0/include -I/opt/homebrew/opt/freetype/include/freetype2 -I/opt/homebrew/opt/libpng/include/libpng16 -I/opt/homebrew/Cellar/libxext/1.3.6/include -I/opt/homebrew/Cellar/libxrender/0.9.11/include -I/opt/homebrew/Cellar/libx11/1.8.9/include -I/opt/homebrew/Cellar/libxcb/1.17.0/include -I/opt/homebrew/Cellar/libxau/1.0.11/include -I/opt/homebrew/Cellar/libxdmcp/1.1.5/include -I/opt/homebrew/Cellar/pixman/0.42.2/include/pixman-1 -I/opt/homebrew/Cellar/xorgproto/2024.1/include
-----------
env[PKG_CONFIG_ALLOW_SYSTEM_LIBS]: 1
env[PKG_CONFIG_PATH]: 
env[PKG_CONFIG]: /opt/homebrew/bin/pkg-config
-----------
Called: `/opt/homebrew/bin/pkg-config --libs cairo` -> 0
stdout:
-L/opt/homebrew/Cellar/cairo/1.18.0/lib -lcairo
-----------
env[PKG_CONFIG_PATH]: 
env[PKG_CONFIG]: /opt/homebrew/bin/pkg-config
-----------
Called: `/opt/homebrew/bin/pkg-config --libs cairo` -> 0
stdout:
-L/opt/homebrew/Cellar/cairo/1.18.0/lib -lcairo
-----------
Run-time dependency cairo found: YES 1.18.0
Determining dependency 'cairo-ft' with pkg-config executable '/opt/homebrew/bin/pkg-config'
env[PKG_CONFIG_PATH]: 
env[PKG_CONFIG]: /opt/homebrew/bin/pkg-config
-----------
Called: `/opt/homebrew/bin/pkg-config --modversion cairo-ft` -> 0
stdout:
1.18.0
-----------
env[PKG_CONFIG_PATH]: 
env[PKG_CONFIG]: /opt/homebrew/bin/pkg-config
-----------
Called: `/opt/homebrew/bin/pkg-config --cflags cairo-ft` -> 0
stdout:
-I/opt/homebrew/Cellar/cairo/1.18.0/include -I/opt/homebrew/Cellar/cairo/1.18.0/include/cairo -I/opt/homebrew/Cellar/fontconfig/2.15.0/include -I/opt/homebrew/Cellar/libxext/1.3.6/include -I/opt/homebrew/Cellar/libxrender/0.9.11/include -I/opt/homebrew/Cellar/libx11/1.8.9/include -I/opt/homebrew/Cellar/libxcb/1.17.0/include -I/opt/homebrew/Cellar/libxau/1.0.11/include -I/opt/homebrew/Cellar/libxdmcp/1.1.5/include -I/opt/homebrew/Cellar/pixman/0.42.2/include/pixman-1 -I/opt/homebrew/opt/freetype/include/freetype2 -I/opt/homebrew/opt/libpng/include/libpng16 -I/opt/homebrew/Cellar/xorgproto/2024.1/include
-----------
env[PKG_CONFIG_ALLOW_SYSTEM_LIBS]: 1
env[PKG_CONFIG_PATH]: 
env[PKG_CONFIG]: /opt/homebrew/bin/pkg-config
-----------
Called: `/opt/homebrew/bin/pkg-config --libs cairo-ft` -> 0
stdout:
-L/opt/homebrew/Cellar/cairo/1.18.0/lib -L/opt/homebrew/opt/freetype/lib -lcairo -lfreetype
-----------
env[PKG_CONFIG_PATH]: 
env[PKG_CONFIG]: /opt/homebrew/bin/pkg-config
-----------
Called: `/opt/homebrew/bin/pkg-config --libs cairo-ft` -> 0
stdout:
-L/opt/homebrew/Cellar/cairo/1.18.0/lib -L/opt/homebrew/opt/freetype/lib -lcairo -lfreetype
-----------
Run-time dependency cairo-ft found: YES 1.18.0
LaurenzV commented 1 week ago

Nevermind, you need to check builddir/meson-logs.

RazrFalcon commented 1 week ago

I had to install pkg-config. Thanks meson...

LaurenzV commented 1 week ago

Do you know if hb_sorted_vector_t actually always keeps the elements sorted when you insert a new one, or is it just a kind of "marker" type to indicate that the contents should be sorted. If it's the former (which I presume), not sure what we should use here...

LaurenzV commented 1 week ago

Also, what does this do:

if (event->start)
      {
    active_features.push (event->feature);
      } else {
    feature_info_t *feature = active_features.lsearch (event->feature);
    if (feature)
      active_features.remove_ordered (feature - active_features.arrayZ);
      }

I don't really get the feature - active_features.arrayZ why are we subtracting here?

RazrFalcon commented 1 week ago

Do you know if hb_sorted_vector_t actually always keeps the elements sorted when you insert a new one

No idea. Doesn't seem like it. Looks like all it does is feature-gates hb_sorted_vector_t::bsearch methods. Probably just a compile-time sanity check.

I don't really get the feature - active_features.arrayZ why are we subtracting here?

arrayZ is a pointer to the array start. So it simply find a distances between feature pointer and the array start. In Rust it will be just:

if let Some(index) = active_features.iter().position(|f| f == feature) {
    active_features.remove(index);
}

Classic C shenanigans.

RazrFalcon commented 1 week ago

The Rust code sample above is just for the feature - active_features.arrayZ part by the way. The whole code will be translated to:

if let Ok(index) = active_features.binary_search(&feature) {
    active_features.remove(index);
}
LaurenzV commented 1 week ago

Got it, thanks.

LaurenzV commented 1 week ago

That was awful... But hopefully done soon, I only need to look into some issues while porting the last few commits until 7.0.

RazrFalcon commented 1 week ago

🫡

RazrFalcon commented 1 week ago

I do not know why are you torturing yourself this much, but it's highly appreciated.

Once again, if you have any ideas about how to make porting easier - feel free to share them. I know that templates and borrow checker related stuff is mostly unsolvable, but we can improve other stuff.

We might even try "improving" harfbuzz itself 👀

LaurenzV commented 1 week ago

Once again, if you have any ideas about how to make porting easier - feel free to share them. I know that templates and borrow checker related stuff is mostly unsolvable, but we can improve other stuff.

Not sure I have many suggestions, apart from the fact that some of the things could be aligned more to harfbuzz, especially in AAT. But overall it's pretty okay for now, I think!

RazrFalcon commented 1 week ago

some of the things could be aligned more to harfbuzz, especially in AAT

Trust me, I've tried. I couldn't make Rust generics happy with C++ templates duck typing.

LaurenzV commented 1 week ago

In harfbuzz, both info and out_info are of type hb_glyph_info_t, while in rustybuzz we have this weird bytemuck thing going on for out_info. Could you explain why we have this?

LaurenzV commented 1 week ago

Just a couple minor questions (see the top comment), but other than that this should be done now.

RazrFalcon commented 1 week ago

In harfbuzz, both info and out_info are of type hb_glyph_info_t, while in rustybuzz we have this weird bytemuck thing going on for out_info. Could you explain why we have this?

Well, well, well. You're not the first one. https://github.com/harfbuzz/harfbuzz/issues/1967

In a nutshell, in harfbuzz hb_buffer_t::info and hb_buffer_t::out_info are not owning pointers. The only buffers that harfbuzz allocates are hb_buffer_t::info and hb_buffer_t::pos. And it uses/reporpuses hb_buffer_t::pos allocation/memory as hb_buffer_t::out_info during processing. That's why hb_glyph_info_t and hb_glyph_position_t have identical size. Classic C shenanigans.

We don't need this API, right? We don't have anything similar to hb-view, so it probably doesn't make much sense

Yep, we do not have hb-view alternative.

[gsubgpos] Minor use ->clear() directly Link relevant? wasnt able to fint he corresponding code

Yes, looks like we do not have it.

[aat] Optimize feature-range application Link No idea what this does.

Checks if any morx subtables have a required feature flags? hb_iter, as the name suggest, is harfbuzz's own std::ranges implementation. Think Iterator in Rust. Doesn't seems like we need this one.

LaurenzV commented 1 week ago

Okay, thanks! Then this should be ready from my side.

RazrFalcon commented 1 week ago

Congrats! We're a year behind harfbuzz now.

Should we make a release?

LaurenzV commented 1 week ago

I think we can still wait a bit. I want to see how far I can get in the next few days, but as you want!