RazrFalcon / rustybuzz

A complete harfbuzz's shaping algorithm port to Rust
MIT License
498 stars 34 forks source link

Sync with HarfBuzz 2.7.4 #75

Closed bluebear94 closed 10 months ago

bluebear94 commented 10 months ago

See #37.

bluebear94 commented 10 months ago

HarfBuzz 2.7.4’s gen-use-table.py doesn’t run on the current version of the Unicode data files. (gen-tag-table.py also needed a change to run properly.) Would it be viable to update these to the latest version right away?

Never mind; I was silly and didn’t notice that I needed to update the Unicode version in the URLs.

bluebear94 commented 10 months ago

Note to self: is this change to hb-ot-layout-base-table.hh relevant?

RazrFalcon commented 10 months ago

@bluebear94 Good progress. Here is my failed attempt. Maybe it would help a bit: Patch.patch

I haven't been able to grok machine_index_t iterator logic. Afaik it's mutating and bidirectional, which I'm not sure how to implement in Rust.

Either way, feel free to ask me anything. Just remember that I wrote this code 3 years ago and I do not remember much myself.

RazrFalcon commented 10 months ago

Note to self: is this change to hb-ot-layout-base-table.hh relevant?

Any changes to font tables parsing are irrelevant. The BASE table is unused, at least of 2.7

RazrFalcon commented 10 months ago

There is also a pseudo-bug. This code code should use use_category, not indic_category. It doesn't matter right now. But harfbuzz switched to use_category "byte" from 2 to 3.

RazrFalcon commented 10 months ago

The way I was doing the back-porting is by checking each commit, not the whole diff at once. harfbuzz commits are usually pretty small and self-contains, so it's easier that way.

If a commit has changes related to build system, fonts parsing, subsetting, C++ stuff - it can be safely ignored.

If a commit contains changes to tests, like this one, then you have to checkout this commit. Build harfbuzz from sources:

meson builddir --reconfigure
ninja -Cbuilddir

And then run scripts/gen-shaping-tests.py /path/to/harfbuzz-src

bluebear94 commented 10 months ago

HarfBuzz 2.7.4’s buffer serialization was changed to surround the contents in square brackets. Should this be reflected in rustybuzz? Currently, it is, but this will require a minor change to gen-shaping-tests.py.

RazrFalcon commented 10 months ago

Should this be reflected in rustybuzz?

Sure.

bluebear94 commented 10 months ago

Trying to fix the remaining test failure. For the font tests/fonts/in-house/8228d035fcd65d62ec9728fb34f42c63be93a5d3.ttf, HarfBuzz is getting −639 for the x-bearing of glyph #4’s extents, but rustybuzz is getting −638 instead.

RazrFalcon commented 10 months ago

I've checked that font with ttf-explorer and it does use -639. Is this a new test case or some recent change broke it?

bluebear94 commented 10 months ago

It’s an existing test case that was manually changed after generation. This test was supposed to have an expected output of x=0+1030|acutecomb=0@-19,-27+0|X=2+1295|acutecomb=2@-151,320+0 (as generated by the gen-shaping-tests.py script), but the one currently in the master branch has x=0+1030|acutecomb=0@-20,-27+0|X=2+1295|acutecomb=2@-152,321+0 instead.

RazrFalcon commented 10 months ago

Ugh... good find. I would assume this change was caused by this commit. In which case it's not a bug a we should prefer rustybuzz output. Some floats rounding differences are expected.

Lets leave the manual correction in place and I will think what could be done here.

bluebear94 commented 10 months ago

Now morx_34_001 and morx_36_001 are failing (producing shorter outputs than expected). These test resource exhaustion cases, and the failures might be related to the work limit increases in HarfBuzz.

RazrFalcon commented 10 months ago

Rounding errors are ok, a different output isn't.

How do you generate Rust tests? As I've said before, the ideal workflow is to checkout each harfbuzz commit you're trying to port, rebuild harfbuzz and rerun scripts/gen-shaping-tests.py. harfbuzz tests are not set in stone, they change and you have to constantly keep the Rust one up to date.

bluebear94 commented 10 months ago

I’ve identified and fixed the cause of morx_34_001 and morx_36_001 failing (HB increased buffer resource limits).

I’ve generated the Rust tests as you’ve specified (except that this time, I went directly to the 2.7.4 release). I think porting the changes one commit at a time as you suggested might be easier in the future, though, so that when something goes wrong with the tests, I have less stuff to search through.

RazrFalcon commented 10 months ago

Yes, harfbuzz tests are very tricky and pedantic, which is a good thing, but this also makes porting more tedious. You do have to port stuff commit-per-commit. Trying to port in bulk would rarely work in my experience. And when porting commit-per-commit you immediately know the cause.

When writing this crate originally, I was running tests after porting each C++ function. Otherwise you would never find the cause.

bluebear94 commented 10 months ago

Could you review my initial attempt to get rid of the Vec allocation in find_syllables at d523ea2?

I’m not completely satisfied with the use of Cell; I think I can avoid it by passing a reference to the buffer explicitly to included, next_glyph, and prev_glyph.

RazrFalcon commented 10 months ago

Yes, I was struggling with this one as well. My initial implementation was almost identical to your iterator based one. But tests were failing.

Will take a look what can be done here.

bluebear94 commented 10 months ago

Merged the changes from #77 into this branch. Unfortunately, due to limitations in Ragel and in Rust, I had to switch back to allocating a vector in universal_machine::find_syllables for now.

RazrFalcon commented 10 months ago

@bluebear94 Sure, no problem. Thanks for looking into it. Hopefully it was relatively easy for you to figure out how to build master ragel. I will update the readme soon.

RazrFalcon commented 10 months ago

PS: I wasn't notified that this pull request is no longer a draft. I would suggest pinging me directly. Do you consider it to be done, feature-wise?

bluebear94 commented 10 months ago

Yes.

I’ve almost figured out how to get rid of the vec allocation again, but unfortunately, it requires manually applying a few edits to the generated file. The problem is that Ragel outputs assignment statements such as p = 0;, but Rust doesn’t support overloading assignment unlike C++.

RazrFalcon commented 10 months ago

Hm... I'm not familiar with Ragel myself, but it would be great to be able to avoid manual edits. Once again, there is no rush.

RazrFalcon commented 10 months ago

By the way, why there are so many changes to tests/shaping_text_rendering_tests.rs?

bluebear94 commented 10 months ago

These are due to changes in the morx_34_001 and morx_36_001 tests due to increased resource limits in HarfBuzz 2.7.4 (mentioned in this comment). None of the other tests in that file have changed.

RazrFalcon commented 10 months ago

Hm... github shows:

23,060 additions, 5 deletions not shown because the diff is too large. Please use a local Git client to view these changes.

I can't even see what was changed.

RazrFalcon commented 10 months ago

Refactor scripts (extract common logic, store downloaded data files in a separate directory) – maybe another PR?

Are you talking about Python scripts? I think that should stay untouched. They are ported from harfbuzz as well. Just reformatted and outputting Rust instead of C++. rustybuzz should have as little changes as possible.

behdad commented 10 months ago

Refactor scripts (extract common logic, store downloaded data files in a separate directory) – maybe another PR?

Are you talking about Python scripts? I think that should stay untouched. They are ported from harfbuzz as well. Just reformatted and outputting Rust instead of C++. rustybuzz should have as little changes as possible.

Newer HarfBuzz versions use PackTab for a bunch of stuff. It's a small one-file Python library. Should be easy to add Rust output to it:

https://github.com/harfbuzz/packtab/issues/5

RazrFalcon commented 10 months ago

@bluebear94 Any updates on why tests/shaping_text_rendering_tests.rs diff is so big?

@notgull Porting commits one by one is fine. Adding the exact commit hash is probably unnecessary. We simply don't need this information.

I plan to try to refactor the code base a bit to make it more harfbuzz-like. File structure, code structure, etc. Ideally, rustybuzz should look like harfbuzz transpiled to Rust. The less changes we have, the either is should be to port.

bluebear94 commented 10 months ago

Any updates on why tests/shaping_text_rendering_tests.rs diff is so big?

See my previous comment on this. It might be possible to make the offending tests shorter using str::repeat, but that would require nontrivial changes to gen-shaping-tests.py. Alternatively, we can just exclude the morx_34_001 and morx_36_001 tests from being generated.

RazrFalcon commented 10 months ago

@bluebear94 Oh, I think I got it now. Those tests produce an enormous output? And harfbuzz tests simply use * to indicate all glyphs? Is this right?

In this case I would add those tests to the ignore list in scripts/gen-shaping-tests.py for now and later I would try to figure out what to do with them.

bluebear94 commented 10 months ago

It’s not that all glyphs from these fonts are being tested, but rather than these fonts contain glyphs that expand indefinitely.

Anyway, I pushed a commit to ignore these tests.

RazrFalcon commented 10 months ago

I see now. Those tests were already absurd before. I guess I would have to handle them separately.

Anyway, thanks for your work! I honestly wasn't expecting anyone to be brave enough to do it. I plan to do some small changes in the next couple of days and make a new release and then, if you're still interested, you can continue working on it. I just wanted to warn you not to fork it until I restructure the code base a bit.

RazrFalcon commented 10 months ago

@bluebear94 If you're still interested in updating rustybuzz, I can give you write access to this repo, to simplify your workflow. This would allow more gradual changes. Just make sure each commit passes the CI.