Closed bluebear94 closed 10 months ago
HarfBuzz 2.7.4’s gen-use-table.py
doesn’t run on the current version of the Unicode data files. (gen-tag-table.py
also needed a change to run properly.) Would it be viable to update these to the latest version right away?
Never mind; I was silly and didn’t notice that I needed to update the Unicode version in the URLs.
Note to self: is this change to hb-ot-layout-base-table.hh
relevant?
@bluebear94 Good progress. Here is my failed attempt. Maybe it would help a bit: Patch.patch
I haven't been able to grok machine_index_t
iterator logic. Afaik it's mutating and bidirectional, which I'm not sure how to implement in Rust.
Either way, feel free to ask me anything. Just remember that I wrote this code 3 years ago and I do not remember much myself.
Note to self: is this change to hb-ot-layout-base-table.hh relevant?
Any changes to font tables parsing are irrelevant. The BASE table is unused, at least of 2.7
The way I was doing the back-porting is by checking each commit, not the whole diff at once. harfbuzz commits are usually pretty small and self-contains, so it's easier that way.
If a commit has changes related to build system, fonts parsing, subsetting, C++ stuff - it can be safely ignored.
If a commit contains changes to tests, like this one, then you have to checkout this commit. Build harfbuzz from sources:
meson builddir --reconfigure
ninja -Cbuilddir
And then run scripts/gen-shaping-tests.py /path/to/harfbuzz-src
HarfBuzz 2.7.4’s buffer serialization was changed to surround the contents in square brackets. Should this be reflected in rustybuzz
? Currently, it is, but this will require a minor change to gen-shaping-tests.py
.
Should this be reflected in rustybuzz?
Sure.
Trying to fix the remaining test failure. For the font tests/fonts/in-house/8228d035fcd65d62ec9728fb34f42c63be93a5d3.ttf
, HarfBuzz is getting −639 for the x-bearing of glyph #4’s extents, but rustybuzz is getting −638 instead.
I've checked that font with ttf-explorer and it does use -639
.
Is this a new test case or some recent change broke it?
It’s an existing test case that was manually changed after generation. This test was supposed to have an expected output of x=0+1030|acutecomb=0@-19,-27+0|X=2+1295|acutecomb=2@-151,320+0
(as generated by the gen-shaping-tests.py
script), but the one currently in the master branch has x=0+1030|acutecomb=0@-20,-27+0|X=2+1295|acutecomb=2@-152,321+0
instead.
Ugh... good find. I would assume this change was caused by this commit. In which case it's not a bug a we should prefer rustybuzz output. Some floats rounding differences are expected.
Lets leave the manual correction in place and I will think what could be done here.
Now morx_34_001
and morx_36_001
are failing (producing shorter outputs than expected). These test resource exhaustion cases, and the failures might be related to the work limit increases in HarfBuzz.
Rounding errors are ok, a different output isn't.
How do you generate Rust tests? As I've said before, the ideal workflow is to checkout each harfbuzz commit you're trying to port, rebuild harfbuzz and rerun scripts/gen-shaping-tests.py
. harfbuzz tests are not set in stone, they change and you have to constantly keep the Rust one up to date.
I’ve identified and fixed the cause of morx_34_001
and morx_36_001
failing (HB increased buffer resource limits).
I’ve generated the Rust tests as you’ve specified (except that this time, I went directly to the 2.7.4 release). I think porting the changes one commit at a time as you suggested might be easier in the future, though, so that when something goes wrong with the tests, I have less stuff to search through.
Yes, harfbuzz tests are very tricky and pedantic, which is a good thing, but this also makes porting more tedious. You do have to port stuff commit-per-commit. Trying to port in bulk would rarely work in my experience. And when porting commit-per-commit you immediately know the cause.
When writing this crate originally, I was running tests after porting each C++ function. Otherwise you would never find the cause.
Could you review my initial attempt to get rid of the Vec
allocation in find_syllables
at d523ea2?
I’m not completely satisfied with the use of Cell
; I think I can avoid it by passing a reference to the buffer explicitly to included
, next_glyph
, and prev_glyph
.
Yes, I was struggling with this one as well. My initial implementation was almost identical to your iterator based one. But tests were failing.
Will take a look what can be done here.
Merged the changes from #77 into this branch. Unfortunately, due to limitations in Ragel and in Rust, I had to switch back to allocating a vector in universal_machine::find_syllables
for now.
@bluebear94 Sure, no problem. Thanks for looking into it. Hopefully it was relatively easy for you to figure out how to build master ragel. I will update the readme soon.
PS: I wasn't notified that this pull request is no longer a draft. I would suggest pinging me directly. Do you consider it to be done, feature-wise?
Yes.
I’ve almost figured out how to get rid of the vec allocation again, but unfortunately, it requires manually applying a few edits to the generated file. The problem is that Ragel outputs assignment statements such as p = 0;
, but Rust doesn’t support overloading assignment unlike C++.
Hm... I'm not familiar with Ragel myself, but it would be great to be able to avoid manual edits. Once again, there is no rush.
By the way, why there are so many changes to tests/shaping_text_rendering_tests.rs
?
These are due to changes in the morx_34_001
and morx_36_001
tests due to increased resource limits in HarfBuzz 2.7.4 (mentioned in this comment). None of the other tests in that file have changed.
Hm... github shows:
23,060 additions, 5 deletions not shown because the diff is too large. Please use a local Git client to view these changes.
I can't even see what was changed.
Refactor scripts (extract common logic, store downloaded data files in a separate directory) – maybe another PR?
Are you talking about Python scripts? I think that should stay untouched. They are ported from harfbuzz as well. Just reformatted and outputting Rust instead of C++. rustybuzz should have as little changes as possible.
Refactor scripts (extract common logic, store downloaded data files in a separate directory) – maybe another PR?
Are you talking about Python scripts? I think that should stay untouched. They are ported from harfbuzz as well. Just reformatted and outputting Rust instead of C++. rustybuzz should have as little changes as possible.
Newer HarfBuzz versions use PackTab for a bunch of stuff. It's a small one-file Python library. Should be easy to add Rust output to it:
@bluebear94 Any updates on why tests/shaping_text_rendering_tests.rs
diff is so big?
@notgull Porting commits one by one is fine. Adding the exact commit hash is probably unnecessary. We simply don't need this information.
I plan to try to refactor the code base a bit to make it more harfbuzz-like. File structure, code structure, etc. Ideally, rustybuzz should look like harfbuzz transpiled to Rust. The less changes we have, the either is should be to port.
Any updates on why
tests/shaping_text_rendering_tests.rs
diff is so big?
See my previous comment on this. It might be possible to make the offending tests shorter using str::repeat
, but that would require nontrivial changes to gen-shaping-tests.py
. Alternatively, we can just exclude the morx_34_001
and morx_36_001
tests from being generated.
@bluebear94 Oh, I think I got it now. Those tests produce an enormous output? And harfbuzz tests simply use *
to indicate all glyphs? Is this right?
In this case I would add those tests to the ignore list in scripts/gen-shaping-tests.py
for now and later I would try to figure out what to do with them.
It’s not that all glyphs from these fonts are being tested, but rather than these fonts contain glyphs that expand indefinitely.
Anyway, I pushed a commit to ignore these tests.
I see now. Those tests were already absurd before. I guess I would have to handle them separately.
Anyway, thanks for your work! I honestly wasn't expecting anyone to be brave enough to do it. I plan to do some small changes in the next couple of days and make a new release and then, if you're still interested, you can continue working on it. I just wanted to warn you not to fork it until I restructure the code base a bit.
@bluebear94 If you're still interested in updating rustybuzz, I can give you write access to this repo, to simplify your workflow. This would allow more gradual changes. Just make sure each commit passes the CI.
See #37.
gen-tag-table.py
gen-universal-table.py
universal_machine::find_syllables
to avoid theVec
allocation?