dfrg / swash

Font introspection, complex text shaping and glyph rendering.
Apache License 2.0
584 stars 34 forks source link

Correctness/Tests #6

Open kirawi opened 3 years ago

kirawi commented 3 years ago

You briefly mentioned this on the Reddit announcement:

Correctness is... tricky. I would like to run it against the Harfbuzz test suite, but I will have to account for differences in output both with respect to cluster numbering and how Harfbuzz distributes some positioning information differently between offsets and advances. Still, this is an area that needs attention.

I'm admittedly unqualified to understand why it's difficult to integrate the HarfBuzz test suite like rustybuzz did, but how immediate is it on your TODO list?

Tracking

Kethku commented 3 years ago

Note: rustybuzz is a direct port of harfbuzz. So its not surprising that they could just lift the test suite directly

dfrg commented 3 years ago

I am actually going to begin investigating this in the next few hours because I believe it is vitally important now that this crate is released. I will leave this issue open and report back here on progress.

dfrg commented 3 years ago

Small update on this. I've spent some time with the HarfBuzz test suite and developed a plan for moving forward. I won't try to predict a timeframe, but this issue is fairly high priority for everyone that has expressed interest in using this crate for shaping so it will be receiving a high proportion of my attention.

kirawi commented 2 years ago

Small update on this. I've spent some time with the HarfBuzz test suite and developed a plan for moving forward. I won't try to predict a timeframe, but this issue is fairly high priority for everyone that has expressed interest in using this crate for shaping so it will be receiving a high proportion of my attention.

If you're interested, I could try implementing it if you don't mind giving a rundown on how you planned on doing it.

dfrg commented 2 years ago

This is still important, but text layout has unfortunately taken precedence due to other interested parties that are making use of the crate. If you want to have a go at it, that sounds great! My plan is to essentially use the HarfBuzz test suite as is, so the first step is to write a parser for the test data to extract the fonts, input text and expected output glyph sequences. I don't recall if the test data contains the associated script. If not, you'd need to detect the dominant script in the input text. Then it's just a matter of loading the font, running the shaper and comparing the results.

I don't have a lot of time to devote to this at the moment, but am happy to provide assistance when I am able.

kirawi commented 2 years ago

Gotcha, I'll work on it as soon as I'm able.

kirawi commented 2 years ago

After some poking around with HarfBuzz, RustyBuzz, and Swash, I think I have a good idea on how to proceed from here. Since I know absolutely nothing about shaping, I'll estimate it'll take me 1-2 weeks to get a draft PR submitted (though with anything, prepare to add 1 or 2 weeks to that estimate :-) ). I was keen on learning shaping anyway, so I'm excited to start working on this.

dfrg commented 2 years ago

Brilliant!

Tackling this was about 8-10 weeks out for me, so no rush at all. As for shaping, enjoy your journey down the rabbit hole. I took the tumble almost two years ago now and haven't hit the bottom yet :)

kirawi commented 2 years ago

It's only a draft PR, so it may end up being that original 8-10 week estimate to get it merged :P

dfrg commented 2 years ago

No worries. If you're not done by time it rolls around on my schedule, we'll finish it up together. Then comes the challenge of fixing all the bugs that the tests are sure to reveal. I know HarfBuzz has some per-script special casing for ZWJ/ZWNJ that I haven't accounted for so I'm hoping that pops up in some of the tests. There were also comments about tricky bits with some Arabic ligatures, but my ligature handling code is quite different from what HarfBuzz does so I'm curious to see if it catches the edge cases. We'll see how it fares.

kirawi commented 2 years ago

@dfrg I've begun researching what I need to create the tests, and I've hit an obstacle with figuring out how to get the glyph name from a GlyphId in swash. Is there a method or function somewhere I missed?

Edit: The more I look into it, it seems like that's being deferred to a higher-level API? I might have to figure out how to do it then, thankfully Microsoft's documentation is really good.

dfrg commented 2 years ago

Good catch. I didn't want to provide glyph names as public API because they're not useful in the context of font rendering, but I'll throw together some quick code to parse them from the post table and expose it under some doc(hidden) function to be used for the tests.

dfrg commented 2 years ago

Realized I had written this code in an earlier rev, so just copied it over and pushed to master. FontRef now has a hidden glyph_name function that takes a GlyphId and returns Option<&'a str>. I believe HarfBuzz uses gid<id> for missing names.

kirawi commented 2 years ago

Awesome!

kirawi commented 2 years ago

Progress update: I've begun working out the beginnings of the generate_tests tool. I hope (fingers crossed) that I will be able to get text-rendering-tests complete tomorrow.

@dfrg I don't quite understand the output format used in this test, any idea as to what they could mean?: https://github.com/harfbuzz/harfbuzz/blob/main/test/shaping/data/text-rendering-tests/tests/MORX-11.tests [B|A@650,0|B@1288,0|B@1938,0|A@2588,0|X@3226,0|A@3812,0|B@4450,0]

Edit: Nevermind, I believe I found what it means: https://github.com/RazrFalcon/rustybuzz/blob/8a138b1843dcb967b785510e80dc38c4d6c319cf/src/buffer.rs#L1702

dfrg commented 2 years ago

Just a note that HarfBuzz seems to be accumulating advances into offsets here. I'm not sure if that is done for all of the tests.

I hacked together a small harness for this and an initial run through the shaper yields: [B@0,0+650|A@0,0+638|B@0,0+650|B@0,0+650|A@0,0+638|X@0,0+586|A@0,0+638|B@0,0+650]

And if I accumulate advances: [B|A@650,0|B@1288,0|B@1938,0|A@2588,0|X@3226,0|A@3812,0|B@4450,0]

Which matches the expected output.

The source text here is "BABBAABX" and the resulting glyphs are "BABBAXAB". This is a fairly odd use of the rearrangement mechanism in the AAT extended metamorphosis table that revealed two bugs. Both are "fixed" but might need some more work if there are real fonts that do this sort of thing. Notably, this font probably breaks text selection on macOS. Looks like we're in for a bumpy ride :)

sirdody commented 2 years ago

Hi Kirawi,

I am interested in testing swash, what's about your testing tool? If I can help you anything, please let me know.

Thanks,

kirawi commented 2 years ago

It was already merged #19, but @dfrg is continuing to work on it I believe.