RazrFalcon / rustybuzz

A complete harfbuzz's shaping algorithm port to Rust
MIT License
498 stars 34 forks source link

Sync to 5.3.1 #109

Closed LaurenzV closed 2 weeks ago

LaurenzV commented 2 weeks ago

The orange ones are the ones I'm unsure about.

Status Commit message HB
🟢 [tt-font] Apply VVAR.vOrg variation to VORG origin Link Implemented in ttf-parser. I manually verified that https://github.com/harfbuzz/harfbuzz/issues/1694 is now resolved for us as well.
🟡 [glyf] Change side-bearing rounding Link not relevant I think? since we don't support phantom points yet
🟡 Fix LookupFlag negation Link I believe this shoulddnt affect us?
🟡 [use-table] Port to using packtab Link Unfortunately packtab seems to generate c++ code (and it's also been updated in the meanwhile)... so for now I've decided to switch to a manual implementation, hopegully when the port is done we can port the packTab code to work with rust.
🟢 [GPOS] Adjust mark attachment on multiple substitution some more Link
🟡 [reorg] Move Coverage Link ttf-parser stuff I guess?
🟢 [arabic] Pause after calt only if no rclt Link
🟡 [aat/morx] Implement language-specific forms Link Where exactly can I find this ltag in our port?
🟡 Add hb_language_matches() Link
🟢 [arabic] Make more features F_MANUAL_ZWJ Link
🟢 [GPOS/Cursive] Fix breaking of parent-child attachment Link
🟢 [hebrew] Implement Jerusalem mark reordering Link
🟢 [hebrew] Comment Link
🟢 Add tests for multiple cursive positioning Link
🟢 [hebrew] Break out of reordering loop when pattern found Link
🟢 [GPOS/CursivePos] Fix unsafe-to-break marking Link
🟡 [layout] Reduce number of closure rounds Link we dont have this right?
🟢 [buffer] Add sync_so_far() Link
🟢 [buffer] Return delta from sync_so_far Link
🟢 [use-table] Regenerate with latest packtab Link
🟡 [emoji] Regenerate test data Link we dont have this right?
🟡 [emoji] Bump compression level Link
🟢 Prototype glyph flag safe-to-kashida Link
🟢 Document kashida stuff Link
🟢 Rename kashida -> tatweel Link
🟢 [shape] Only add any unsafe-to-concat flag if requested Link
🟢 [Unicode 15] Update the Arabic table Link
🟢 [Unicode 15] Update the Indic table Link
🟢 [Unicode 15] Update the vowel constraint table Link
🟢 [Unicode 15] Update the emoji table & cluster test Link
🟢 [Unicode 15] Update the USE table Link
🟢 [Unicode 15] Send the new scripts to USE Link
🟢 [Unicode 15] Update the Arabic joining script list Link
🟢 [USE] Allow ZWNJ at the end of a cluster Link
🟢 [aat] Map 'rlig' to AAT required-ligatures setting. Link
🟢 Update the language system tag registry URL Link
🟢 [USE] Update the data files Link
🟢 [shape] Adjust Grapheme clusters for Katakana voiced sound marks Link
RazrFalcon commented 2 weeks ago

[glyf] Change side-bearing rounding | not relevant I think? since we don't support phantom points yet

Yep.

Fix LookupFlag negation I believe this shoulddnt affect us?

We're already doing the right thing:

iter.set_lookup_props(ctx.lookup_props & !u32::from(lookup_flags::IGNORE_FLAGS));

All hail to Rust!

[use-table] Port to using packtab |Unfortunately packtab seems to generate c++ code so for now I've decided to switch to a manual implementation

What do you mean by that? Have you ported C/C++ code by hand or wrote a custom implementation to begin with?

[reorg] Move Coverage | ttf-parser stuff I guess?

Doesn't seems to affects us since this is part of the boring expansion.

[aat/morx] Implement language-specific forms | Where exactly can I find this ltag in our port?

You mean this?

Reduce number of closure rounds | we dont have this right?

Yes, doesn't seems like we do.

Regenerate test data | we dont have this right?

Yep, not testing this part for now.

LaurenzV commented 2 weeks ago

What do you mean by that? Have you ported C/C++ code by hand or wrote a custom implementation to begin with?

Oh, no as mentioned in the last PR it just means that I ported the C/C++ code instead of relying on the gen-use-table.py script. But it works fine for now.

You mean this?

I don't think it's the same, I'm talking about the map->face->table.ltag->get_language, where do we have those language tags?

LaurenzV commented 2 weeks ago

Oh okay, I realized those two unicode crates are by you, do you think it would be possible for you to update them to 15.0?

RazrFalcon commented 2 weeks ago

What crates are we talking about? I'm missing the context here.

LaurenzV commented 2 weeks ago

See this comment: https://github.com/RazrFalcon/rustybuzz/pull/109#discussion_r1641996933

So unicode_bidi_mirroring and unicode_ccc.

RazrFalcon commented 2 weeks ago

Missed this one. Will do.

RazrFalcon commented 2 weeks ago

Done.

LaurenzV commented 2 weeks ago

Okay. All that's missing now is still this commit: https://github.com/harfbuzz/harfbuzz/commit/2124ad890

I'm not sure how to translate map->face->table.ltag->get_language (setting - 1) == map->props.language).

RazrFalcon commented 2 weeks ago

Looks like we have to implement https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6ltag.html Bear in mind that we do not test macOS fonts for now...

LaurenzV commented 2 weeks ago

Oh no... So we add a TODO for now? Or should I look into it?

RazrFalcon commented 2 weeks ago

I think we can skip it, since it's pretty niche. But open an issue in ttf-parser and here so we would not forget to add it later.

behdad commented 2 weeks ago

@LaurenzV Can you try the Rust output from https://github.com/harfbuzz/packtab

In the HB scripts, instead of print_c, call print_code and pass language='rust' to it.

My Rust is rusty! I'm sure I've made mistakes.

Thanks.

LaurenzV commented 2 weeks ago

Will try!

LaurenzV commented 2 weeks ago

Just a couple of small issues from what I can tell:

What is generated:

const hb_use_u8: [stdint::uint8_t; 3187]
[...

What should be generated:

const hb_use_u8: [u8; 3187] =
[

Same for u16, which should be u16 instead of stdint::uint16_t.

For hb_use_b4 this is generated:

fn hb_use_b4 (a: stdint::&[uint8_t], i: stdint::unsigned) -> stdint::unsigned
{
  return (a[i>>1]>>((i&1u)<<2))&15u;
}

while it should be (needs a couple of casts unfortunately...), and in Rust you generally don't have an explicit return statement, but it doesn't really matter:

fn hb_use_b4(a: &[u8], i: u32) -> u32 {
    (a[i as usize>>1]>>((i&1u32)<<2)) as u32 & 15u32
}

and finally, for hb_use_get_category: What is generated:

fn hb_use_get_category (u: stdint::unsigned) -> stdint::uint_fast8_t
{
  return u<921600u?hb_use_u8[2809+(((hb_use_u8[593+(((hb_use_u16[((hb_use_u8[113+(((hb_use_b4(hb_use_u8,u>>1>>3>>3>>5))<<5)+((u>>1>>3>>3)&31u))])<<3)+((u>>1>>3)&7u)])<<3)+((u>>1)&7u))])<<1)+((u)&1u))]:O;
}

What it should look like:

pub(crate) fn hb_use_get_category(u: u32) -> u8 {
    if u<921600 {
        hb_use_u8[2777+(((hb_use_u8[593+(((hb_use_u16[((hb_use_u8[113+(((hb_use_b4(&hb_use_u8, u>>1>>3>>3>>5) as usize)<<5)+((u as usize>>1>>3>>3)&31usize))] as usize)<<3)+((u as usize>>1>>3)&7usize)])<<3) as usize +((u as usize>>1)&7usize))] as usize)<<1)+((u as usize)&1usize))]
    } else {
        O
    }
}

In particular:

Guess there is still a lot of differences between Rust and C++ after all. 😅

behdad commented 2 weeks ago

Just a couple of small issues from what I can tell:

Thanks. I'll get back to it in a few days.

Can I not use the stdint crate there? I like it for it's int8_fast_t etc.

LaurenzV commented 2 weeks ago

No rush, I can't really use the script anyway right now because we haven't reached the commit yet where it's updated to 15.1 (the newest), so the data wouldn't match if I generate it with the newest version now. Thanks a lot for working on it!

Can I not use the stdint crate there? I like it for it's int8_fast_t etc.

You'd have to ask @RazrFalcon, but I think he prefers to not add more dependencies if not strictly necessary.

@RazrFalcon I've added the TODO, so I think it should be ready to merge!

RazrFalcon commented 2 weeks ago

Can I not use the stdint crate there? I like it for it's int8_fast_t etc.

I don't see a reason to. Performance gains will be neglectable, if any.

behdad commented 2 weeks ago

Can I not use the stdint crate there? I like it for it's int8_fast_t etc.

I don't see a reason to. Performance gains will be neglectable, if any.

Fair enough. Thanks.