RazrFalcon / rustybuzz

A complete port of harfbuzz's shaping algorithm to Rust
MIT License

Sync with 3.0.0. #92

Closed. LaurenzV closed this 4 months ago.

LaurenzV commented 4 months ago

My first attempt. Let's see how it goes.

Legend: 🟢: Port of commit succeeded ⚪️: Commit is not relevant 🟡: I have uncertainties about that commit 🔴: Port of commit is failing ⚫️: Commit has not been ported yet.

| Status | Commit message | HB link |
|---|---|---|
| ⚪️ | Typo | Link |
| ⚪️ | [ci] Use different executer for win64 jobs | Link |
| ⚪️ | [docs] Fix subset documentation | Link |
| ⚪️ | [subset] Improve alignment of --help-all message | Link |
| ⚪️ | [subset] fox for (Chain)ContextFormat3: subset lookupRecord | Link |
| ⚪️ | [style] Fall back to GPOS 'size' feature for optical size in style API | Link |
| ⚪️ | [style] Rename HB_STYLE_TAG_SLANT to HB_STYLE_TAG_SLANT_ANGLE | Link |
| ⚪️ | [style] Add HB_STYLE_TAG_SLANT_RATIO | Link |
| ⚪️ | Fix build with -Dexperimental_api=true | Link |
| ⚪️ | [style] Prepare for release | Link |
| ⚪️ | [style] Fix documentation | Link |
| ⚪️ | [style] Try to fix MSVC build | Link |
| ⚪️ | [style] Try harder to fix MSVC build | Link |
| ⚪️ | [style] Change private slant tag | Link |
| ⚪️ | [style] Fix float conversion warning | Link |
| ⚪️ | Call atexit() via hb_atexit() | Link |
| 🟢 | [ot-shape] Enabled two more features: 'Harf' and 'Buzz' | Link |
| ⚪️ | [style] Fix another float conversion warning | Link |
| ⚪️ | Add fallback atexit implementation using template descrutors | Link |
| ⚪️ | [util/shape-consumer] Simplify font tracking | Link |
| ⚪️ | [util] Fix leak of an hb_font_t | Link |
| 🟢 | [Unicode 14] Add new `hb_script_t` values | Link |
| ⚪️ | [Unicode 14] Update the UCD table | Link |
| 🟢 | [Unicode 14] Update the script direction list | Link |
| 🟢 | [Unicode 14] Update the Arabic table | Link |
| 🟢 | [Unicode 14] Update the Indic table | Link |
| 🟢 | [Unicode 14] Update the vowel constraint table | Link |
| ⚪️ | [Unicode 14] Update the emoji table | Link |
| 🟢 | [Unicode 14] Update the USE table | Link |
| 🟢 | [Unicode 14] Send all the new scripts to USE | Link |
| ⚪️ | [Unicode 14] Add tests | Link |
| 🟢 | [Unicode 14] Update the Arabic joining script list | Link |
| ⚪️ | Merge pull request #3206 from harfbuzz/unicode-14 | Link |
| ⚪️ | [meson] Require ragel 6.10 | Link |
| ⚪️ | [ragel] Regenerate state machine files with 6.10 | Link |
| ⚪️ | [meson] Add ragel subproject | Link |
| 🟢 | [USE] Send Khitan Small Script and Yezidi to USE | Link |
| ⚪️ | [style] Change tag type to hb_style_tag_t | Link |
| ⚪️ | CMake: Detect pthread with module FindThread | Link |
| ⚪️ | [subset] In preperation for 3.0.0 release remove legacy subsetting api methods. | Link |
| ⚪️ | [subset] restore hb_subset_input_unicode/glyph_set () to documentation. | Link |
| ⚪️ | Merge pull request #3210 from NikitaFeodonit/main | Link |
| ⚪️ | [meson] Add ragel_subproject option | Link |
| ⚪️ | [ci] Build ragel on the Linux CI job | Link |
| ⚪️ | [ci] Ragel subproject requires meson 0.55.0 | Link |
| ⚪️ | [ci] meson 0.55.0 didn’t work for harfbuzz | Link |
| ⚪️ | Centralize math include | Link |
| ⚪️ | 3.0.0 | Link |
LaurenzV commented 4 months ago

So as far as I can tell arabic tables and ucd tables are all covered by gen-universal-table.py? So for https://github.com/harfbuzz/harfbuzz/commit/18a46eeaf and https://github.com/harfbuzz/harfbuzz/commit/26e1498e7 it should be fine to just update that script to 14.0?

EDIT: Never mind, it seems like the corresponding Python file in harfbuzz is gen-use-table... So I'm not sure how to deal with these commits.

LaurenzV commented 4 months ago

And it seems like we have gen-unicode-is-emoji-ext-pict.py for emojis but no corresponding .rs file? How do I deal with that?

RazrFalcon commented 4 months ago

Wow, you have put a lot of effort into it. Much appreciated!

[style] Fall back to GPOS 'size' feature for optical size in style API

Yes, we don't have such an API.

[util/shape-consumer] Simplify font tracking

Yes, unrelated.

[Unicode 14] Add tests

Yes, we don't have those.

[meson] Require ragel 6.10

Yes, we already use the master branch.

So as far as I can tell arabic tables and ucd tables are all covered by gen-universal-table.py?

AFAIR, hb-ot-shape-complex-arabic-table.hh was ported by hand. At least, there is no script for it, and the file doesn't start with the "this file was generated" comment. So you have to update it manually (which is very easy), or port harfbuzz's script if you feel like it (harder).

As for hb-ucd-table.hh, we don't have it at all. We use either third-party dependencies or our own scripts. Therefore, updating the Python scripts to 14.0.0 and updating the unicode-* crates should be enough.

And it seems like we have gen-unicode-is-emoji-ext-pict.py for emojis but no corresponding .rs file? How do I deal with that?

It generates src/unicode.rs:is_emoji_extended_pictographic. I will update the docs.
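One common shape for a generated predicate like this is a lookup in a sorted table of inclusive code point ranges. The sketch below illustrates that shape under stated assumptions: `is_ext_pict_sketch` and `EXT_PICT_SAMPLE` are hypothetical names, the table holds only a few illustrative Extended_Pictographic ranges rather than the real generated data, and it is not a claim about how rustybuzz's generated function is actually written.

```rust
use std::cmp::Ordering;

/// Illustrative subset of Extended_Pictographic ranges (inclusive, sorted,
/// non-overlapping). NOT the real generated table.
const EXT_PICT_SAMPLE: &[(u32, u32)] = &[
    (0x00A9, 0x00A9),   // COPYRIGHT SIGN
    (0x00AE, 0x00AE),   // REGISTERED SIGN
    (0x203C, 0x203C),   // DOUBLE EXCLAMATION MARK
    (0x1F600, 0x1F64F), // Emoticons block
];

/// Hypothetical stand-in for a generated `is_emoji_extended_pictographic`:
/// binary-searches the sorted range table for the code point.
fn is_ext_pict_sketch(c: char) -> bool {
    let cp = c as u32;
    EXT_PICT_SAMPLE
        .binary_search_by(|&(start, end)| {
            if end < cp {
                Ordering::Less // range lies entirely before cp: search right
            } else if start > cp {
                Ordering::Greater // range lies entirely after cp: search left
            } else {
                Ordering::Equal // cp falls inside this range
            }
        })
        .is_ok()
}
```

Keeping the table sorted and non-overlapping is what makes the binary search valid; a generator script would be responsible for merging adjacent ranges.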

LaurenzV commented 4 months ago

Gotcha!

RazrFalcon commented 4 months ago

We also have the src/unicode.rs:check_unicode_version tests, in case you haven't seen them. They should all be 14.0 now, I guess.

LaurenzV commented 4 months ago

Ah, I guess we need to update the dependencies too, then.

LaurenzV commented 4 months ago

Should I submit PRs to those (unicode-cc and unicode-bidi)? Or is it faster if you just do it when you have time?

LaurenzV commented 4 months ago
```rust
#[derive(Clone, Copy, PartialEq, Debug)]
pub enum JoiningType {
    U = 0,
    L = 1,
    R = 2,
    D = 3,
    // We don't have C, like harfbuzz, because Rust doesn't allow duplicated enum variants.
    GroupAlaph = 4,
    GroupDalathRish = 5,
    T = 7,
    X = 8, // means: use general-category to choose between U or T.
}
```

Why would adding C here make it duplicated?

RazrFalcon commented 4 months ago

I will update my unicode crates soon.

Why would adding C here make it duplicated?

Because Rust doesn't allow two enum variants to have the same value. The C++ code is:

```cpp
enum hb_arabic_joining_type_t {
  JOINING_TYPE_D        = 3,
  JOINING_TYPE_C        = JOINING_TYPE_D,
};
```
LaurenzV commented 4 months ago

Right, so I always need to use D instead of C.

RazrFalcon commented 4 months ago

Yep.
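For reference, one possible alternative to dropping C entirely is an associated constant, which Rust allows to share a value with an existing variant. The sketch below reproduces the enum quoted above, adds a hypothetical `JoiningType::C` alias, and resolves `X` the way we understand harfbuzz's `get_joining_type` to do it (non-spacing marks, enclosing marks and format characters become T, everything else U). `GenCat` and `resolve_joining_type` are illustrative stand-ins, not rustybuzz's actual API.

```rust
/// Mirrors the rustybuzz enum quoted in this thread.
#[derive(Clone, Copy, PartialEq, Debug)]
pub enum JoiningType {
    U = 0,
    L = 1,
    R = 2,
    D = 3,
    GroupAlaph = 4,
    GroupDalathRish = 5,
    T = 7,
    X = 8,
}

impl JoiningType {
    /// Hypothetical alias for harfbuzz's JOINING_TYPE_C (= JOINING_TYPE_D).
    /// A second variant with discriminant 3 would be rejected by the
    /// compiler, but an associated const may hold the same value.
    pub const C: JoiningType = JoiningType::D;
}

/// Hypothetical, simplified general category (not a real crate's type).
#[derive(Clone, Copy, PartialEq, Debug)]
pub enum GenCat {
    NonspacingMark,
    EnclosingMark,
    Format,
    Other,
}

/// Resolves `X` using the general category: Mn, Me and Cf become `T`,
/// anything else becomes `U`. Other joining types pass through unchanged.
pub fn resolve_joining_type(jt: JoiningType, gc: GenCat) -> JoiningType {
    if jt != JoiningType::X {
        return jt;
    }
    match gc {
        GenCat::NonspacingMark | GenCat::EnclosingMark | GenCat::Format => JoiningType::T,
        GenCat::Other => JoiningType::U,
    }
}
```

With the alias in place, code ported from harfbuzz could write `JoiningType::C` and still get `D`, which matches the "always use D instead of C" rule agreed on above.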

RazrFalcon commented 4 months ago

Updated my unicode-* crates. Use version 0.2 for both.

LaurenzV commented 4 months ago

Okay, just the emoji commit is missing now: https://github.com/harfbuzz/harfbuzz/commit/35ac0de42

I'm unsure what to do with it though. They basically add this file: https://www.unicode.org/Public/emoji/latest/emoji-test.txt

But this file doesn't seem to contain any information on extended pictographics, and it seems we only use the file to generate that one method. Does this mean we can ignore this file?
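emoji-test.txt is organized as `code points ; status # comment` lines, one per emoji sequence, which fits the observation that it carries no per-code-point Extended_Pictographic property. A minimal parsing sketch, assuming that line format; `EmojiTestLine` and `parse_emoji_test_line` are hypothetical names, not part of rustybuzz:

```rust
/// One data line of emoji-test.txt: the code point sequence and its
/// qualification status (e.g. "fully-qualified").
#[derive(Debug, PartialEq)]
pub struct EmojiTestLine {
    pub sequence: Vec<char>,
    pub status: String,
}

/// Parses a `code points ; status # comment` line.
/// Returns None for comment lines, blank lines and malformed input.
pub fn parse_emoji_test_line(line: &str) -> Option<EmojiTestLine> {
    // Drop the trailing comment: everything from the first '#'.
    // Code points are hex, so a data line's first '#' always starts the comment.
    let data = line.split('#').next()?.trim();
    if data.is_empty() {
        return None;
    }
    let (points, status) = data.split_once(';')?;
    // Each whitespace-separated field is a hex code point.
    let sequence = points
        .split_whitespace()
        .map(|hex| u32::from_str_radix(hex, 16).ok().and_then(char::from_u32))
        .collect::<Option<Vec<char>>>()?;
    if sequence.is_empty() {
        return None;
    }
    Some(EmojiTestLine {
        sequence,
        status: status.trim().to_string(),
    })
}
```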

RazrFalcon commented 4 months ago

Yes, I think we can ignore that emoji change. It doesn't seem to affect us. harfbuzz is not just a shaper, but rather a font processing toolkit. It has far more features than we implement in rustybuzz.

The rule of thumb: if the tests pass, we're fine.

LaurenzV commented 4 months ago

I guess porting itself is not as hard as I thought, just time-consuming. But I will probably change my mind once I encounter my first not-passing test and spend hours with the debugger. 😄

RazrFalcon commented 4 months ago

Pretty much.

khaledhosny commented 4 months ago

Okay, just the emoji commit is missing now: harfbuzz/harfbuzz@35ac0de42

I'm unsure what to do with it though. They basically add this file: https://www.unicode.org/Public/emoji/latest/emoji-test.txt

But this file doesn't seem to contain any information on extended pictographics, and it seems we only use the file to generate that one method. Does this mean we can ignore this file?

This file is used to generate emoji test data (IIRC, to test that all emoji sequences are shaped as one cluster).
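A test built from that data would shape each sequence and check that every output glyph carries the same cluster value. The check itself can be sketched as below, assuming the per-glyph cluster values have already been collected from the shaper (harfbuzz reports them in `hb_glyph_info_t.cluster`); `is_single_cluster` is a hypothetical helper, and font loading and shaping are omitted.

```rust
/// Returns true if every glyph in the shaped output carries the same
/// cluster value, i.e. the whole input sequence became one cluster.
/// `clusters` holds the per-glyph cluster values reported by the shaper.
pub fn is_single_cluster(clusters: &[u32]) -> bool {
    match clusters.split_first() {
        // All remaining glyphs must share the first glyph's cluster.
        Some((first, rest)) => rest.iter().all(|c| c == first),
        // No glyphs at all: nothing was shaped, so treat it as a failure.
        None => false,
    }
}
```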