Closed LaurenzV closed 4 months ago
So as far as I can tell arabic tables and ucd tables are all covered by gen-universal-table.py? So for https://github.com/harfbuzz/harfbuzz/commit/18a46eeaf and https://github.com/harfbuzz/harfbuzz/commit/26e1498e7 it should be fine to just update that script to 14.0?
EDIT: Never mind, seems like the corresponding Python file in harfbuzz is gen-use-table... So I'm not sure how to deal with this commit?
And it seems like we have gen-unicode-is-emoji-ext-pict.py for emojis but no corresponding .rs file? How do I deal with that?
Wow, you have put a lot of effort into it. Much appreciated!
[style] Fall back to GPOS 'size' feature for optical size in style API
Yes, we don't have such API.
[util/shape-consumer] Simplify font tracking
Yes, unrelated.
[Unicode 14] Add tests
Yes, we don't have those.
[meson] Require ragel 6.10
Yes, we use master branch already.
So as far as I can tell arabic tables and ucd tables are all covered by gen-universal-table.py?
AFAIR, hb-ot-shape-complex-arabic-table.hh was ported by hand. At least there is no script for it, and the file doesn't start with the "this file was generated" comment.
So you have to update it manually (which is very easy). Or port harfbuzz's script if you feel like it (harder).
As for hb-ucd-table.hh, we don't have it at all. We either use 3rd-party dependencies or our own scripts. Therefore updating to 14.0.0 in the Python scripts + updating the unicode-* crates should be enough.
And it seems like we have gen-unicode-is-emoji-ext-pict.py for emojis but no corresponding .rs file? How do I deal with that?
src/unicode.rs:is_emoji_extended_pictographic
I will update docs.
Gotcha!
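For context, a generated predicate like this usually boils down to a lookup in a sorted range table. Below is a minimal sketch of that idea, not the actual contents of src/unicode.rs: the function name `is_ext_pict_sketch`, the table name, and the tiny four-entry table are made up for illustration, and a real table would be generated from Unicode's emoji-data.txt.

```rust
use std::cmp::Ordering;

/// Illustrative excerpt only (hypothetical table, NOT the generated one):
/// a few inclusive code point ranges, sorted by start.
const EXT_PICT_RANGES: &[(u32, u32)] = &[
    (0x00A9, 0x00A9),   // copyright sign
    (0x00AE, 0x00AE),   // registered sign
    (0x2194, 0x2199),   // arrows
    (0x1F600, 0x1F64F), // emoticons
];

/// Returns true if `c` falls inside one of the ranges above.
pub fn is_ext_pict_sketch(c: char) -> bool {
    let cp = c as u32;
    EXT_PICT_RANGES
        .binary_search_by(|&(lo, hi)| {
            if hi < cp {
                Ordering::Less // range lies entirely below the code point
            } else if lo > cp {
                Ordering::Greater // range lies entirely above the code point
            } else {
                Ordering::Equal // code point is inside this range
            }
        })
        .is_ok()
}

fn main() {
    assert!(is_ext_pict_sketch('\u{1F600}'));
    assert!(!is_ext_pict_sketch('a'));
}
```

With this shape, a Unicode version bump only means regenerating the table; the lookup code stays the same.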
We also have src/unicode.rs:check_unicode_version tests, in case you haven't seen those. They all should be 14.0 now, I guess.
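A rough sketch of what such a version check can look like, assuming each dependency exposes its Unicode version as a (major, minor, patch) tuple. The dependency names, the `Version` alias, and the `mismatched` helper are hypothetical and only illustrate the idea; they are not the actual tests.

```rust
/// (major, minor, patch); hypothetical alias for this sketch.
type Version = (u8, u8, u8);

/// Returns the names of all dependencies whose reported Unicode version
/// differs from `expected`.
fn mismatched<'a>(expected: Version, found: &[(&'a str, Version)]) -> Vec<&'a str> {
    let mut bad = Vec::new();
    for &(name, version) in found {
        if version != expected {
            bad.push(name);
        }
    }
    bad
}

fn main() {
    // Hypothetical dependency names and versions, for illustration only.
    let deps = [("dep-a", (14, 0, 0)), ("dep-b", (13, 0, 0))];
    let stale = mismatched((14, 0, 0), &deps);
    assert_eq!(stale, vec!["dep-b"]);
}
```

A check like this fails as soon as one dependency lags behind, which is how the need to bump the other crates shows up.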
Ah, I guess we need to update the dependencies too, then.
Should I submit PRs to those (unicode-cc and unicode-bidi)? Or is it faster if you just do it when you have time?
#[derive(Clone, Copy, PartialEq, Debug)]
pub enum JoiningType {
    U = 0,
    L = 1,
    R = 2,
    D = 3,
    // We don't have C, like harfbuzz, because Rust doesn't allow duplicated enum variants.
    GroupAlaph = 4,
    GroupDalathRish = 5,
    T = 7,
    X = 8, // means: use general-category to choose between U or T.
}
Why would adding C here make it duplicated?
I will update my unicode crates soon.
Why would adding C here make it duplicated?
Because Rust doesn't allow enums to have duplicate values. The C++ code is:
enum hb_arabic_joining_type_t {
    JOINING_TYPE_D = 3,
    JOINING_TYPE_C = JOINING_TYPE_D,
};
Right, so I always need to use D instead of C.
Yep.
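For readers who want to keep the C name while porting, one possible workaround is an associated constant that aliases D. This is only a sketch of a Rust idiom; the `C` constant below is hypothetical and not something the thread says rustybuzz does.

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
#[allow(dead_code)]
pub enum JoiningType {
    U = 0,
    L = 1,
    R = 2,
    D = 3,
    GroupAlaph = 4,
    GroupDalathRish = 5,
    T = 7,
    X = 8,
}

impl JoiningType {
    /// Hypothetical alias: mirrors `JOINING_TYPE_C = JOINING_TYPE_D` from the C++ enum.
    /// It is a constant, not a variant, so no discriminant value is duplicated.
    #[allow(dead_code)]
    pub const C: JoiningType = JoiningType::D;
}

fn main() {
    assert_eq!(JoiningType::C, JoiningType::D);
    assert_eq!(JoiningType::C as u8, 3);
}
```

Using D directly, as agreed above, is the simpler option; the alias only helps when comparing the port against the C++ side by side.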
Updated my unicode-* crates. Use version 0.2 for both.
Okay, just the emoji commit is missing now: https://github.com/harfbuzz/harfbuzz/commit/35ac0de42
I'm unsure what to do with it though. They basically add this file: https://www.unicode.org/Public/emoji/latest/emoji-test.txt
But it seems that this file doesn't contain any information on extended pictographics. But it seems we only use the file to generate that one method. Does this mean we can ignore this file?
Yes, I think we can ignore that emoji change. It doesn't seem to affect us. harfbuzz is not just a shaper, but rather a font processing toolkit. It has far more features that we do not implement in rustybuzz.
The rule of thumb: if tests are passing - we're fine.
I guess porting itself is not as hard as I thought, just time-consuming. But I will probably change my mind once I encounter my first not-passing test and spend hours with the debugger. 😄
Pretty much.
Okay, just the emoji commit is missing now: harfbuzz/harfbuzz@35ac0de42
I'm unsure what to do with it though. They basically add this file: https://www.unicode.org/Public/emoji/latest/emoji-test.txt
But it seems that this file doesn't contain any information on extended pictographics. But it seems we only use the file to generate that one method. Does this mean we can ignore this file?
This file is used to generate emoji test data (IIRC, to test that all emoji sequences are shaped as one cluster)
My first attempt. Let's see how it goes.
Legend:
🟢: Port of commit succeeded
⚪️: Commit is not relevant
🟡: I have uncertainties about that commit
🔴: Port of commit is failing
⚫️: Commit has not been ported yet