Open dominikh opened 3 months ago
(It's probably also worth documenting that ResolveFaceForLang just maps languages to rune sets and uses those for the lookup; it doesn't consult the metadata of the fonts. The ideal combination of resolving by rune and face should probably also consult the OpenType locl (Localized Forms) feature. Really, though, segmenting by face should operate on grapheme clusters rather than individual runes, and should also handle Unicode normalization, etc.)
Hum... I was not aware that the same Unicode code point may have different glyph presentations depending on the language. Do you have some examples of fonts and languages that show this behavior?
Perhaps this issue would be resolved by rules (like the ones used by fontconfig) such as "for given language and family, use this family instead of that one" (related to #82).
It's true that the segmentation process is limited, because we rely on HarfBuzz for normalization and cluster handling, which is a rather complex topic. I'm not sure how hard it would be to extract the HarfBuzz logic and apply it during segmentation.
> Hum... I was not aware that the same Unicode code point may have different glyph presentations depending on the language. Do you have some examples of fonts and languages that show this behavior?
The most famous example is Han unification (https://en.wikipedia.org/wiki/Han_unification). It also sometimes happens for different languages using Cyrillic. For Han, if you're not using a pan-CJK font like Noto Sans CJK, you will have different fonts for Japanese Kanji and Chinese Han. Even within Chinese, mainland China, Taiwan, Hong Kong, and Singapore each use slightly different glyph forms for the same code points.
`ResolveFace` returns the first face that covers a given rune, while `ResolveFaceForLang` returns the first face that covers a given language. But how do I find the first face that covers a given rune in a given language?

For example, we might have two fonts `cn0-4` and `cn4-8` that cover disjoint sets of runes for Traditional Chinese, and two fonts `jp0-4` and `jp4-8` that cover the same runes as the Chinese fonts, but for Japanese, registered in the order `cn0-4`, `cn4-8`, `jp0-4`, `jp4-8`.

I cannot just look for "rune 5", nor for "japanese" to find `jp4-8`. The first search would find `cn4-8`, and the second search would find `jp0-4`.

This also impacts `shaping.SplitByFace`, which currently discards language information.
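
To make the request concrete, here is a rough sketch of the lookup being asked for. It does not use the library's actual types or API: the `face` struct, the `resolveFaceForRuneAndLang` function, the language tags `zh-Hant` and `ja`, and the half-open rune ranges are all hypothetical stand-ins for the four fonts in the example above. The fallback to the first face that covers the rune, regardless of language, is also just one possible policy.

```go
package main

import "fmt"

// face is a hypothetical stand-in for a font face: a name, the language
// it targets, and the half-open rune range [lo, hi) that it covers.
type face struct {
	name   string
	lang   string
	lo, hi rune
}

// covers reports whether the face has a glyph for r.
func (f face) covers(r rune) bool { return r >= f.lo && r < f.hi }

// resolveFaceForRuneAndLang returns the first face, in registration
// order, that covers r and targets lang. If no such face exists, it
// falls back to the first face that covers r in any language.
func resolveFaceForRuneAndLang(faces []face, r rune, lang string) (face, bool) {
	var fallback *face
	for i := range faces {
		f := &faces[i]
		if !f.covers(r) {
			continue
		}
		if f.lang == lang {
			return *f, true
		}
		if fallback == nil {
			fallback = f
		}
	}
	if fallback != nil {
		return *fallback, true
	}
	return face{}, false
}

func main() {
	// Registration order from the example: cn0-4, cn4-8, jp0-4, jp4-8.
	faces := []face{
		{"cn0-4", "zh-Hant", 0, 4},
		{"cn4-8", "zh-Hant", 4, 8},
		{"jp0-4", "ja", 0, 4},
		{"jp4-8", "ja", 4, 8},
	}
	if f, ok := resolveFaceForRuneAndLang(faces, 5, "ja"); ok {
		fmt.Println(f.name) // jp4-8
	}
}
```

A language-aware variant of the face splitting could call a lookup like this per rune (or, better, per grapheme cluster) instead of a rune-only lookup, so that the language of the input is not discarded.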