RuixiZhang42 opened this issue 3 years ago.
BTW, Simplified Chinese (ZHS) should not use cid1644 (full-width) to render U+00B7. According to GB/T 15834-2011, U+00B7 is the recommended “separator mark” (§ 4.14 and ¶ 4.14.3.5), and it should be half-width (¶ 5.1.7). However, there are some caveats:
In view of these caveats, Source Han Serif SC actually maps a proportional glyph to U+00B7. Perhaps Source Han Sans SC can do the same, i.e.,
For UniSourceHanSansCN-UTF32-H, merge the following three lines:
line 68: <000000b7> 1644
line 11825: <000000ae> <000000b6> 108
line 11826: <000000b8> <000000ff> 118
into one line in the 100 begincidrange block:
<000000ae> <000000ff> 108 %% This makes U+00B7 map to cid117 for Simplified Chinese,
%% so that Source Han Sans SC behaves the same as Source Han Serif SC.
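The arithmetic behind this merge can be checked mechanically. The Python sketch below models each begincidrange entry `<lo> <hi> start` as a tuple, using the CMap rule that a code c inside the range maps to start + (c - lo). The helper `cid_for` and the `BEFORE`/`AFTER` lists are illustrative only and are not part of any font tool. The sketch confirms that only U+00B7 changes, from cid1644 to 108 + (0xB7 - 0xAE) = 117.

```python
# Minimal model of CMap cidrange arithmetic (hypothetical helper, not a
# font-tool API): in an entry "<lo> <hi> start", code c maps to start + (c - lo).


def cid_for(code, ranges):
    """Return the CID for `code` from (lo, hi, start) entries, or None."""
    for lo, hi, start in ranges:
        if lo <= code <= hi:
            return start + (code - lo)
    return None


# The three existing entries. Line 68 is a single-code mapping,
# modeled here as a range with lo == hi.
BEFORE = [(0xB7, 0xB7, 1644), (0xAE, 0xB6, 108), (0xB8, 0xFF, 118)]

# The proposed merged entry.
AFTER = [(0xAE, 0xFF, 108)]

# Code points whose CID differs between the two versions.
changed = {
    f"U+{c:04X}": (cid_for(c, BEFORE), cid_for(c, AFTER))
    for c in range(0xAE, 0x100)
    if cid_for(c, BEFORE) != cid_for(c, AFTER)
}

print(changed)  # {'U+00B7': (1644, 117)}
```

Every other code point in 0xAE–0xFF keeps its CID, because 108 + (c - 0xAE) equals 118 + (c - 0xB8) for all c ≥ 0xB8.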
In Taiwan and Hong Kong, U+00B7 is usually full-width. But I can’t find official standards that require it to be full-width.
AFAIK, Taiwan and Hong Kong users prefer U+2027 as the “separator mark”, but they occasionally use U+30FB as well. Japanese texts use U+30FB as the “separator mark”.
In any case, U+30FB and U+2027 should stay full-width when switching language tags.
Japanese and Korean texts don’t use U+2022 as the “separator mark”, and thus it makes sense to keep this character proportional for JAN and KOR.
I’m not aware of official standards from mainland China, Taiwan, or Hong Kong that require U+2022 to be full-width. But users from these regions may expect this character to be full-width, after decades of exposure to local foundries’ practice.
This could well be a systematic error that affects many more code points than just U+00B7, U+2022, U+2027, and U+30FB.
Well… You know there’s a saying “mathematicians love to generalize things”? So… Here is a sufficient condition for this bug to appear with other code points:
Prerequisites
Description
Source Han Serif mirror issue: https://github.com/adobe-fonts/source-han-serif/issues/93
Using the language-specific OTFs (with full 65535-glyph support, not the subset OTFs), the character ・ (U+30FB, Katakana Middle Dot) is sometimes rendered as proportional-width, but it should always stay full-width. Here are the steps to reproduce this bug:

1. Enter ・ into whatever layout program.
2. Language-tag the character as ZHS, ZHT, or ZHH (under any script: latn, grek, cyrl, kana, hang, or hani), or just use the font’s default script and language. The character ・ is rendered as full-width.
3. Change the language tag to JAN or KOR. The character ・ becomes proportional-width.

Similarly, the character ‧ (U+2027, Hyphenation Point) has exactly the same problem.

Bug analysis
- By default, to render either U+30FB or U+2027, all five OTFs (SC, TC, HC, J, and K) use cid1644 (full-width).
- To render · (U+00B7, Middle Dot), SC, TC, and HC still use cid1644 (full-width). However, J and K use cid117 (proportional-width).
- To render • (U+2022, Bullet), SC, TC, and HC still use cid1644 (full-width). However, J and K use cid733 (also proportional-width, but a different glyph).
- The lookup tables cn2jp, cn2kr, tw2jp, tw2kr, hk2jp, and hk2kr all contain the following line:

So this is the source of the problem:
- For • (U+2022), we want it to be full-width in ZHS, ZHT, and ZHH, but proportional-width in JAN and KOR.
- For ・ (U+30FB) or ‧ (U+2027), the substitution to proportional-width still happens in SC, TC, and HC, although either glyph should stay full-width.
- If we render · (U+00B7) using the SC, TC, or HC font but language-tag the character with JAN or KOR, the result is cid733. Using the J or K font, however, the result is cid117. Both should be cid117.
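To make the conflict concrete, here is a small Python model of the mappings listed in the bug analysis. It is a sketch, not the fonts’ actual feature code. The default CIDs come from the analysis above, but the substitution cid1644 to cid733 (`LOCL_CN_TO_JK`) is an assumption inferred from the observed results, because the exact line shared by the cn2jp … hk2kr lookups is not reproduced here. Only SC/TC/HC tagged JAN or KOR are modeled, and J/K are represented by their default cmap. All names (`DEFAULT_CID`, `render`, `mismatches`, `conflicting_targets`) are hypothetical.

```python
# Hypothetical model of the behaviour described in this issue. The CIDs come
# from the bug analysis; LOCL_CN_TO_JK is an ASSUMED stand-in for the line
# shared by cn2jp, cn2kr, tw2jp, tw2kr, hk2jp and hk2kr.

CN_FONTS = ("SC", "TC", "HC")
CODE_POINTS = (0x30FB, 0x2027, 0x00B7, 0x2022)

# Default cmap (code point -> CID) for each language-specific OTF.
CN_DEFAULT = {0x30FB: 1644, 0x2027: 1644, 0x00B7: 1644, 0x2022: 1644}
JK_DEFAULT = {0x30FB: 1644, 0x2027: 1644, 0x00B7: 117, 0x2022: 733}
DEFAULT_CID = {"SC": CN_DEFAULT, "TC": CN_DEFAULT, "HC": CN_DEFAULT,
               "J": JK_DEFAULT, "K": JK_DEFAULT}

# Assumed glyph-level substitution applied when a CN font is tagged JAN/KOR.
LOCL_CN_TO_JK = {1644: 733}


def render(font, code_point, lang):
    """Return the CID `font` produces for `code_point` under `lang`.

    Only SC/TC/HC tagged JAN or KOR are modeled; every other case
    returns the font's default CID.
    """
    cid = DEFAULT_CID[font][code_point]
    if font in CN_FONTS and lang in ("JAN", "KOR"):
        # The lookup only sees the glyph (CID), never the original code point.
        cid = LOCL_CN_TO_JK.get(cid, cid)
    return cid


def mismatches():
    """List (font, lang, code point, got, expected) where a CN font tagged
    JAN/KOR disagrees with the J/K font's default glyph."""
    bad = []
    for font in CN_FONTS:
        for lang, target in (("JAN", "J"), ("KOR", "K")):
            for cp in CODE_POINTS:
                got = render(font, cp, lang)
                expected = DEFAULT_CID[target][cp]
                if got != expected:
                    bad.append((font, lang, f"U+{cp:04X}", got, expected))
    return bad


def conflicting_targets(cn_font="SC", jk_font="J"):
    """Map each CN-font CID to the set of CIDs the J/K font uses for the
    same code points, keeping only CIDs that need more than one target."""
    targets = {}
    for cp in CODE_POINTS:
        targets.setdefault(DEFAULT_CID[cn_font][cp], set()).add(
            DEFAULT_CID[jk_font][cp])
    return {cid: t for cid, t in targets.items() if len(t) > 1}


if __name__ == "__main__":
    for row in mismatches():
        print(row)
    print(conflicting_targets())
```

Under these assumptions, U+2022 comes out right, while U+30FB, U+2027, and U+00B7 do not. A glyph substitution sees only cid1644, not the code point behind it. No single rule for cid1644 can yield cid1644, cid117, and cid733 at once, and that is what `conflicting_targets()` reports.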