U+30FB and U+2027 are not full-width

Prerequisites

[x] If you are reporting an issue that affects glyphs for characters for a particular region or regions, did you verify that the characters are within the supported scope of the region or regions? This generally means GB 18030 or Tōngyòng Guīfàn Hànzìbiǎo (通用规范汉字表) for China, Big Five or CNS 11643 Planes 1 & 2 for Taiwan, HKSCS-2016 for Hong Kong, JIS X 0208, JIS X 0212, and JIS X 0213 for Japan, and KS X 1001 and KS X 1002 for Korea.
[x] Did you thoroughly search the open and closed issues to avoid reporting a duplicate issue?
[x] Did you go through the official font readme file to better understand the scope of the project, to include the Known Issues section at the very end?

Description

Source Han Serif mirror issue: https://github.com/adobe-fonts/source-han-serif/issues/93

Using language-specific OTFs (with full 65535 glyphs support, not the subset OTFs), the character ・ (U+30FB, Katakana Middle Dot) is sometimes rendered as proportional-width, but it should always stay full-width. Here are the steps to reproduce this bug:

1. Type ・ into any layout program.
2. Use any one of the three Chinese-oriented OTFs (SC, TC, or HC) to render this character.
3. Either language-tag the character with ZHS, ZHT, or ZHH (under any of the scripts latn, grek, cyrl, kana, hang, or hani), or just use the font’s default script and language. The character ・ is rendered full-width.
4. Switch the language tag to JAN or KOR. The character ・ becomes proportional-width.
5. With the J or K versions of the OTFs, however, U+30FB stays full-width.

Similarly, the character ‧ (U+2027, Hyphenation Point) has the exact same problems.

Bug analysis

By default, to render either U+30FB or U+2027, the 5 OTFs (SC, TC, HC, J, and K) all use cid1644 (full-width).

To render · (U+00B7, Middle Dot), SC, TC, and HC still use cid1644 (full-width). However, J and K use cid117 (proportional-width).

To render • (U+2022, Bullet), SC, TC, and HC still use cid1644 (full-width). However, J and K use cid733 (proportional-width, but a different one).

The lookup tables cn2jp, cn2kr, tw2jp, tw2kr, hk2jp, and hk2kr all contain the following line:

  substitute \1644 by \733;

So this is the source of the problem:

This substitution is needed, because when a user types • (U+2022), we want it to be full-width in ZHS, ZHT, and ZHH, but we want it to be proportional-width in JAN and KOR.
But this simple substitution carries two problems:
1. When a user types either ・ (U+30FB) or ‧ (U+2027) in the SC, TC, or HC font and language-tags it with JAN or KOR, the substitution to proportional-width still happens, because both characters share cid1644 with U+2022. Both should stay full-width.
2. When a user types · (U+00B7) in the SC, TC, or HC font and language-tags the character with JAN or KOR, the result is cid733, whereas the J or K font gives cid117. Both cases should give cid117.
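
To make the two problems easier to see, here is a minimal Python sketch that models the glyph selection described above. The CIDs and lookup names come from this analysis. The dictionaries, the `render` function, and the restriction to the SC and J fonts are simplifications I made up for illustration (TC and HC behave like SC here, K like J), and only the lookups named in this issue are modeled.

```python
# Hypothetical model of "cmap lookup, then locl substitution".
# CIDs are the ones quoted in this issue; everything else is illustrative.

# Default cmap entries for the four code points discussed here.
CMAP = {
    "SC": {0x00B7: 1644, 0x2022: 1644, 0x2027: 1644, 0x30FB: 1644},
    "J":  {0x00B7: 117,  0x2022: 733,  0x2027: 1644, 0x30FB: 1644},
}

# Language-specific substitutions, keyed by (font, language tag).
# Only the rule "substitute \1644 by \733;" is modeled.
LOCL = {
    ("SC", "JAN"): {1644: 733},  # cn2jp
    ("SC", "KOR"): {1644: 733},  # cn2kr
}


def render(font, lang, codepoint):
    """Return the CID that ends up being drawn for a code point."""
    cid = CMAP[font][codepoint]
    # The substitution sees only the glyph, not the code point it came from.
    return LOCL.get((font, lang), {}).get(cid, cid)


if __name__ == "__main__":
    for cp in (0x00B7, 0x2022, 0x2027, 0x30FB):
        via_sc = render("SC", "JAN", cp)
        via_j = render("J", "JAN", cp)
        status = "ok" if via_sc == via_j else "MISMATCH"
        print(f"U+{cp:04X}: SC tagged JAN -> cid{via_sc}, J -> cid{via_j} ({status})")
```

In this model only U+2022 comes out the same both ways. U+00B7, U+2027, and U+30FB all mismatch, because the substitution cannot tell which code point produced cid1644.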

About U+00B7

BTW, Simplified Chinese (ZHS) should not use cid1644 (full-width) to render U+00B7. According to GB/T 15834-2011, U+00B7 is recommended to be used as the “separator mark” (§ 4.14 and ¶ 4.14.3.5) and it should be half-width (¶ 5.1.7). However, there are some caveats:

All major foundries in mainland China (Founder Type, Hanyi, etc.) do not follow GB/T 15834-2011. They make U+00B7 full-width, likely for backward compatibility with Founder’s own layout software.
GB/T 15834-2011 contradicts itself by using full-width U+00B7 everywhere in § 4.14.
Also, the so-called “半角” (“half-width”) can sometimes be interpreted as “proportional-width”. For example, when GB/T says “半角数字” (“half-width figures”) it doesn’t always mean “the figures must be exactly half-width”. Sometimes it just means “single-byte figures that are encoded in the ASCII range, not double-byte figures that are encoded in the Halfwidth and Fullwidth Forms Unicode block”.

In view of these caveats, Source Han Serif SC actually maps a proportional glyph to U+00B7. Perhaps Source Han Sans SC can do the same, i.e.,

For UniSourceHanSansCN-UTF32-H, merge the following three lines:

line    68: <000000b7> 1644
line 11825: <000000ae> <000000b6> 108
line 11826: <000000b8> <000000ff> 118

into one line in the 100 begincidrange block:

<000000ae> <000000ff> 108 %% This makes cid117 map to U+00B7 for Simplified Chinese,
                          %% so that Source Han Sans SC behaves the same as Source Han Serif SC.
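
As a sanity check on the arithmetic, here is a small Python sketch that compares the three original entries against the merged range. It assumes the usual CID range rule (the CID for a code is the range’s starting CID plus the offset from the range’s first code) and treats the single-code entry for U+00B7 as a one-element range. The tuple layout and the `cid_for` helper are hypothetical, not part of any CMap tooling.

```python
# Each entry is (first code, last code, starting CID).
BEFORE = [
    (0xB7, 0xB7, 1644),  # line    68: <000000b7> 1644
    (0xAE, 0xB6, 108),   # line 11825: <000000ae> <000000b6> 108
    (0xB8, 0xFF, 118),   # line 11826: <000000b8> <000000ff> 118
]
AFTER = [
    (0xAE, 0xFF, 108),   # proposed merged line
]


def cid_for(code, ranges):
    """Look up the CID for a code point, or None if no range covers it."""
    for lo, hi, start_cid in ranges:
        if lo <= code <= hi:
            return start_cid + (code - lo)
    return None


# Collect every code point in U+00AE..U+00FF whose CID differs after the merge.
changed = {
    code: (cid_for(code, BEFORE), cid_for(code, AFTER))
    for code in range(0xAE, 0x100)
    if cid_for(code, BEFORE) != cid_for(code, AFTER)
}

if __name__ == "__main__":
    for code, (old, new) in changed.items():
        print(f"U+{code:04X}: cid{old} -> cid{new}")
```

Under these assumptions the only code point whose mapping changes is U+00B7, which moves from cid1644 to cid117.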

In Taiwan and Hong Kong, U+00B7 is usually full-width. But I can’t find official standards that require it to be full-width.

About U+30FB and U+2027

AFAIK, Taiwan and Hong Kong users prefer U+2027 as the “separator mark”, but they occasionally will use U+30FB too. Japanese texts use U+30FB as the “separator mark”.

In any case, U+30FB and U+2027 should stay full-width when switching language tags.

About U+2022

Japanese and Korean texts don’t use U+2022 as the “separator mark”, and thus it makes sense to keep this character proportional for JAN and KOR.

I’m not aware of official standards from mainland China, Taiwan, or Hong Kong that require U+2022 to be full-width. But users from these regions may expect this character to be full-width, because of decades of exposure to local foundries’ practice.

This could well be a systematic error, which could potentially affect many more code points beyond just U+00B7, U+2022, U+2027, and U+30FB.

Well… You know there’s a saying “mathematicians love to generalize things”? So… Here is a sufficient condition for this bug to appear with other code points:
