Consolidation of Mapping Change Suggestions

kenlunde commented 7 years ago

This issue is meant for tracking and submitting suggestions for mapping changes, meaning that a character might be better mapped to a different but existing glyph. Note that mapping changes, especially for ideographs, will trigger changes to GSUB features, such as the language-specific lookups of the 'locl' GSUB feature. Because tools are used to build the language-specific lookups of the 'locl' GSUB feature by using the CMap resources, such suggestions cannot be accepted as pull requests, and should instead be posted here. Issues that were submitted before this consolidation issue was opened are referenced by issue number.
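To illustrate why a mapping change ripples into the 'locl' GSUB feature, here is a minimal sketch, not the actual build tooling. It assumes each CMap resource can be reduced to a dictionary from code point to glyph name, and that a language-specific substitution is needed wherever the language's mapping differs from the default one. The function name `locl_pairs` and the JP default glyph names in the sample data are invented for illustration; the KR targets are taken from the Version 1.001 list in this issue.

```python
from typing import Dict, List, Tuple


def locl_pairs(default_cmap: Dict[int, str],
               language_cmap: Dict[int, str]) -> List[Tuple[str, str]]:
    """Return (default glyph, language glyph) pairs that differ.

    Hypothetical helper: a code point that maps to a different glyph in the
    language-specific CMap needs a substitution from the default glyph.
    """
    pairs = set()
    for code_point, default_glyph in default_cmap.items():
        target = language_cmap.get(code_point)
        if target is not None and target != default_glyph:
            pairs.add((default_glyph, target))
    return sorted(pairs)


# Sample data. The JP glyph names are assumptions made up for this example;
# the KR targets come from the Version 1.001 list (Issues #5 and #20).
jp_default = {0x5173: "uni5173-JP", 0x5BE7: "uni5BE7uE0100-JP"}
kr_mapping = {0x5173: "uni5173-CN", 0x5BE7: "uni5BE7uE0100-JP"}

for source, target in locl_pairs(jp_default, kr_mapping):
    print(f"sub {source} by {target};")
```

Changing a single entry in either dictionary changes the derived pairs, which is why a hand-edited pull request against one CMap resource would leave the generated lookups out of sync. The sketch does not handle a default glyph that is shared by two code points and ends up with two different targets.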

The following changes were made in Version 1.001:

Mapped U+3164 HANGUL FILLER to uni1160.
Mapped U+2D544 to uni2E8D-JP.
Mapped U+2EC1 ⻁, U+2EEA ⻪, and U+2F2C ⼬ to uni864EuE0101-JP, uni9EFE-CN, and uni5C6E-CN, respectively.
Mapped U+2F22 ⼢ and U+2F58 ⽘ to uni590A-CN and uni723B-CN, respectively, in the CN CMap resource.
Mapped U+5173 关 to uni5173-CN in the KR CMap resource per Issue #5.
Mapped U+5553 啓 and U+555F 啟 to uni5553uE0101-JP and uni555F-JP, respectively, in the TW CMap resource per Issue #13.
Mapped U+5BE7 寧 to uni5BE7uE0100-JP in the KR CMap resource per Issue #20.
Mapped U+58F3 壳 to uni58F3-JP in the TW CMap resource per Issue #26.
Mapped U+58FE 壾 and U+591A 多 to uni58FE-JP and uni591A-JP, respectively, in the TW CMap resource per Issue #27.
Mapped U+4F8D 侍, U+6641 晁, and U+6C35 氵 to uni4F8D-JP, uni6641-JP, and uni6C35-JP, respectively, in the CN CMap resource per Issue #28.
Mapped U+6902 椂, U+6903 椃, U+6947 楇, U+7171 煱, and U+9BF1 鯱 to uni6902-JP, uni6903-JP, uni6947-JP, uni7171-JP, and uni9BF1-JP, respectively, in the TW CMap resource per Issue #32.
Mapped U+4EBD 亽 to uni4EBD-CN in the JP (and, by extension, KR) CMap resource per Issue #34.
Mapped U+627F 承 and U+77A2 瞢 to uni627F-JP and uni77A2uE0101-JP, respectively, in the TW CMap resource per Issue #36.
Mapped U+62FF 拿, U+6301 持, U+6DE6 淦, U+6DFC 淼, U+6EB4 溴, and U+81EC 臬 to uni62FF-JP, uni6301-JP, uni6DE6-JP, uni6DFC-JP, uni6EB4-JP, and uni81EC-JP, respectively, in the CN CMap resource per Issue #38.
Mapped U+504F 偏 to uni504FuE0101-JP in the TW CMap resource per Issue #38.
Mapped U+76EC 盬 and U+8B04 謄 to uni76EC-CN and uni8B04-CN, respectively, in the TW CMap resource per Issue #39.
Mapped U+2F61 ⽡ to uni74E6-JP in the TW CMap resource per Issue #43.
Mapped U+61DC 懜, U+77D2 矒, U+8019 耙, and U+803B 耻 to uni61DC-JP, uni77D2-JP, uni8019-JP, and uni803B-JP, respectively, in the TW CMap resource.
Mapped U+2FCC ⿌ to uni9EFD-JP in the TW CMap resource.

Post Version 1.001 Mapping Changes:

Map U+732A 猪 to uni732A-JP in the CN CMap resource per Issue #38.
Map U+5009 倉 to uni5009-JP in the TW CMap resource per Issue #53.
Map U+2EDE ⻞ to u2967F-CN in the CN CMap resource per Issue #55.
Map U+7300 猀 to uni7300-CN in the JP and KR CMap resources per Issue #59.
Map U+526A 剪, U+5881 墁, U+688F 梏, U+6ADD 櫝, U+6C4B 汋, U+7006 瀆, U+70B7 炷, U+7258 牘, U+72A2 犢, U+72B3 犳, and U+7431 琱 to uni526A-CN, uni5881-CN, uni688F-CN, uni6ADD-CN, uni6C4B-CN, uni7006-CN, uni70B7-CN, uni7258-CN, uni72A2-CN, uni72B3-CN, and uni7431-CN, respectively, in the KR CMap resource per Issue #59.
Map U+501C 倜, U+5192 冒, U+52C7 勇, U+553E 唾, U+5DFD 巽, U+641C 搜, U+73F9 珹, U+7A20 稠, U+7C3F 簿, U+8983 覃, and U+8D16 贖 to uni501C-CN, uni5192-CN, uni52C7-CN, uni553E-CN, uni5DFD-CN, uni641C-CN, uni73F9-CN, uni7A20-CN, uni7C3F-CN, uni8983-CN, and uni8D16-CN, respectively, in the KR CMap resource per Issue #60.
Map U+4E7C 乼, U+5125 儥, U+58B0 墰, U+60C6 惆, U+6D2C 洬, U+6E54 湔, U+83C2 菂, U+83DF 菟, U+86C0 蛀, U+8729 蜩, U+8CD9 賙, U+90DC 郜, U+99B0 馰, U+9C4F 鱏, U+9D69 鵩, and U+9EF7 黷 to uni4E7C-CN, uni5125-CN, uni58B0-CN, uni60C6-CN, uni6D2C-CN, uni6E54-CN, uni83C2-CN, uni83DF-CN, uni86C0-CN, uni8729-CN, uni8CD9-CN, uni90DC-CN, uni99B0-CN, uni9C4F-CN, uni9D69-CN, and uni9EF7-CN, respectively, in the KR CMap resource per Issue #61.
Map U+3B6D 㭭 and U+5225 別 to uni3B6D-JP and uni5225-JP, respectively, in the TW CMap resource.
Map U+5A66 婦 and U+7199 熙 to uni5A66uE0101-JP and uni7199-JP, respectively, in the KR CMap resource.
Map U+2F2C ⼬ to uniFA3C-JP in the JP (and, by extension, KR) CMap resource, and to uni5C6E-CN in the CN (and, by extension, TW) CMap resource.
Map U+284DC 𨓜 to uni9038-JP in the JP (and, by extension, all other) CMap resources.
Map U+8056 聖 and U+83BD 莽 to uni8056-TW and uni83BD-JP, respectively, in the KR CMap resource.
Investigate the U+F92C 郎 issue that affects the 'locl' GSUB feature.
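The glyph names in the lists above appear to follow a regular pattern: `uni` plus four hexadecimal digits (or `u` plus five or six for code points beyond the BMP), an optional `uE01xx` variation selector, and an optional region suffix. The following is a minimal sketch of a parser for that pattern, inferred only from the names quoted in this issue. The `GlyphName` type, the `parse_glyph_name` function, and the regular expression are assumptions, not part of the project's tooling.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from names such as uni1160, uni864EuE0101-JP, u2967F-CN.
_GLYPH_NAME = re.compile(
    r"^(?:uni(?P<bmp>[0-9A-F]{4})|u(?P<sup>[0-9A-F]{5,6}))"  # base code point
    r"(?:u(?P<vs>E01[0-9A-F]{2}))?"                           # variation selector
    r"(?:-(?P<region>JP|KR|CN|TW))?$"                         # region suffix
)


class GlyphName(NamedTuple):
    code_point: int
    variation_selector: Optional[int]
    region: Optional[str]


def parse_glyph_name(name: str) -> GlyphName:
    """Split a glyph name into code point, variation selector, and region."""
    match = _GLYPH_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized glyph name: {name!r}")
    code_point = int(match.group("bmp") or match.group("sup"), 16)
    vs = match.group("vs")
    return GlyphName(code_point,
                     int(vs, 16) if vs else None,
                     match.group("region"))


for sample in ("uni1160", "uni864EuE0101-JP", "u2967F-CN"):
    print(sample, parse_glyph_name(sample))
```

A parser like this makes it easy to check that a suggested target glyph belongs to the expected code point and region before posting it here.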

hfhchan commented 7 years ago

Though they are out of scope for the TW subset, the JP glyphs for U+77D2 and U+61DC could be used for the TW locale, similar to Issues #26 and #32.

kenlunde commented 7 years ago

@hfhchan: Because this is easy to do, it shall be done.

tamcy commented 7 years ago

[Image: u 8019_803b_tw]

Incorrect mapping for U+8019 耙 and U+803B 耻 in TW. Both should look the same as the JP/KR glyph.

kenlunde commented 7 years ago

@tamcy: Noted. Thanks! (Also, while U+803B is outside the scope of the TW coverage, the fix is easy and shall be done.)

tamcy commented 7 years ago

It seems that U+9EFD 黽 incorrectly serves the CN glyph (CID 48026) in the JP font when the language is set to "zh-tw". The language-specific version, SHSerif-TW, uses the correct glyph, which is CID 48025. Is this an issue similar to #43?

kenlunde commented 7 years ago

@tamcy: Yes, it is similar to #43, and the solution is to map U+2FCC ⿌ to uni9EFD-JP (CID+48025) in the TW CMap resource.

hfhchan commented 7 years ago

The TW mapping for U+5225 should use the JP glyph. It is customary for the bottom left component to be joined upwards in the Code Charts, and the overall shape of 11202 looks too wide for TW/HK customary use.

The TW mappings for U+8382 and U+3B6D could borrow the JP glyphs as well, since the bottom left protrusion is considerably closer to convention.

kenlunde commented 7 years ago

While these mapping changes are relatively easy to implement, you missed the deadline for the dot release, given the extraordinarily large number of moving parts that are involved. These will need to wait for a subsequent release.

kenlunde commented 7 years ago

@hfhchan: I am willing to map U+3B6D 㭭 and U+5225 別 to uni3B6D-JP and uni5225-JP, respectively, in the TW CMap resource, but not U+8382 莂, mainly due to the inappropriate radical shape, which is far more striking.

hfhchan commented 7 years ago

@kenlunde, aren't the JP and CN glyphs for U+8382 equally inappropriate for TW? Changing the fallback to the JP glyph instead of the CN glyph would not introduce more inappropriateness, as far as I can tell...

kenlunde commented 7 years ago

@hfhchan: Exactly, which is why I suggest leaving the mapping as-is unless a significant number of people also support the change.

tamcy commented 7 years ago

Incorrect lookup data for U+90CE 郎 in JP and KR OTCs. When tagged in TW or CN, the font should substitute CID 41693 with CID 41694, but CID 41717 is incorrectly served as indicated in the jp2cn and jp2tw tables.

[Image: shserif-u90ce]

kenlunde commented 7 years ago

@tamcy: I torpedoed my earlier reply, because the situation is quite complex, and involves KS X 1001 and GB 18030.

U+F92C 郎 (its canonical equivalent is U+90CE 郎) is a GB 18030 character, and its representative glyph is the same as uni90DE-CN, meaning with the extra stroke, so its code point maps to that CN glyph. U+F92C 郎 was a KS X 1001 character. Its K source was moved to U+FA2E 郞 with U+90DE 郞 being its canonical equivalent.

Because this will be addressed after the Version 1.001 update, I will give this some more thought. The work-around, especially if you're using the OTCs, is to explicitly select the CN font in the OTC.
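The canonical equivalences mentioned above can be checked with Python's standard `unicodedata` module, because normalization replaces a CJK compatibility ideograph with its canonical equivalent. This is only a sketch for verifying the code point relationships; it says nothing about which glyph a font serves. The helper name `canonical_equivalent` is made up for this example.

```python
import unicodedata


def canonical_equivalent(ch: str) -> str:
    """Return the canonical (NFC) form of a single character.

    CJK compatibility ideographs have singleton canonical decompositions,
    so NFC maps them to the corresponding unified ideograph.
    """
    return unicodedata.normalize("NFC", ch)


for compat in ("\uF92C", "\uFA2E"):
    unified = canonical_equivalent(compat)
    print(f"U+{ord(compat):04X} -> U+{ord(unified):04X}")
```

Because normalization silently replaces U+F92C with U+90CE, text that passes through a normalizing process loses the distinction before it ever reaches the font.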

hfhchan commented 7 years ago

Consider removing the JP glyph for U+85F2.

kenlunde commented 7 years ago

The JP glyph for U+85F2 藲, uni85F2-JP, is already a candidate for removal.

Jstamz commented 3 years ago

[Image: Glyph001]

I found 12 KR glyphs in Version 1.001 that are quite different from those in common Korean fonts. Fonts referenced:

함초롬바탕 (HCR Batang): the default font of the Hangul word processor
HY 신명조 and HY 견명조: common serif fonts bundled with MS Office and Hangul Office
나눔고딕: a popular free sans serif font
한양해서: a regular script font widely used in Korea

I suggest remapping 8 of the 12 glyphs above: 咎 (U+548E), 嗤 (U+55E4), 憊 (U+618A), 潤 (U+6F64), 竇 (U+7AC7), 續 (U+7E8C), 讀 (U+8B80), and 隙 (U+9699). The reason is that the current glyphs are minor variants in Korea.

(1) I think the mapping for this character refers to an incorrect glyph. I have not seen any other font render 潤 (⿰⺡閏) as ⿰⺡⿵⾨壬, and comparing 潤 [⿰⺡⿵⾨壬] with 閏 [⿵⾨王] in the font makes the difference easy to see. I think the KR glyph for U+6F64 is incorrect in both Source Han Sans and Source Han Serif.

潤 (U+6F64): KR -> JP

(2) The KR glyphs for these characters are rarely used in Korea; the JP glyphs are more widely used and closer to the traditional form. (These characters are phono-semantic compounds whose common phonetic component is 𧶠, not 賣.)

竇 (U+7AC7), 續 (U+7E8C), 讀 (U+8B80): KR -> JP

(3) These KR glyphs are not rare, but are minor variants.

咎 (U+548E): KR -> JP
嗤 (U+55E4): JP -> CN
憊 (U+618A): KR -> JP
隙 (U+9699): KR -> JP

adobe-fonts / source-han-serif

Consolidation of Mapping Change Suggestions #37