Closed kenlunde closed 7 years ago
Issues #92 and #97 are consolidated here.
Glad to know that SHS 1.002 has been released, thanks for the hard work.
I'd propose to unify the 辶 component so that certain glyphs of CN and JP can be shared.
KR's 辶 will always have two dots at the top left corner, while CN's will always have one. JP is a mix of the two: the 辶 may have one or two dots at the top left corner.
Given there is no difference in other components, the two-dot version glyph can be shared among JP and KR. Theoretically this is also true for CN and JP. However, currently the glyphs can't be shared because CN uses a slightly different design for the same component, which doesn't seem necessary.
2015/4/23 : updated image
Thank you for the suggestion. Also, I just edited the sentence in the first post to this Issue.
I second @tamcy's suggestion. If we compare SimHei and Microsoft YaHei, which are both fonts built to the PRC standards, it is obvious that using the design on the left is acceptable (and likely more modern and preferable).
FYI, I have talked with Ms. Lu Qin about the stroke connection for components such as 口, 辶, and 又, and she indicated that the exact stroke connection point is deemed a matter of aesthetics, which is likely to be codified in the document for the upcoming registration of the Hong Kong IVD Collection. The main concern for the Hong Kong glyphs is only that the type of stroke be correct. In this case, for example, 請 may map directly to the CN glyph, because the connection between the first two strokes and the connection point for 口 are deemed aesthetic issues, not requirements. However, the character cannot map directly to the TW glyph, as the first stroke of the bottom-right "月" must be stroke 2 instead of stroke 3. This should dramatically reduce the number of glyphs that need to be redesigned.
Also:
丸, 飄, 凝, 摜 CN -> JP
多 TW -> JP
忍 CN -> KR
牆, 壯, 伴, 光, 心, 乾, 晚, 色, 怎, 慣, 馮, 載, 踢 CN/TW -> JP
抱, 勤 CN/TW -> KR
I third tamcy's suggestion about 辶. As for Korean, as long as they're consistent across characters (that is, every character with that radical takes the same shape in KR font instances), except for pairs of characters for which the only difference is 'one dot vs two dots' in that radical, either form should be fine.
For a dot plus horizontal line, e.g. 應, 言, etc., I think having the dot just touch the horizontal line is a good design to unify the CN and TW (and possibly HK) glyphs. In Song design, the TW MOE Song design must always touch. The PRC Song design does not usually touch unless there is not enough space. I think we can get away with using dot "touch" line instead of dot "separated from" line or dot "join" line.
For 透, I think it may not be that simple - the 乃 component is a bit different in there.
@RyanChng You are correct, I'll revise the screenshot when I have time (EDIT: updated).
While we very much appreciate any and all feedback about the extent to which glyphs can be shared across the supported languages, the decision will ultimately be made by the typeface designer, Ryoko Nishizuka, in consultation with Changzhou SinoType's typeface designer, and to some extent, with me.
I am planning to begin this process by preparing a list of candidate glyphs, in an effort to reduce the size of the candidate list. I will separately list particular components for consideration. These will be used as Ryoko's guide.
Also, please try to refrain from using this "issue" as a discussion thread. If a discussion is necessary, I suggest it be done elsewhere, and the results posted to this issue.
AdobeJapan1-6 contains many variants, but some of them are designed for serif fonts, e.g. 父, CID+3541 vs CID+13497. I found that Source Han Sans also includes these glyphs, and they have the same design. Is it possible to remove them, because they look like some sort of duplication?
@jimmymasaru: We have a requirement to support the Adobe-Japan1 IVD collection, which is to have unique CIDs. We understand that some of the glyph distinctions cannot be represented in a sans serif (aka gothic) typeface design, meaning that we are fully aware of this.
BTW, your comment would also apply to Kozuka Gothic.
U+5840 塀, U+584F 塏: I suggest that these two glyphs, as well as others containing 䒑, be unified, because the differences are all design differences. I checked Meiryo and Microsoft YaHei and found that the second stroke may be either fully linked or not fully linked with the third stroke.
U+5852 塒 The JP glyph of this character can be unified with the one of CN, because the 土 and 寸 are linked in other characters like 寺時侍持.
㪽㪿㫀 Can the glyphs containing 斤 be unified between JP and CN?
All the characters listed above have the same glyphs in Hiragino Kaku Gothic and Hiragino Sans GB.
Thank you for the continued suggestions. I am starting to ramp up preparations for Version 2.000, which will take a few months to complete due to the various things that I'd like to accomplish, one of which is better support for Hong Kong. This particular glyph-sharing issue is important because the intent is to free up enough CIDs to make the Hong Kong support possible.
I would like to add that the case of 斤 can also be applied to characters of same or similar strokes, like below:
(Top: JP, Bottom: TW)
后臼興 (also implies 盾垢揑舁) can be shared if unified. 學段劉 are examples of non-shareable but affected glyphs (but 壆覺 can be shared).
To demonstrate progress on this issue, and as the first real step toward Version 2.000, I spent a solid three days last week compiling glyph pairs that can potentially be shared across languages, and came up with 832 candidates. (I use the term "candidate" because the final determination is made by the designers, but I am the best person to come up with a list of candidates.) I plan to spend part of this weekend and Monday finding additional glyph pairs, though I suspect that the figure will be one or two digits at most.
The designers will use the materials that I am preparing, which show the pairs side-by-side at the weight extremes (ExtraLight and Heavy) and at an intermediate weight (Medium), and also overlaid at the Regular weight, to determine which glyph pairs can be reduced to a single glyph.
Remember that the primary purpose of this particular exercise is to free up enough CIDs to provide proper, or at least adequate support, for Hong Kong SCS (HKSCS). (Though Hong Kong is moving toward a new standard abbreviated HKCS, and thus represents somewhat of a moving target.)
Just FYI, the differences shown in @tamcy's 2015-04-21 and 2015-06-14 posts above will not be shared, because these language-based differences were intentionally established at the designers' discretion.
@jimmymasaru: About your 2015-06-14 post, the CN and JP glyphs for 塀, 塏, 塒, 寺, and 持 are already among the candidates for sharing, which were captured by the "fine-tooth comb" work that I did last week. I am simply confirming that they were detected as candidates through my systematic efforts. Note that 時 and 侍 are not candidates, because JP and CN already share the glyphs (the former is a CN glyph that JP uses, and the latter is a JP glyph that CN uses).
Not sure whether the following glyphs are already on the list, but I'll post them anyway.
U+51BD 冽: 11410 (J/K) = 11412 (T/C)
U+52A0 加: 11783 (J/K) = 11784 (T/C)
U+53FB 叻: 12356 (J/K) = 12357 (T/C)
U+5420 吠: 12409 (J/K) = 12410 (T/C)
U+5FCC 忌: 17914 (J/K) = 17915 (T/C)
U+6028 怨: 18062 (J/K) = 18064 (T)
U+606A 恪: 18169 (J/K) = 18170 (T/C)
U+60B2 悲: 18288 (J/K) = 18290 (T)
U+617C 慼: 18656 (J/K) = 18657 (C)
U+64BC 撼: 20061 (J/K) = 20062 (C)
U+67F1 柱: 21464 (J/K) = 21465 (T/C)
U+6BB2 殲: 23207 (J/K) = 23209 (T)
U+6CD7 泗: 23684 (J/K) = 23685 (T/C)
U+7199 熙: 25841 (J) = 25843 (C) (Added 2 May)
U+74E6 瓦: 27313 (J/K) = 27315 (T)
U+7765 睥: 28445 (J/K) = 28446 (T/C)
U+7BD9 篙: 30527 (J/K) = 30528 (C)
U+8304 茄: 34230 (J/K) = 34231 (C)
U+9D60 鵠: 61949 (K) = 46386 (T/C)
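For anyone who wants to compare a list like the one above against their own data, here is a minimal Python sketch that parses each line into a record. The regular expression, the function name `parse_pair`, and the field names are assumptions made for illustration; they are not part of any Source Han Sans tooling, and the line format is simply the one used in this comment.

```python
import re

# Matches lines such as "U+52A0 加: 11783 (J/K) = 11784 (T/C)".
# Trailing notes like "(Added 2 May)" are ignored.
PAIR_RE = re.compile(
    r"U\+(?P<cp>[0-9A-F]{4,6})\s+(?P<char>\S):\s+"
    r"(?P<cid_a>\d+)\s+\((?P<langs_a>[A-Z/]+)\)\s*=\s*"
    r"(?P<cid_b>\d+)\s+\((?P<langs_b>[A-Z/]+)\)"
)


def parse_pair(line):
    """Parse one sharing suggestion; return None if the line does not match."""
    match = PAIR_RE.search(line)
    if match is None:
        return None
    codepoint = int(match["cp"], 16)
    return {
        "codepoint": codepoint,
        "char": match["char"],
        # True when the printed character agrees with the U+ value.
        "consistent": chr(codepoint) == match["char"],
        "left": (int(match["cid_a"]), match["langs_a"].split("/")),
        "right": (int(match["cid_b"]), match["langs_b"].split("/")),
    }


if __name__ == "__main__":
    print(parse_pair("U+52A0 加: 11783 (J/K) = 11784 (T/C)"))
    print(parse_pair("U+7199 熙: 25841 (J) = 25843 (C) (Added 2 May)"))
```

The `consistent` field is only a sanity check against copy-and-paste mistakes; it says nothing about whether the two glyphs can actually be shared.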
@tamcy: Thank you. I will check this list against my current notes, but at least U+7BD9 篙 cannot be unified between J/K and C due to its seventh stroke.
@tamcy: I finally had time to compare your list to my list of sharing candidates. All of them were included in my own data, except for U+7BD9 篙 that cannot be shared for reasons explained above. It was reassuring to confirm that your list was a pure subset of what I independently came up with.
U+56CD 囍: 13554 (J/K) = 13555 (T/C)
U+50D6 僖: 11011 (J/K) = 11012 (T/C)
Actually there is a subtle difference between the two sets, which is how "口" touches the last two strokes of the "壴" component. But this should be a designer's preference issue.
@tamcy: Both of these characters are included in my list of sharing candidates.
@kenlunde : How to convert Simplified Chinese to .eot?
@Rameshdaspam: Converting to EOT, using the command-line ttf2eot tool, first requires a TTF version of the font, which is a DIY affair. We have no plans to deploy these fonts as TTFs.
Actually, we need webfonts for Simplified Chinese: SourceHanSansSC and SourceHanSansHWSC.
@Rameshdaspam: If the fonts are not in a format that you can use, you will need to convert them into the desired format. I do not know enough about your request to advise further, other than supplying EOTs is a non-starter due to the TTF requirement.
You don't need to assign two different glyphs for U+115F and U+1160, as they are just fillers. In fact, CID+461 is currently shared by U+1160 and U+3164. Instead of this, you can use CID+460 for U+115F, U+1160, and U+3164; and use CID+461 for something else.
Also, you don't actually need these:
63752 Hangul OldHangul-LeadingConsonants uni115F.ljmo01
63877 Hangul OldHangul-LeadingConsonants uni115F.ljmo02
64002 Hangul OldHangul-LeadingConsonants uni115F.ljmo03
64127 Hangul OldHangul-LeadingConsonants uni115F.ljmo04
64252 Hangul OldHangul-LeadingConsonants uni115F.ljmo05
64377 Hangul OldHangul-LeadingConsonants uni115F.ljmo06
64502 Hangul OldHangul-Vowels uni1160.vjmo02
My understanding is that the glyphs with .ljmo0[1-6] at the end have the width of 920, and ones with .vjmo0[1-2] or .tjmo0[1-4] at the end have the width of zero.
For the first six of the above, you can just simply use CID+740 (uni115F; nominal form of U+115F), as CID+740 is already a spacing glyph. You don't need seven (including the nominal form) 920-width blank glyphs; one is good enough. For the last one, you can just simply use CID+64407 (uni1160.vjmo01). You don't need two zero-width glyphs with nothing in them; one is good enough.
You can save seven glyphs (eight if you count the comment right above this one).
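The substitution proposed above can be sketched in Python as a simple replacement table. This is only an illustration: the glyph names are the ones listed in this comment, the widths are my stated understanding (920 for the `.ljmo` forms, zero for the `.vjmo` forms) rather than values read from the font, and `remap` and `widths_preserved` are hypothetical helper names.

```python
# Glyphs proposed for removal, mapped to the glyph that would replace them.
REPLACEMENTS = {f"uni115F.ljmo{i:02d}": "uni115F" for i in range(1, 7)}
REPLACEMENTS["uni1160.vjmo02"] = "uni1160.vjmo01"

# Advance widths as understood in this comment (an assumption, not font data).
WIDTHS = {name: 920 for name in REPLACEMENTS if ".ljmo" in name}
WIDTHS.update({"uni115F": 920, "uni1160.vjmo02": 0, "uni1160.vjmo01": 0})


def remap(glyph_name):
    """Return the surviving glyph for a name, or the name itself if it is kept."""
    return REPLACEMENTS.get(glyph_name, glyph_name)


def widths_preserved():
    """Check that each removed glyph is replaced by one with the same width."""
    return all(WIDTHS[old] == WIDTHS[new] for old, new in REPLACEMENTS.items())


if __name__ == "__main__":
    print(len(REPLACEMENTS))        # number of glyphs that could be dropped
    print(remap("uni115F.ljmo04"))  # a removed form falls back to the nominal glyph
    print(remap("uni1100"))         # glyphs outside the table are left alone
    print(widths_preserved())
```

The same table would have to be applied wherever the removed names appear, that is, in the 'cmap' table and in the output side of the GSUB rules.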
While I have a preference to keep these glyphs, because their presence makes debugging the three GSUB features an easier process, I am willing to build a test font that includes only the nominal and combining forms, along with the space (U+0020), but excludes the eight glyphs that you indicated above (and substitutes them with the appropriate glyphs in both the 'cmap' table and GSUB features).
@acuteaccent: If I were to build such a test font, would you be willing to test it?
@kenlunde Yes.
Excellent. This will be my weekend project.
@acuteaccent: Apologies for the delay. I spent the evening building test fonts. The one named CombiningJamoTestAll-ExtraLight.otf includes only the glyphs necessary for combining jamo (the nominal and combining forms of 1100-11FF, A960-A97C, D7B0-D7C6, and D7CB-D7FB) plus U+0020 (space). The one named CombiningJamoTest-ExtraLight.otf is the same, but excludes the eight glyphs mentioned above (uni1160, uni115F.ljmo01, uni115F.ljmo02, uni115F.ljmo03, uni115F.ljmo04, uni115F.ljmo05, uni115F.ljmo06, and uni1160.vjmo02), and modifies the 'cmap' table and GSUB features accordingly. Please test at your earliest convenience.
In terms of a test file, please use this one, which includes all 30,222 two- and three-character sequences (among the possible 1,638,750) that include U+115F or U+1160.
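The two figures above can be reproduced with a short Python sketch. It assumes that the sequences are leading+vowel and leading+vowel+trailing combinations, and that the code point ranges split into leading consonants, vowels, and trailing consonants as in the Unicode Hangul Jamo blocks; the variable names are made up for illustration, and this is a reconstruction of the arithmetic, not the script used to generate the test file.

```python
from itertools import product


def span(first, last):
    """Inclusive range of code points."""
    return range(first, last + 1)


# Assumed split of the jamo ranges mentioned in this thread.
LEADING = [*span(0x1100, 0x115F), *span(0xA960, 0xA97C)]
VOWELS = [*span(0x1160, 0x11A7), *span(0xD7B0, 0xD7C6)]
TRAILING = [*span(0x11A8, 0x11FF), *span(0xD7CB, 0xD7FB)]

FILLERS = {0x115F, 0x1160}  # choseong filler and jungseong filler

# All two-character (LV) and three-character (LVT) sequences.
two_char = len(LEADING) * len(VOWELS)
three_char = two_char * len(TRAILING)
total = two_char + three_char

# LV pairs that contain at least one filler; each such pair also
# yields one LVT sequence per trailing consonant.
filler_pairs = sum(
    1 for lead, vowel in product(LEADING, VOWELS)
    if lead in FILLERS or vowel in FILLERS
)
with_filler = filler_pairs * (1 + len(TRAILING))

if __name__ == "__main__":
    print(total)        # 1638750
    print(with_filler)  # 30222
```

Under these assumptions there are 125 leading consonants, 95 vowels, and 137 trailing consonants, and 219 of the leading+vowel pairs involve a filler.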
While I think that we can get away with removing uni115F.ljmo0[1-6] and uni1160.vjmo02, which will save seven glyphs, I think that we need to keep uni1160. Let me explain. When rendering, my initial testing suggests that it is okay to use the same glyph for U+115F and U+1160, but when a PDF is created, any instance of U+1160 in the original text will be converted to U+115F when the text is copied from the PDF. I will build a third test font later this morning that keeps uni1160.
The test fonts link above now corresponds to a ZIP file that contains a third test font, CombiningJamoTest1160-ExtraLight.otf, which is identical to CombiningJamoTest-ExtraLight.otf except that it retains the nominal glyph for U+1160 (uni1160), both in terms of the 'cmap' table and GSUB features. Only seven glyphs—uni115F.ljmo01, uni115F.ljmo02, uni115F.ljmo03, uni115F.ljmo04, uni115F.ljmo05, uni115F.ljmo06, and uni1160.vjmo02—have been removed.
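To illustrate why sharing one glyph between U+115F and U+1160 loses information when text is copied back out, here is a deliberately simplified Python model. It assumes that text extraction goes from glyph back to a single code point and that the lowest code point wins when several map to the same glyph; real PDF producers and viewers have their own logic, so treat `reverse_map` and `round_trip` as hypothetical helpers, not a description of any particular application.

```python
def reverse_map(cmap):
    """Build glyph -> code point, keeping the lowest code point per glyph."""
    reverse = {}
    for codepoint in sorted(cmap):
        reverse.setdefault(cmap[codepoint], codepoint)
    return reverse


def round_trip(text, cmap):
    """Map text to glyphs and back, as a stand-in for copying text from a PDF."""
    reverse = reverse_map(cmap)
    return "".join(chr(reverse[cmap[ord(ch)]]) for ch in text)


# One glyph shared by both fillers versus a separate nominal glyph for U+1160.
SHARED = {0x115F: "uni115F", 0x1160: "uni115F"}
SEPARATE = {0x115F: "uni115F", 0x1160: "uni1160"}

if __name__ == "__main__":
    original = "\u1160"
    print(f"U+{ord(round_trip(original, SHARED)):04X}")    # filler identity is lost
    print(f"U+{ord(round_trip(original, SEPARATE)):04X}")  # filler identity survives
```

With the shared mapping the reverse lookup cannot tell the two fillers apart, which matches the behavior described above and is the reason for keeping uni1160 in the third test font.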
CN glyphs for U+611F, U+61BE, U+64BC can be shared with the JP glyphs, since that is done for U+8F57 anyway.
Personally, the JP one feels "more Chinese" than CN with its asymmetric balance.
Incidentally, U+6FB8, U+9C64, U+9CE1, U+3673, U+40ED, U+425E, U+4717, and U+4AF2 need to be redesigned if the JP version is preferred.
U+501F CN/TW === JP/KR
U+503B TW === JP/KR
U+4E51 U+4E5A
The differences between these two characters in SHSans are unnecessary compared with those in SHSerif.
Per the following two posts on 心, https://github.com/adobe-fonts/source-han-sans/issues/98#issuecomment-292627176 https://github.com/adobe-fonts/source-han-sans/issues/98#issuecomment-216016226
it seems that the difference between 心 in CN/TW and JP/KR is a widespread phenomenon rather than one confined to compounds containing 感.
My personal opinion is that the difference in the placement of the 點 above 豎彎鉤 is much more minor than the stroke joining of 又 and 叉. I think the former is an aesthetic design issue only, while the latter is a stroke-level mandatory requirement by the MOE. I would personally prefer the glyphs to be allocated to solve the latter instead of the former.
The CN/TW glyph of U+4E1E should use the JP/KR glyph.
As far as the code charts are concerned, the starting position of the 捺 (5th stroke) should be the 豎鉤 (2nd stroke) for TW, which is markedly more similar to the JP/KR glyph. The CN glyph looks similar to the one in the code chart though.
For PMingLiU and DFKai-SB, the starting position of the 捺 (5th stroke) is also the 豎鉤 (2nd stroke). For Microsoft Jhenghei, the starting point is exactly at the intersection of 橫折 (1st stroke) and 豎鉤 (2nd stroke).
The situation for CN fonts is similar. For Kaiti and SimSun, the starting position of the 捺 (5th stroke) is also the 豎鉤 (2nd stroke). For Microsoft Yahei and SimHei, the starting point is exactly at the intersection of 橫折 (1st stroke) and 豎鉤 (2nd stroke).
Therefore, both the CN/TW glyphs can be safely remapped from the JP/KR glyph.
(I am writing this comment to show that I did not ignore Ken's request. I already responded to him via email, but forgot to write a comment here.)
https://github.com/adobe-fonts/source-han-sans/issues/98#issuecomment-263879666 About the test font without the seven blank glyphs: There was no problem on my end. Everything was okay.
@acuteaccent: We also confirmed this via Source Han Serif.
Consolidated with Issue #179.
This Issue will be used to consolidate suggestions for glyphs that can be shared across more than one language, with the intent to purge one or more glyphs per code point, which will free up CIDs for accommodating additional glyphs. Please report any future glyph sharing suggestions to this Issue.