Issues and Suggestions for Sans TC

NightFurySL2001 commented 8 months ago

Thank you for providing the TC community with such a great font! However, there are some significant issues and some personal suggestions that I hope Plex can resolve in the next release.

It seems that the character set of this font is based on Adobe-CNS1-7, so the CIDs are included for reference.

Glyph errors

CNS11643 error

The following are glyph errors from the CNS11643 standard that should be fixed immediately. The sample pictures are shown with IBM Plex Sans TC Bold (on top/left) and Source Han Sans Regular (on bottom/right). A copy of Unicode code charts is attached.

Char	Uni	CID	Error
嬔	U+5B14	11040	Should be 兔 with a dot, not 免. The current glyph is 嬎 U+5B0E.
汩	U+6C69	6328	The right part is made too similar to 汨 (U+6C68) with 日. This error is caused by the Taiwan MOE forcing 曰 to be closed, even though this is traditionally not the case. The 曰 in Plex Sans TC has a disconnected inside, and thus 汩 should too, for clarity.
獡	U+7361	10607	Bottom right is wrong. The current glyph is 𤡯 U+2486F.
礡	U+7921	12492	The left should be 蒪 without 3 dots, not 薄. The one with 薄 is 礴 U+7934/CID+17139, which is included in Plex Sans TC presumably due to HKSCS in Adobe-CNS1.
萒	U+8412	8854	The bottom part should be 亠公儿, not 亠㕣儿. The one with 㕣 is 𦳆 U+26CC6.
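As a quick cross-check of the Uni column above, here is a minimal Python sketch that prints the code point of each reported character next to the character its current glyph resembles. The helper name `to_uplus` and the `pairs` list are illustrative and not part of any Plex tooling.

```python
# Hypothetical helper: format one character as "U+XXXX" so the table's
# Uni column can be compared against what is actually typed.
def to_uplus(ch: str) -> str:
    """Return the Unicode code point of a single character as 'U+XXXX'."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    return f"U+{ord(ch):04X}"

# (reported character, character the current glyph resembles),
# copied from the table above.
pairs = [("嬔", "嬎"), ("獡", "𤡯"), ("萒", "𦳆")]

for expected, lookalike in pairs:
    print(f"{expected} {to_uplus(expected)}  vs  {lookalike} {to_uplus(lookalike)}")
```

Characters outside the Basic Multilingual Plane are single characters in Python 3 strings, so the same helper works for them.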

黃 component

黃 U+9EC3 and its related components use the Taiwan MOE form, in which the center part is 田 with no protrusion instead of 由 with a protrusion on top. This form was not used in any Traditional Chinese community before its introduction by the Taiwan MOE. It is more likely to be identified as incorrect, especially by Traditional Chinese communities in Hong Kong SAR, Macau SAR, Singapore and Malaysia, where the more popular form is 由. Some Taiwan font foundries also use the 由 form.

For maximum cross-regional compatibility, 黃 U+9EC3 and its related components should be changed to use 由 with protrusion. Around half of the glyphs can be copied over from Plex Sans JP which has the correct form, and the rest just need slight modifications to pull the vertical stroke up. List of characters: 嚝壙嫹廣彉懭撗擴斢曠橫潢瀇熿爌獚獷璜矌磺礦穔穬簧纊蘣蘳蟥趪鄺鷬黃黈黌 Shown below TC on top and JP on bottom.

Suggest to change handwritten components to existing printing orthography

祺 U+797A/CID+3753 has a different 示 radical shape than the rest of 示 radicals. It would be wise to unify them. My personal suggestion is to copy the JP traditional form used by 祺 U+797A over to the rest of TC (as explained in the next section).

餤 U+9924/CID+11519 has a different 飠 radical shape than the rest of 飠 radicals. It would be wise to unify them. My personal suggestion is to copy the JP traditional form used by 餤 U+9924 over to the rest of TC (as explained in the next section).

随 U+968F/CID+15326 and 遷 U+9077/CID+4683 only have a single dot for ⻍. 遷 U+9077 should also have a 㔾 instead of 己 copied from JP. It would be wise to unify them directly with 2 dots as 2 dots are in line with traditional printing orthography.

Other miscellaneous not unified components

Someone probably modified 邊 U+908A/CID+5593 wrongly. The right shows a similar character in TC, and the bottom shows JP. 邊 U+908A can be reverted back to JP directly (shown bottom).

Same goes for 邋 U+908B/CID+5594. The middle should be 4 slanted dots, not 4 horizontal lines. Also 邋 U+908B/CID+5594 and 鬣 U+9B23/CID+5968 should have a 㐅 inside, not 人.

Same goes for 益 U+76CA/CID+2370. The middle right should curve downwards, not upwards. This glyph can be reverted back to JP directly (shown on right). Additionally, 謚 U+8B1A/CID+11883, 鎰 U+93B0/CID+5446 and 鷁 U+9DC1/CID+13142 have disconnected middle strokes and should be unified to connected form similar to others.

薇 U+8587/CID+5218 has a different 微 compared to other components. Probably copied from JP.

籍 U+7C4D/CID+5676, 藉 U+85C9/CID+5407, 耥 U+8025/CID+18237, 耱 U+8031/CID+18239 and 耯 U+802F/CID+14228 have the same problem. The first stroke of 耒 on (bottom) left should curve up.

夅 U+5905/CID+17817. Bottom 㐄 should have a right-angle instead of slanted.

Suggestions

Orthography

The following are components that IBM Plex may consider using for TC characters, as they are more aesthetically pleasing, are still in prevalent modern use, and fit the Grotesque design of IBM Plex while maintaining the machinery features. All components below hint at traditional orthography, and their design can be made more geometric to align with IBM Plex Sans.

These designs are also already in use in Sans JP, and most of them can be copied directly into TC with minimal effort. I do recognise that since the glyphs are already made, some effort will be needed to redo them, but it would be very nice to have more printing forms in the font. I'm not sure about the contribution guidelines for this project, but if possible I can make the required glyphs for the update.

示 radical may choose a two horizontal line + 3 (nearly) vertical line instead of a slanted stroke with white counter. This is already in use in TC 祺 U+797A and some of JP. To adjust: 榊礻礼礽社礿祀祁祂祄祅祆祇祈祉祊祋祌祏祐祑祒祓祔祕祖祗祙祚祛祜祝神祠祢祣祤祥祧祩祪祫祰祱祲祳祴祹祺祼祽祾祿禂禃禆禇禈禊禋禍禎福禐禑禒禓禔禕禖禗禘禙禚禛禝禟禠禡禢禤禥禧禨禩禪禫禬禭禮禰禱禲禳禴禶禷視视䄂䄃䄄䄉䄎䔃𡁜𡅏𥘵𥙑𥚃𥚕𥛣𥛶𥜝𥜥

飠 radical may choose the bottom with two horizontal line instead of one with white counter. Same as possible to replace with JP. To adjust: 籂蝕飠飢飣飥飦飩飪飫飭飯飲飴飵飶飹飼飽飾餀餂餃餅餇餉餌餎餑餒餓餔餕餖餗餘餙餚餛餜餞餟餡餤餧館餩餪餫餬餭餯餰餱餲餳餵餷餸餹餺餻餼餽餾餿饀饁饂饃饅饇饈饉饊饋饌饍饎饐饑饒饓饖饘饙饛饝饞饟饡饢䬬䬷𩜠𩜲𩟔

羽 radical may choose to have parallel strokes for the middle part. Same as possible to replace with JP. To adjust: 傝僇剹勠嗡嘐噏噿嚁塌塕嫪嬥寥嵺嶍廖慴憀戮戳扇搧搨摎摺擢暡曜栩榻槢樛櫂歙毣毾溻滃漻潝濢濯瀚瀷煽熠熤燿珝璆璻瘳瞈矅磟祤禢禤穋籊糴糶繆羽羾羿翀翁翂翃翅翇翉翊翋翌翍翎翏翐翑習翔翕翗翛翜翝翞翟翠翡翢翣翥翦翧翨翩翪翫翬翭翮翯翰翱翲翳翴翵翷翸翹翺翻翼翽翾翿耀聬膠蓊蓼藋蘙螉蟉蠗蠮褟褶詡謆謬謵譾豂趐趯蹋躍轇遢鄝醪鎉鏐鑃闒闟雡霫頨顟飁飂騸騽鰨鰼鶲鷚鸐龮㮼䁯䎗䎚䐥䕜𡟸𢄪𢣷𣝦𥔱𥣞𦆲𦏵𦏸𦐂𦐐𦐑𦐒𦑊𦑩𦒄𦒈𦒉𦒍𦒘𧄍𧝁𧷜𨃩𨌺𨦫𨯪

For the additional HKSCS set in Adobe-CNS1, two particular characters are not unified with other TC characters: 娧 U+5A27/CID+15841 and 祱 U+7971/CID+16744. Considering that Hong Kong SAR users tend to use the 八-shaped 兌 rather than the 丷-shaped 兑, these 2 characters could be changed to use the 八-shaped 兌 instead of following the HKSCS 丷-shape.

掰 U+63B0/CID+8526 and 搿 U+643F/CID+9902 could probably have the right-hand 手 not slant upwards, since it's not avoiding any components. No pressure for this though.

Character set expansion

It is definitely an impressive feat to complete Adobe-CNS1. However, for daily Traditional Chinese usage, some characters are still lacking, particularly for Cantonese, (Taiwanese) Hokkien and Hakka. They have their own Chinese characters and additional transliteration script extensions.

For Cantonese, the 《常用香港外字表》(Common Supplementary Characters in Hong Kong) level 1 to 6 is a great Chinese character extension list on HKSCS that includes daily characters that are not in HKSCS.

For Hokkien and Hakka, the resource on 《本土語言外字表》(Native Language Extra Word List) lists additional Chinese characters that are used in the two scripts. Additionally, they also require support for two transliteration Latin scripts: 白話字 (Pe̍h-ōe-jī) and 臺灣閩南語羅馬字拼音方案 (Taiwanese Hokkien Romanization Solution/Tâi-uân Bân-lâm-gí Lô-má-jī Phing-im Hong-àn) for representing sounds in Hokkien and Hakka similar to hanyu pinyin for Mandarin Chinese.

These 3 scripts may also require additional bopomofo in the Bopomofo and Bopomofo Extended Unicode block for transcriptions, along with additional tones in Spacing Modifier Letters, particularly U+02EA ◌˪ and U+02EB ◌˫. These have been added into CNS11643 Amendment 1 in 2023 for correct support.
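To make the three Unicode blocks mentioned above concrete, here is a minimal Python sketch that classifies a code point by block. The block ranges are those of the Unicode Standard; the names `BLOCKS` and `block_of` are illustrative assumptions, not an existing API.

```python
# Block ranges (inclusive) as defined by the Unicode Standard.
BLOCKS = {
    "Spacing Modifier Letters": (0x02B0, 0x02FF),
    "Bopomofo": (0x3100, 0x312F),
    "Bopomofo Extended": (0x31A0, 0x31BF),
}

def block_of(cp):
    """Return the name of the listed block containing code point cp, or None."""
    for name, (first, last) in BLOCKS.items():
        if first <= cp <= last:
            return name
    return None

# The two tone marks singled out in the comment above.
for cp in (0x02EA, 0x02EB):
    print(f"U+{cp:04X} -> {block_of(cp)}")
```

Not every code point inside a block is assigned, so a real coverage check would compare against the assigned characters rather than the whole range.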

Character set removal

The glyphs from CID+14009 to CID+14048 were used in the Big5 era for the IME 行列輸入法 (array input method), specifically the 40-key version. However, there is no use for these characters in modern usage, as the array input method has since declined and the remaining users are on the more popular 30-key version, which does not require additional special characters to type a character correctly. Thus, I would suggest that IBM Plex Sans TC remove these glyphs, similar to Sans JP, by skipping these CIDs altogether.
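The CID range proposed for removal can be written down as a small Python sketch. The function names are hypothetical, and how a real build would drop glyphs by CID is not shown here.

```python
# CID range quoted in the comment above, inclusive on both ends.
FIRST_CID, LAST_CID = 14009, 14048

def array_ime_cids():
    """List every CID in the range proposed for removal."""
    return list(range(FIRST_CID, LAST_CID + 1))

def keep_glyph(cid):
    """Return True if a glyph with this CID falls outside the removal range."""
    return not (FIRST_CID <= cid <= LAST_CID)

print(len(array_ime_cids()), "CIDs in the range")
```

The range spans 40 CIDs, which is at least consistent with the 40-key layout mentioned above.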

Scratchbin commented 8 months ago

The line spacing of TC is too high while JP has less.

Buernia commented 8 months ago

Design inconsistency of 丙 component, both in TC and JP:

Text: 寎怲抦昞昺柄炳病窉苪蛃邴鈵陃陋

Design inconsistency of 豕 component in 蕤, only in TC:

NightFurySL2001 commented 8 months ago

@Buernia The 丙 component in JP is intended and expected; Adobe-Japan1 has such differentiations by default. Only 陋 is incorrect for JP, presumably copied from Source Han Sans, which has the same problem.

For TC however, either form is acceptable. The JP dot form can be copied over to TC, or just edit the individual 丙 character.

For the 豕 component, the JP form is acceptable to TC communities and may be copied over directly, both for minimal difference between JP and TC (if intended) and for a more traditional printing form. This is however quite a big edit compared to the 示 and 羽 components listed in the first comment. 傢像儫冢啄喙噱嚎塚塜墜墬壉壕嫁家幏幪彖慁懅懞掾據攭曚朦椓椽榢橡檬檺櫞櫫欚氋涿潒澽濛濠烼煫猭獴琢瑑璩瘃矇硺礞禒稼篆籇緣腞臄艨蒙蕤蝝蟓蠓蠔蠡褖襐諑譹豕豖豚象豢豪豫豰躆醵鎵鐆鐌鐻隊霥靀餯饛鱌鶨鸏㙇㠙㧻䐁𠎵𠺢𡁏𡟇𡟼𡣘𡱰𢵌𣋡𣫛𣺊𣽁𤐶𤨎𧱬𨧧𨪂𨮙

Buernia commented 8 months ago

Considering that most of the 豕 components use the new form, it would be more appropriate to modify only 蕤.

UltimateAmitieKaiNiC commented 8 months ago

The 囪 component has not been modified.

Marcus98T commented 8 months ago

IBM just took the TC fonts down. They had determined that the total file size for all the Plex fonts is too big for npm, so they are trying to work around the problem.

However, you may still be able to download the fonts here, as I was able to find them when checking the commit history.

They will put them back up shortly once they figure out a solution, and maybe the Simplified Chinese version can follow afterwards.

EDIT: Just saw the exact reason why they took the fonts down and corrected my post.

cathree3 commented 8 months ago

⻖陳⻏部阝

The ⻖ are not horizontally aligned.

NightFurySL2001 commented 8 months ago

聖 has different bottom than 檉蟶. Should follow 檉蟶 with curved top.

塣 top right component 呈 has different bottom than 埕悜桯浧珵程脭裎逞郢酲. Should follow the rest with curved top.

筳 has different 廷 length as the rest. Should modify the rest to use 3rd horizontal stroke longest (following orthography) or keep consistency with middle longest.

cathree3 commented 8 months ago

The top of 舌 is sometimes 千 and sometimes 干. Maybe they can be unified, like the 舌 of “LiHei Pro”.

佸刮咶姡懖括蛞活濶筈聒萿葀話趏适颳髺鴰舌栝恬憩湉甜餂舐舑舔舕舚銛

Marcus98T commented 8 months ago

免 (U+514D) and 勉 (U+52C9) are using unmodified Japanese Shinjitai forms. Please modify them for component consistency with the other 免 components. I can bet that the unreleased Simplified Chinese version may have the corrected forms for these two characters, so Sandoll can take those SC glyphs and put them into the TC font.

Left - IBM Plex Sans TC, Right - Source Han Sans SC (Simplified Chinese form)

cathree3 commented 8 months ago

If Plex Sans TC used the “dot to the left” form for 刃, the spacing would be more even, like Plex Sans JP. 仞訒澀梁粱

Marcus98T commented 8 months ago

As per @BoldMonday's request, I am reposting a minor interpolation bug of 進 (U+9032) in the TC version here, for organization purposes. It is unfortunately inherited from the JP version. I suppose during the development of the TC font, it was simply overlooked.

cathree3 commented 8 months ago

The 壬 is not unified in 呈, 聖, 望……

Marcus98T commented 8 months ago

@cathree3 This was already mentioned earlier. Please read the previous posts carefully before posting a duplicate issue, unless you also want to talk about the uneven spacing between 𱼀 and 壬 in 望 (U+671B), which I find isn't a big deal.

聖 has different bottom than 檉蟶. Should follow 檉蟶 with curved top.

Originally posted by @/NightFurySL2001 in https://github.com/IBM/plex/issues/556#issuecomment-1917073153

You can also check to see if the character you want to report is already reported in the GitHub issue search bar, or use Ctrl+F.

ItMarki commented 7 months ago

Taking this chance to point out an inconsistency.

In the table below, the glyphs are classified by whether the ㇏ stroke in the left component remains unchanged (Form 1) or is turned into a ㇔ stroke (Form 2).

It is clear that even for the same component 叕, some glyphs fall under Form 1, and some under Form 2.

I personally prefer unifying all such glyphs to match Form 2.

ItMarki commented 7 months ago

禥 and 禩 should not have the 衤 component.

NightFurySL2001 commented 6 months ago

@BoldMonday @mjabbink may I ask what's the current status of Plex Sans TC? Will it be re-released in GitHub repo again soon?

SCLu17 commented 5 months ago

Would like to throw in my own 2¢ here regarding the 舌 component as mentioned here: I think there is value in separating characters which originate from 𠯑 (⿱ 氏口) and characters which originate from 舌. The characters mentioned are actually split according to their origins: 佸刮咶姡懖括蛞活濶筈聒萿葀話趏适颳髺鴰 all follow 𠯑 (⿱ 氏口) and should use 千 at the top, whereas 舌栝恬憩湉甜餂舐舑舔舕舚銛 all derive from 舌 and should use 干 at the top.

𠯑 (⿱ 氏口) gradually borrowed the 舌 form but retained the 丿 stroke at the top, whereas 舌 was originally horizontal at the top, but gradually used a slant stroke in handwriting. Modern HK and TW handwriting guidelines distinguish between the two, not from an arbitrary standpoint, but from a real lexicographical difference going back to the Shuowen Jiezi (small seal script), even if that distinction had become obscured over time.

As the original comment said, Plex currently does indeed implement this inconsistently, but I disagree that it should be unified to LiHei’s forms, which takes more inspiration from vulgar forms and is contrary to the guiding principles Plex TC seems to be following.
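The origin-based split described in this comment can be sketched as a lookup table in Python. The two groups are copied from the comment; the names `QIAN_TOP`, `GAN_TOP` and `expected_top` are illustrative assumptions only.

```python
# Characters that follow 𠯑 and, per the comment above, should use 千 at the top.
QIAN_TOP = "佸刮咶姡懖括蛞活濶筈聒萿葀話趏适颳髺鴰"
# Characters that derive from 舌 and, per the comment above, should use 干 at the top.
GAN_TOP = "舌栝恬憩湉甜餂舐舑舔舕舚銛"

# Map each character to the top component it is expected to use.
EXPECTED_TOP = {ch: "千" for ch in QIAN_TOP}
EXPECTED_TOP.update({ch: "干" for ch in GAN_TOP})

def expected_top(ch):
    """Return '千' or '干' for a listed character, or None if it is not listed."""
    return EXPECTED_TOP.get(ch)

for ch in "活甜":
    print(ch, expected_top(ch))
```

A table like this only records the intended form per character; checking the actual outlines would still need visual review.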

SCLu17 commented 5 months ago

After a brief review of the TC font, there are a few additional issues I’ve found as of v1.000. I’ve categorized them by order of urgency.

Incorrect glyph shapes (beyond known variances in CN/JP/KR)

U+4965 (䥥), right half has to follow the vulgar form 亷, not the correct form 廉, which is encoded to U+942E (鐮).
U+50BC (傼), upper right follows 廿, not 卄.
U+FA12 (晴), the compatible form has to follow 靑 (U+9751), not 青 (U+9752).
U+FA26 (都), the compatible form has to have a dot above the 日 at the lower left.

Major inconsistencies (immediately noticeable with forms unfamiliar to TC users)

U+5BE7 (寧), the glyph currently follows JP form. The middle should be 皿, not 罒. Other glyphs which use 寧 (擰獰檸) are correct.
U+6E23 (渣), unify with other glyphs containing 查 (揸楂). Greater China uses 查, JP uses 査.
U+8CD3 (賓), currently uses the JP Shinjitai form (with an extra stroke). The middle should follow the lower half of 步, not 少. All other characters (濱檳繽) do not have this issue.

Inconsistencies (compared to other glyphs sharing the same components, based on which component variant is used the most)

U+451D, U+6F6B, U+6FDD, U+7020 and U+261DD (䔝潫濝瀠𦇝), the bottom of 糸 at the bottom/lower right should not hook left. Other glyphs (縈綦紫) are not affected.
U+46BB and U+46D0 (䚻䛐), the top stroke of 言 should be horizontal so as to be consistent with other glyphs which use the 言 component.
U+4EA3 and U+4EB7 (亣亷), the top should follow other glyphs with the 亠 radical and be vertical at the top. 亣 is a variant of 大.
U+500F and U+5135 (倏儵), top right should follow other glyphs that follow 攸, such as 條脩鯈.
U+5029 and U+68C8 (倩棈), right half should be unified with other 青 (U+9752) glyphs (情靖請). Note U+9751 (靑) is encoded separately and is therefore a special exception.
U+5510 (唐), the middle (肀) currently follows the Japanese form (but is itself a common handwriting variant). Its bottom should stick out. Other glyphs (塘糖) are correct.
U+5AF2 and U+8534 (嫲蔴), the non-radical portion should follow other glyphs that follow 麻 (嘛麼). Its inside should be 𣏟, not 林.
U+5B82 (宂), bottom half should follow 儿, not 几. Only the CN form follows 几, here a vulgar form. Lexicographically, it traces back to 宀 (house) and 儿 (person; 儿 is historically a variant of 人).
All characters containing 害 (害割嗐搳犗瞎磍縖螛豁轄鶷) should have its middle (丯) unified with 憲. The vertical stroke's bottom traditionally sticks out, but TW and JP orthographies replace it with 龶. The top part of 憲 comes from a reduction of 害; the two share the same root, and therefore that middle part 丯 should look the same between the two.
U+6145 and U+9F1C (慅鼜), the left-most dot of the 蚤 component should point in the same direction as the dot in the middle, as with other characters which follow 蚤 (搔騷蚤).
U+61B2 (憲), this particular glyph currently uses the JP form and should be made consistent with other glyphs that use 憲 (幰櫶瀗). The vertical stroke of 丯 should stick out at the bottom, although TW and JP orthography modifies this to become 龶. See also 害 above.
U+7029 and U+75DC (瀩痜), the lower half of 禿 should follow 儿 (HK/TW), not 几 (CN).

Miscellaneous changes

U+5844, U+6123, U+695E (塄愣楞), changing the top right to 四 is recommended. While state guidelines prescribe 罒 at the top right, this is actually vulgarized from 四. 楞 originates from 木 (wood), 四 (four) and 方 (corner), meaning a square pillar. 愣 and 塄 both derive from 楞.

Suggested additions

U+596C 奬, a common nonstandard variant of 獎 (U+734E), following 大 instead of 犬. Note while U+596C is nonstandard in HK and TW, it is the standard ‘old’ form in CN and JP. Due to their visual similarity, some IMEs like handwriting can input one instead of the other, so having both is recommended. This glyph can be copied directly from Sans JP.
U+654E 敎, the old form of 教 (U+6559) (note difference in top left). While no longer the preferred form in HK and TW governmental guidelines, U+654E remains common and is the preferred form in certain institutions, e.g. the Catholic Church in Hong Kong. Some HK legislation also uses U+654E. Not including it will result in rendering problems in common applications. This can be copied directly from Sans JP.
U+7232 爲, a common variant of 為 (U+70BA). Can be copied directly from Sans JP.
U+7C52 籒, ancient variant of 籀 (U+7C40), commonly used in Han lexicography and orthography studies, refers to an early seal script variant.
U+9682 隂, common variant of 陰 (U+9670) before the twentieth century.
U+96B7 隷, common variant of 隸 (U+96B8), can be copied directly from Sans JP.
U+96EB 雫, very common in Japanese names, without a Chinese equivalent. In order to preserve proper noun representations, it is strongly advised that this be included in the TC font too.

mjabbink commented 5 months ago

Thank you all for the feedback. The Sandoll design team is addressing much of what is noted above. We have a new workstream that will address the consistency details, among others, and add new glyphs to meet new standards/requirements.

Des-Magmeta commented 3 months ago

𧥺(U+2797A) the right component 勻 has different last stroke than 伨呁均昀枃汮畇盷蚐袀鈞韵㚬. Should follow the rest with the curved last stroke.

NightFurySL2001 commented 2 months ago

𠸖 U+20E16 and 䴇 U+4D07 have different 令 component than the rest. However, in this case I propose to use the JP form (or 𠸖 U+20E16). This form with a straight bottom ㄗ fits the geometric design of IBM Plex Sans more, and does not introduce inconsistent curvature in 龴 (the bend looks very weird in this design which did not match the curvature of the top 人). The ㄗ form also matches the traditional printing form used in books, magazines and posters.

mjabbink commented 2 months ago

FYI @Sandoll-DS

NightFurySL2001 commented 1 month ago

Similar to 黃 above, 善 should be reverted back to JP form as the JP form is what the majority of Chinese usages have been using before Taiwan Education forcefully changed the form. Many commercial TC fonts use JP form too. (JP top, TC bottom)

Also noting 辥, where the top left should be 屮 with protrusion at the bottom (like 辥 itself). The other 4 characters (孼櫱糱蠥) should be modified. The current form was wrongly modified by Taiwan Education and is not a suitable choice for other Chinese regions. (Source Han Sans JP for reference top, Plex TC bottom)

捩 should be unified by adding the dot in 犬.

Suggest to add the hook in 殺閷 for consistency, but may be ignored if space is too limited (seem to be plenty even in Bold though). 脎 should be unified too referencing HKSCS.

NightFurySL2001 commented 1 month ago

骏, although a Simplified character, should probably match the other TC forms by using separated 厶 strokes and 儿 instead of 八, because of its inclusion in HKSCS; the same is done for 设 and 长.

rschiang commented 1 month ago

As much as I love the traditional/conventional forms suggested here, wouldn’t you think it’s too much for a font foundry to follow? If there isn’t an established “traditional form” standard, I doubt IBM or Adobe would ever implement these requests as they cannot justify these design choices.

I’d like to suggest to narrow this thread to apparent errors and standard non-conforming mistakes.

NightFurySL2001 commented 1 month ago

@rschiang the suggestions raised are mainly because IBM Plex TC is expected to be used cross-regionally, and some of the current forms would be perceived as incorrect by users outside of Taiwan (mainly for 黃, 善 and 辥). They are used in Taiwan too, outside of the educational system.

The suggestion is also made only when it is easy to do so: most of the suggestions are to revert TC glyphs back to JP. The design choices have already been made in JP, and keeping TC partly consistent with JP is more maintainable than requiring separate glyphs for TC. Also noted that some suggestions are the representative glyphs used in Adobe-CNS1 that the font follows.