Closed GoogleCodeExporter closed 9 years ago
Thanks for reporting the inconsistency. This actually works as intended due to
technical limitations.
The OpenType format allows at most 65,535 glyphs in a font file. With that, we
can only accommodate Taiwan Ministry of Education standards for characters in
the Big5 character set in this CJK font. For characters outside of the Big5
character set, a glyph common to Traditional Chinese, Simplified Chinese,
Japanese and Korean (if applicable) is used to save space. Such characters may
or may not conform to Taiwan MoE's standard. It is the latter in the case of
"嘅" (U+5605).
Original comment by ping...@google.com
on 16 Jul 2014 at 6:53
Thank you for your quick response. I'm not sure if I understand you correctly.
1. I am now referring to the typeface dedicated to Traditional Chinese, where
Taiwan Ministry of Education standard is followed (as you have said). I am not
asking you to provide an alternate writing style for the same codepoint.
2. In the case of "嘅", its use as a variant of "概" is now obsolete.
Nowadays, 嘅 should only be used in Cantonese-speaking regions including Hong
Kong. Its writing standard is similar to that of Taiwan. So, I think it makes
sense to render 嘅 consistently in the TC version of the font.
3. I understand that 嘅 is not in the Big5 charset. That said, I am unaware of
any region that uses 旣 as the standard form, which would cause the phonetic
component of 溉概慨嘅 to be rendered as 旣 (白+匕) instead of 既 for the
sake of consistency. I am no expert on this topic, so I did a little
searching and used the Noto fonts for other regions to support my assumption:
a) AFAIK both Traditional and Simplified Chinese use "既" as the standard form.
b) For Japanese, I searched on the net and found that "既" should also be
their standard form. In the Noto Japanese font, 忄既 (but not 忄旣) is
used as the written form of "慨" (がい). The written forms "既"/"旣" and
"概"/"槪" are not normalized, so they have different codepoints, but there is
only one codepoint for "慨". If "忄旣" were the standard form, it should be
used in the Japanese version of the font for this codepoint.
c) Korean is where I have least confidence. But from the Noto Korean font,
"慨" is present but it is also not in the form of "忄旣". Instead it is
rendered as "忄既". And there is no "口既" in the font.
So I still don't understand why "口旣" is chosen for the Traditional Chinese
version of 嘅 even for the sake of a common glyph across different regions due
to a technical limitation. On the contrary, this decision makes this word
"stand out".
4. Perhaps the most interesting part is that, as you can see in the screenshot,
"嘅" in the Simplified Chinese version of Noto is exactly what I am asking for. So
it seems that the glyph is already there, but the incorrect(?) one is chosen
for the Traditional Chinese version. In this case my issue can be rephrased as:
I believe that the glyph of "嘅" in the Traditional Chinese version of the font
should follow that of the Simplified Chinese version for consistency.
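The point in item 3 about separate codepoints can be checked directly. Below is a minimal Python sketch using only the standard library `unicodedata` module. The helper name `describe` is made up for illustration, and the character list is simply the set discussed in this comment.

```python
import unicodedata

def describe(ch):
    """Return 'U+XXXX' for a single character (helper name is illustrative)."""
    return f"U+{ord(ch):04X}"

# Characters discussed above: the two forms of the phonetic component,
# two characters built on them, and 慨 and 嘅.
for ch in "既旣概槪慨嘅":
    # Check whether canonical normalization (NFC) leaves the character as-is;
    # if it does, ordinary text processing will not fold the forms together.
    unchanged = unicodedata.normalize("NFC", ch) == ch
    print(ch, describe(ch), "NFC-stable" if unchanged else "changes under NFC")
```

Since 既 and 旣 are encoded separately, the glyph drawn for a character such as 嘅, which has a single codepoint, is entirely the font's choice.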
Original comment by e.ta...@gmail.com
on 16 Jul 2014 at 9:11
Attachments:
Original comment by roozbeh@google.com
on 16 Jul 2014 at 6:06
I went back to the standards and I found that the current glyph actually agrees
with the standards.
U+5605 is listed in CNS-11643 at code point 3-4636, and here is its page:
http://www.cns11643.gov.tw/MAIDB/query_general_view.do?page=3&code=4636.
It is written as 口+旣 in the Mingti/Kaiti/Songti samples, and its components
include 白 and ⼔.
I'm attaching a screenshot of the Unicode chart for U+5605. Taiwan, Japan and
Korea write it in the same way, namely 口+旣, while China and Vietnam write
it as 口+既.
My conclusion is that the current U+5605 glyph shown in Traditional Chinese
conforms to standards.
Original comment by ping...@google.com
on 16 Jul 2014 at 7:28
Attachments:
Thanks for your clarification. This news is quite astonishing to me because
this decision sounds irrational:
1) 既 and 旣 are just variants of each other, and are not different words with
different meanings.
2) Considering their origin (but not the standard), 溉概槩慨廐暨厩穊
and 嘅 all use 既 as their component, so their writing style should be
the same. Mixing different writing styles for the same component in one font
gives the impression that things are not organized properly.
3) MoE adopts 既 as the standard form (既 is also the standard form in Hong
Kong).
4) Yet the standardization body in Taiwan chose 口旣 but not 口既 for the
word 嘅.
The most ironic thing is that they do not actually use this word (it is a
"rarely used character" as indicated on their website), yet they made such a
strange decision. We people in Hong Kong use this word every minute of every
day and we have no say in it (嘅 doesn't exist in Big5 but does in Big5-HKSCS,
as it is a frequently used character in HK).
I am not blaming you guys for this because it isn't your fault, but I still
want to see if I can do something to clean this up:
1) I can't comment on how it is written in Japan/Korea (although I doubt
they use this word differently[1]). But for the TC version of the font, I heard
that the writing style complies with the standard set by the Ministry of
Education. So I would like to ask: is "嘅" required to be written as "口旣" by
the MoE? I highly doubt it because it is so rarely used in Taiwan. If it isn't,
is it possible for you to give special treatment to this word so that its form
is consistent with other glyphs that have the same component? This would benefit
Cantonese-speaking communities like Hong Kong and would have nearly zero impact
on other Traditional Chinese communities (because, well, they seldom use it).
2) If you cannot give special treatment to this word because your policy is to
follow the standard whether it's right or wrong, I would be grateful if
you could give me any hint on how I can report this issue to the consortium,
organization or any body that is responsible for this matter.
Thanks again.
[1] "嘅" doesn't exist in a Japanese dictionary. AFAIK it is not a Kanji used
in Japan:
http://dictionary.goo.ne.jp/search.php?IE=UTF-8&MT=%E5%98%85&kind=all&mode=0&SH=1&from=gootop
Korean treats 嘅 as a variant of 慨, according to the Korean Hanja dictionary:
http://hanja.naver.com/search?query=%E5%98%85
Original comment by e.ta...@gmail.com
on 17 Jul 2014 at 2:52
I think the point is that, to keep TC consistent, we decided to apply the TW
standard to characters that are in Big5-HKSCS but outside the Big5 range.
Usually that is not a big problem for HK people. However, 嘅 differs
substantially at the component level, not just the stroke level, and it is a
high-frequency character in HK (according to this bug report).
Given that 嘅 is outside Big5 and rarely used in Taiwan, we may need to
reconsider and use the glyph 口既 as an exception, for practical reasons.
Original comment by k...@google.com
on 17 Jul 2014 at 11:19
The MOE, in general, uses the KangXi form for words that exist in Plane 3, and
refuses to correct inconsistencies in glyph components to align them with Plane
1/2.
I think the standards body here cannot be relied on, especially given its
indifferent attitude toward words it rarely uses (but that are used in
other Traditional Chinese-using communities). Arguably, the MOE has no right to
decide how to write characters that it discourages using, and for the sake of
visual consistency, 口既 should be used.
I look forward to Google overturning the decision for this particular character.
Original comment by henry.fa...@gmail.com
on 17 Jul 2014 at 12:31
To gauge the use of 嘅 in Hong Kong, Taiwan, Japan and Korea, I did the
following Google searches and noted the number of results. This is admittedly
imprecise and neither scientific nor representative.
- Hong Kong: https://www.google.com.hk/#q=%22%E5%98%85%22+site:.hk 2,110,000
- Taiwan: https://www.google.com.tw/#q=%22%E5%98%85%22+site:.tw 69,100
- Japan: https://www.google.co.jp/#q=%22%E5%98%85%22+site:.jp 32,300
- Korea: https://www.google.co.kr/#newwindow=1&q=%22%E5%98%85%22+site:.kr 62,600
Besides the sheer difference in order of magnitude in the counts, the top
entries of the search results in Taiwan, Japan and Korea are mostly Cantonese
texts. That's an indication that "嘅" is indeed mostly used as a Cantonese
character and not much in other regions.
With this data and the Unicode glyph for Hong Kong (H-9DEF), I now tend to
agree that it makes sense to use the Hong Kong glyph for 嘅.
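For reference, the `%E5%98%85` in the search URLs above is the UTF-8 percent-encoding of 嘅, and the gap between the counts can be put in numbers. The following is a rough Python sketch that uses only the hit counts quoted in this comment, which are search-engine estimates and not measurements.

```python
from urllib.parse import unquote

# The quoted query string used in the searches above, percent-decoded.
query = unquote("%22%E5%98%85%22")
print(query, f"U+{ord(query[1]):04X}")

# Hit counts as reported in this comment.
hits = {"Hong Kong": 2_110_000, "Taiwan": 69_100, "Japan": 32_300, "Korea": 62_600}

# Ratio of the Hong Kong count to each other region's count.
ratios = {
    region: hits["Hong Kong"] / count
    for region, count in hits.items()
    if region != "Hong Kong"
}
for region, ratio in ratios.items():
    print(f"Hong Kong / {region}: about {ratio:.0f}x")
```

The ratios only restate the counts above; they say nothing about who wrote the pages, which is why the inspection of the top results still matters.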
Are there other characters in the same class, namely
1. Traditional Chinese characters frequently used in Hong Kong but nowhere else, and
2. the glyph does not conform to Hong Kong standard, and
3. it is outside of Big5 character set?
I think it's worth listing them for consideration all together.
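Criterion 3 can be approximated mechanically. Below is a minimal Python sketch, assuming that Python's built-in `big5` and `big5hkscs` codecs are acceptable stand-ins for the Big5 and Big5-HKSCS repertoires (they may differ in detail from the tables used for the font). The function names are made up for illustration. Criteria 1 and 2 would still need usage data and a manual glyph comparison.

```python
def encodable(ch, codec):
    """True if the character can be encoded with the given codec."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

def hkscs_only(start=0x4E00, end=0x9FFF):
    """Characters in the basic CJK Unified Ideographs block that the
    big5hkscs codec can encode but the big5 codec cannot."""
    return [
        chr(cp)
        for cp in range(start, end + 1)
        if encodable(chr(cp), "big5hkscs") and not encodable(chr(cp), "big5")
    ]

candidates = hkscs_only()
print(len(candidates), "candidate characters")
print("嘅" in candidates)
```

The resulting list is only a starting point: it contains every HKSCS addition in that block, not just the ones that are frequent in Hong Kong or whose glyphs differ.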
Original comment by ping...@google.com
on 17 Jul 2014 at 4:48
For now I can only report when I find something that doesn't seem right. It
would be better if there were a checklist, but I don't have one.
That said, I just spotted another word with the same issue, which is U+7740
(着).
1. Historically, 着 is a variant of 著 (U+8457, which means [a] famous or [b]
to wear).
2. In Taiwan, people don't use 着 as 著 is always the preferred form. 着
isn't in the Big5 table.
3. In Hong Kong, we write 著 for "famous", and write 着 for "to wear". It is
included in HKSCS.
The problem with 着 is that its upper component isn't consistent with other
words like 差.
It is using a component from Simplified Chinese, which has one stroke fewer than
the Traditional Chinese component.
"目" is composed of 5 strokes, so the T.C. version of 着 adds up to 7+5 = 12
strokes:
http://www.edbchinese.hk/lexlist_en/result.jsp?id=2757&sortBy=stroke&jpC=lshk
So it would be more appropriate for this word to use the same glyph as the JP/KR
one. Thanks.
Original comment by e.ta...@gmail.com
on 18 Jul 2014 at 3:43
Attachments:
Re-attach the comparison image
Original comment by e.ta...@gmail.com
on 18 Jul 2014 at 3:44
Attachments:
Attached is the Unicode chart for U+7740. Indeed the Hong Kong glyph is the
same as Japan or Korea, but different from China or Taiwan.
I did the same Google search count exercise for U+7740.
Hong Kong: https://www.google.com.hk/#q=%22%E7%9D%80%22+site:.hk 3,290,000
Taiwan: https://www.google.com.tw/#q=%22%E7%9D%80%22+site:.tw 4,030,000
Japan: https://www.google.co.jp/#q=%22%E7%9D%80%22+site:.jp 1,090,000,000
Korea: https://www.google.co.kr/#q=%22%E7%9D%80%22+site:.kr 68,300,000
This data seems to hint that the use of U+7740 isn't dominated by either Hong
Kong or Taiwan. With that, it's hard to justify using one glyph or another in
this font.
Original comment by ping...@google.com
on 22 Jul 2014 at 8:28
Attachments:
[deleted comment]
[deleted comment]
(Sorry, I kept using the wrong account, ignore previous two comments sent via
email)
According to the principles of standard form from Taiwan MOE
("國字標準字體研訂原則"), 「上『羊』之中筆分成兩筆」
"The 羊 at the top should be broken into two strokes", the word 着 should
have used the H-source / J-source / K-source form
(http://www.edu.tw/files/site_content/M0001/biau/f61.htm?open).
着 is deemed a variant of 著 by Taiwan MOE
(http://140.111.1.40/yitia/fra/fra03506.htm #a03506-003) (in fact, the H-source
/ J-source / K-source glyph is used, while the current T-source glyph is
nowhere to be found), thus it is assigned to CNS11643 Third Plane 罕用字
(Rarely-used Characters)
(http://www.cns11643.gov.tw/AIDB/query_general_view.do?page=3&code=3757).
However, CNS11643 refuses to correct representative glyphs for these
"rarely-used characters" in Plane 3 that deviate from the Taiwan MOE rules on
the basis that they are rarely used anyway. Thus, the CNS11643 representative
glyph itself is not always consistent, the same situation as 嘅.
According to the MOE standard, the word 著 must be used instead of 着. Yet in
Hong Kong 著 and 着 are (at least in real life) nearly always used in
different contexts. From a practical point of view I also see no reason to
adhere to the T-source glyph when this character is actively discouraged, while
the word is used a lot in Hong Kong.
Please note that in Big5, the character 着 was not typeable until HKSCS was
introduced. However, on most Hong Kong-based websites, the uptake
of HKSCS has not been very high. Only recently have sites shifted to
UTF-8. Most of the older content will contain 著 instead of 着 due to the
mapping rules of Big5. Meanwhile, many new infotainment sites in Taiwan directly
copy content from mainland Chinese sites and fail to convert 着 to the
MOE-mandated 著. Not to mention that the population of Taiwan is several times
that of Hong Kong. Thus the Google search statistics cannot really show
that Hong Kong's and Taiwan's usage is similar.
Original comment by henry.fa...@gmail.com
on 23 Jul 2014 at 3:12
Re #11:
Unlike 嘅, 着 is not exclusively used in Hong Kong because it is not only
used in Cantonese.
But the underlying problem is the same: different glyphs are used for the same
component in different words in the Traditional Chinese version of the font. The
glyph of the upper component of 着 is used consistently across words like 差,
羌, 羞 in the Japanese and Korean versions. There is no controversy in the J/K
fonts. Simplified Chinese tries to save one stroke so it modified how the
component is written, but it is still consistent in the aforementioned words.
So, there is also no problem in the SC font.
However, for the Traditional Chinese version, the SC version of the glyph is
used for 着. So it appears that the upper component of 差羌羞 is different
from that of 着. But that isn't true. Proof:
a) Table of Basic Components for Song Style (Print Style) Chinese Font in Hong
Kong [1]
b) Education Bureau of Hong Kong [2]
So, the glyph difference makes it fail to conform to the HK standard.
And the underlying cause of this problem appears to be the same (as that of
嘅): while 著 and 着 are treated as different words in HK, TW chose 著 as the
standard form. Thus 着 isn't in the Big5 character set, and MoE didn't care
about how it should be rendered in Unicode.
Therefore, I believe that the glyph of 着 in TC should be modified to follow
the glyph in J/K.
[1]
http://www.ogcio.gov.hk/tc/business/tech_promotion/ccli/terms/doc/c_gsect4.pdf
[2]
http://www.edbchinese.hk/lexlist_en/result.jsp?id=2757&sortBy=stroke&jpC=lshk
Original comment by e.ta...@gmail.com
on 23 Jul 2014 at 3:48
Attachments:
I would like to add that the Google search result for "着" on .tw websites may
not reflect how frequently it is used in Taiwan. I checked the search results on
the first page, and found that only 2 of them (out of 10) are really websites
from Taiwan. Of these two, one result comes from a book store showing the title
of a Simplified Chinese book. The only genuine Taiwan website using the word
"着" is Apple Daily Taiwan [1]. And I think it is a very special case because
the word 着 is used in the song name "你敢有聽着咱的歌", which is not a
Mandarin song but a song in "台語" [2] (台語 has its own written forms,
different from the standard).
And even if people in Taiwan are using it, it does not change the fact that the
component of 差羌羞着 should be the same. I doubt that people in Taiwan would
agree to write 着 in the Simplified Chinese way.
[1] http://www.appledaily.com.tw/realtimenews/article/new/20130803/236328/
[2] http://en.wikipedia.org/wiki/Taiwanese_Hokkien
Original comment by e.ta...@gmail.com
on 23 Jul 2014 at 4:03
Regarding 嘅, the glyph for Traditional Chinese should be changed to Noto's
Simplified Chinese glyph (i.e., 口 + 既).
Regarding 着, it sounds like a bug should be reported against Taiwan MoE
instead. pinyeh, kcwu, please confirm.
Original comment by xian...@google.com
on 7 Aug 2014 at 9:22
With regard to 着 (U+7740), its Traditional Chinese (Taiwan) source is CNS
11643 Plane 3 0x3757, and the representative glyph in the 1992 and 2007
versions agree, and are what we are currently using for Traditional Chinese.
When I consult the Taiwan MOE glyph standards, this character's index is
408938, and is tucked away among the variants (異體字), and agrees with CNS
11643. To me, its form seems intentional. But because it is outside the scope
of Big Five yet in Hong Kong SCS (0xFED3), and the form being requested is
the same as the Japanese form, a remapping can easily take care of this.
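As an illustration of what such a remapping could look like, here is a hypothetical Python sketch. The glyph names and the table layout are invented for the example and do not reflect the actual Noto build files. Only the codepoint U+7740 and the idea of pointing the Traditional Chinese mapping at the Japanese glyph come from this thread.

```python
# Hypothetical per-language character-to-glyph tables (glyph names are made up).
cmap = {
    "JP": {0x7740: "uni7740-JP"},
    "TC": {0x7740: "uni7740-TW"},
}

def remap(cmap, codepoint, target, source):
    """Return a copy of cmap in which `target` reuses `source`'s glyph
    for the given codepoint. The input tables are left untouched."""
    updated = {lang: dict(table) for lang, table in cmap.items()}
    updated[target][codepoint] = cmap[source][codepoint]
    return updated

remapped = remap(cmap, 0x7740, target="TC", source="JP")
print(remapped["TC"][0x7740])
```

Because only the mapping changes, no new glyph has to be drawn and the glyph count of the font stays the same.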
Original comment by ken.lu...@gmail.com
on 11 Aug 2014 at 10:36
Original comment by xian...@google.com
on 5 Sep 2014 at 9:14
Original issue reported on code.google.com by
e.ta...@gmail.com
on 16 Jul 2014 at 5:52
Attachments: