chise / ids

Mirror of https://gitlab.chise.org/CHISE/ids
https://www.chise.org/ids/

Ids fix #3

Closed: mumchristmas closed this 3 years ago

mumchristmas commented 3 years ago

Fixed the IDS data for 8 Unicode characters.

mumchristmas commented 3 years ago

(⿸⿰飠𠂉x -> ⿰𠦝⿱𠂉x): Don't change functional structures to apparent structures. [Note] CHISE can generate these apparent structures from functional structures automatically if the source IDS are described as functional structures.
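To make the functional-versus-apparent distinction concrete, here is a minimal sketch that parses an IDS into a tree and applies one rewrite of the kind discussed here, turning ⿸(⿰A B) X into ⿰A(⿱B X). The rewrite rule and all function names (`parse`, `to_ids`, `flatten_left_surround`) are hypothetical illustrations, not CHISE's actual algorithm. The parser only knows the original twelve Ideographic Description Characters (U+2FF0..U+2FFB) and treats CHISE-style entity references such as `&CDP-8DB3;` as single components.

```python
from dataclasses import dataclass, field

# Binary and ternary Ideographic Description Characters (U+2FF0..U+2FFB only;
# the newer operators added in later Unicode versions are ignored here).
BINARY_OPS = set("⿰⿱⿴⿵⿶⿷⿸⿹⿺⿻")
TERNARY_OPS = set("⿲⿳")


@dataclass
class Node:
    op: str                                    # the IDC, e.g. "⿰"
    parts: list = field(default_factory=list)  # child components (str or Node)


def _parse(ids: str, i: int):
    """Parse one component starting at index i; return (tree, next_index)."""
    if i >= len(ids):
        raise ValueError(f"unexpected end of IDS: {ids!r}")
    ch = ids[i]
    if ch == "&":  # entity reference such as &CDP-8DB3; counts as one component
        end = ids.index(";", i)
        return ids[i:end + 1], end + 1
    if ch in BINARY_OPS or ch in TERNARY_OPS:
        arity = 3 if ch in TERNARY_OPS else 2
        parts, i = [], i + 1
        for _ in range(arity):
            part, i = _parse(ids, i)
            parts.append(part)
        return Node(ch, parts), i
    return ch, i + 1  # a single-character component


def parse(ids: str):
    tree, end = _parse(ids, 0)
    if end != len(ids):
        raise ValueError(f"trailing characters in IDS: {ids!r}")
    return tree


def to_ids(tree) -> str:
    """Serialize a parsed tree back to an IDS string."""
    if isinstance(tree, str):
        return tree
    return tree.op + "".join(to_ids(p) for p in tree.parts)


def flatten_left_surround(tree):
    """Hypothetical rule: rewrite ⿸(⿰A B) X as ⿰A(⿱B X), recursively."""
    if isinstance(tree, str):
        return tree
    parts = [flatten_left_surround(p) for p in tree.parts]
    first = parts[0]
    if tree.op == "⿸" and isinstance(first, Node) and first.op == "⿰":
        left, upper = first.parts
        return Node("⿰", [left, Node("⿱", [upper, parts[1]])])
    return Node(tree.op, parts)


print(to_ids(flatten_left_surround(parse("⿸⿰飠𠂉力"))))  # ⿰飠⿱𠂉力
```

Keeping the functional form as the stored data and deriving the apparent form mechanically is the direction the note describes; the reverse direction cannot be automated this way, because the etymological grouping is lost once the tree is flattened.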

The structure "⿸⿰飠𠂉x" goes against the rules of traditional Chinese calligraphy. The left part (such as "飠") is a strongly self-contained unit and should not be treated as part of the surrounding structure.

For example, in the current version of the IDS data:
U+98ED 飭 ⿸⿰飠𠂉力
U+9929 餩 ⿰飠⿱𠆢亥

Why the difference?

In fact, in the basic CJK Unified Ideographs block of Unicode, only 2 glyphs match the "⿸⿰飠xx" model, compared with 112 glyphs that match the "⿰飠x" model. I'm sorry to say it, but the "surrounding" approach just makes a simple thing complicated again.
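A count like this can be reproduced by scanning the IDS data for the two prefixes. The sketch below assumes, hypothetically, tab-separated lines of the form `U+XXXX<TAB>char<TAB>IDS` (the layout of the two entries quoted earlier) and comment lines starting with `;`; only the third field is inspected, and `count_models` is an illustrative name, not part of any CHISE tool.

```python
def count_models(lines, radical="飠"):
    """Count IDS entries of the form ⿰<radical>x versus ⿸⿰<radical>xx.

    Assumes (hypothetically) tab-separated lines "U+XXXX<TAB>char<TAB>IDS"
    and that comment lines start with ";".
    """
    side_by_side = surrounding = 0
    for line in lines:
        if not line.strip() or line.startswith(";"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # not a data line in the assumed layout
        ids = fields[2].strip()
        if ids.startswith("⿰" + radical):
            side_by_side += 1
        elif ids.startswith("⿸⿰" + radical):
            surrounding += 1
    return side_by_side, surrounding


sample = [
    "U+98ED\t飭\t⿸⿰飠𠂉力",
    "U+9929\t餩\t⿰飠⿱𠆢亥",
]
print(count_models(sample))  # (1, 1)
```

Against a real data file, the same function can be fed an open file handle, e.g. `with open(path, encoding="utf-8") as f: count_models(f)`, where `path` is whatever local copy of the IDS data is at hand.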

chise commented 3 years ago

In the case of U+9929 餩, it may be used as a variant of 餘. In this case, ⿱𠆢亥 seems to be a variant component of 余, so ⿱𠆢亥 may be a functional component. However, 餩 may also be used as a variant of 骸, and in that case ⿸飤亥 is the functional structure. Namely, 餩 subsumes two different morphemes (different characters collide in a single glyph), and their functional structures differ. In such a case, I think it is better to choose the functional structure that is more similar to the apparent structure.
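The tie-breaking heuristic for a glyph that subsumes several morphemes can be sketched as "pick the candidate functional IDS closest to the apparent IDS". Here similarity is `difflib.SequenceMatcher.ratio()` on the raw strings, which is only a crude stand-in (an assumption, not CHISE's measure); a tree-aware comparison would be more faithful. The function name `choose_functional` is hypothetical, and the two candidate structures for 餩 are taken from chise's comment.

```python
from difflib import SequenceMatcher


def choose_functional(apparent: str, candidates: dict) -> tuple:
    """Pick the (reading, functional IDS) pair closest to the apparent IDS.

    Similarity is difflib's character-level ratio, a crude stand-in
    (assumption) for whatever measure a real tool would use.
    """
    return max(
        candidates.items(),
        key=lambda item: SequenceMatcher(None, apparent, item[1]).ratio(),
    )


# 餩 read as a variant of 餘 versus as a variant of 骸.
readings = {"餘": "⿰飠⿱𠆢亥", "骸": "⿸飤亥"}
print(choose_functional("⿰飠⿱𠆢亥", readings))  # ('餘', '⿰飠⿱𠆢亥')
```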

In the case of U+98FE 飾, the situation is ambiguous. 説文 (Shuowen Jiezi) says 「㕞也。从巾从人,食聲」 (roughly: "to wipe; composed of 巾 and 人, with 食 as the phonetic"). http://humanum.arts.cuhk.edu.hk/Lexis/lexi-mf/search.php?word=%E9%A3%BE https://www.chise.org/est/view/image-resource/rep.id=zinbun...toho...A024...A0240274...$.zoom-xywh=3024,1439,225,1250

⿰飠𠂉 is a variant component of 飤, which is a variant character of 食. https://www.chise.org/est/view/image-resource/rep.id=zinbun...toho...A024...A0240184...$.zoom-xywh=1527,1956,225,750 http://humanum.arts.cuhk.edu.hk/Lexis/lexi-mf/search.php?word=%E9%A3%A4

The Small Seal form of 飾 appears to be ⿵飤巾, so I regarded ⿰飠𠂉 as the phonetic component of 飾. However, in this case, ⿰飠&CDP-8DB3; is also acceptable. (The situation for U+98ED 飭 is similar.)

[Note] Based on the productivity of components (calculated using the CHISE dataset), the accuracy of ⿸𭤨巾 is 87 and that of ⿰方⿱𠂉巾 is 0.5. In this sense, ⿸𭤨巾 is suitable as the functional structure of 飾, and ⿰飠𠂉 should be regarded as the functional component. http://www.fluxus-editions.fr/gla5-mori.pdf#page=7

In the case of U+4E7E 乾, its functional structure is 从乙倝聲 (composed of 乙, with 倝 as the phonetic).

mumchristmas commented 3 years ago

Wow. I have to say that, even as a Chinese speaker, I still don't know enough about the history of traditional Chinese characters. As a font engineer, I think more about data consistency and structure, so it is easy for me to oversimplify or even ignore etymology. Your view comes from a new angle and is worth thinking about. I will reconsider my subsequent amendments. You and your university have done an amazing job. Happy New Year of the Ox!