JMdictProject / JMdictIssues

JMdict Japanese dictionary - lexicographic, etc. issues management

Headword ordering and the 常用漢字 #36

Open JMdictProject opened 2 years ago

JMdictProject commented 2 years ago

Our approach to ordering the surface forms in the "Kanji" field (and also in the Readings) has been to base it on frequency of use, drawing on metrics such as n-gram counts. This seems to have worked pretty well.

A correspondent has pointed out to me that entry 1596430 (ひまご/そうそん) could have an ordering issue. The Kanji field currently has: ひ孫; 曾孫; 曽孫 [ichi2]; ひい孫, which reflects the n-gram counts: ひ孫 65158, 曾孫 51751, 曽孫 2129.

The major JEs (GG5, etc.) all have 曾孫 as the sole surface form, as do my copies of 広辞苑 and 大辞林.

In 2010 曽 was designated as a 常用漢字 (previously it had been a 人名用漢字). The two kanji 曾 and 曽 have always been regarded as variants of each other, and if one were to be added to the 常用 list it sort of makes sense to choose 曽, as it's simpler and looks a bit newer. This move does raise the question of what goes in dictionaries. The newest online editions of the デジタル大辞泉 seem to have changed to using 曽 alone for terms like 曽孫 and 未曽有.

For ordering terms in JMdict we should probably decide whether to give preference to forms containing 常用漢字 even if they are not the most common form. It would be a reasonable policy. In compounds containing 曾 and 曽 the n-gram frequencies go both ways; for 曾祖父/曽祖父 and 未曾有/未曽有, for example, the 曾 versions are more common.
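
To make the two options concrete, here is a minimal Python sketch that compares a frequency-only ordering with one that puts forms written entirely in 常用漢字 first. It is only an illustration, not how JMdict is actually maintained. The function names are hypothetical, `JOYO_SUBSET` is a two-character stand-in for the full 常用漢字 list, and ひい孫 is left out because no n-gram count is quoted for it above.

```python
# n-gram counts quoted above for entry 1596430.
NGRAM_COUNTS = {"ひ孫": 65158, "曾孫": 51751, "曽孫": 2129}

# Assumption: a tiny illustrative subset, NOT the full 常用漢字 list.
JOYO_SUBSET = {"孫", "曽"}


def is_kanji(ch):
    """True for characters in the main CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def all_joyo(form):
    """True if every kanji in the form is in the subset; kana are ignored."""
    return all(ch in JOYO_SUBSET for ch in form if is_kanji(ch))


def order_by_frequency(counts):
    """Current approach: most frequent form first."""
    return sorted(counts, key=lambda form: -counts[form])


def order_joyo_first(counts):
    """Possible policy: all-常用 forms first, then by frequency within each group."""
    return sorted(counts, key=lambda form: (not all_joyo(form), -counts[form]))


print(order_by_frequency(NGRAM_COUNTS))  # ['ひ孫', '曾孫', '曽孫']
print(order_joyo_first(NGRAM_COUNTS))    # ['ひ孫', '曽孫', '曾孫']
```

Under this sketch ひ孫 stays first either way, since 孫 is itself a 常用漢字; the only change is that 曽孫 moves ahead of 曾孫.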

Any views on this?

Marcusjmdict commented 2 years ago

I suppose it might be OK to list 曽孫 before 曾孫, if that's the official 常用漢字 now, but I think ひ孫 should still lead. That's still what my Windows 10 IME suggests first when I type "ひまご" (曾孫 comes second, ahead of 曽孫 in third place).