Use Unihan data to guess which meaning you meant

batterseapower / pinyin-toolkit

A plugin for the Anki Spaced Repetition System (http://ichi2.net/anki/)

http://batterseapower.github.com/pinyin-toolkit/


Use Unihan data to guess which meaning you meant #130

Open batterseapower opened 15 years ago

batterseapower commented 15 years ago

CEDICT can contain several entries for the same characters, and there is no indication as to which meanings / readings are more frequent. The Unihan database organises readings by frequency, so we can use that to help us guess which CEDICT meaning to pick.
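A minimal sketch of the idea, not the plugin's actual code: given several CEDICT entries for one character, prefer the entry whose reading comes earliest in a frequency-ordered reading list. The names `UNIHAN_READINGS`, `CEDICT_ENTRIES` and `pick_entry` are hypothetical, and the sample data is hand-written for illustration; a real version would load both tables from the Unihan and CEDICT files.

```python
# Hypothetical sample data, hand-written for illustration only.
# Readings are assumed to be listed most frequent first.
UNIHAN_READINGS = {
    "行": ["xing2", "hang2"],
}

# (characters, reading, meaning) tuples in the order they appear in the dictionary.
CEDICT_ENTRIES = [
    ("行", "hang2", "row; line; profession"),
    ("行", "xing2", "to walk; to go; OK"),
    ("好", "hao3", "good; well"),
]


def pick_entry(character, entries, readings_by_frequency):
    """Return the entry for `character` whose reading ranks highest.

    Entries whose reading is missing from the frequency list rank last,
    and ties keep dictionary order. Returns None if there is no entry.
    """
    candidates = [entry for entry in entries if entry[0] == character]
    if not candidates:
        return None

    ranked = readings_by_frequency.get(character, [])

    def rank(entry):
        reading = entry[1]
        return ranked.index(reading) if reading in ranked else len(ranked)

    # min() returns the first minimal element, so ties keep dictionary order.
    return min(candidates, key=rank)


best = pick_entry("行", CEDICT_ENTRIES, UNIHAN_READINGS)
```

When a character has no frequency data at all, this falls back to the first CEDICT entry, which matches what you would get without any frequency information.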

batterseapower commented 15 years ago

CJKlib does not appear to give us frequency info right now :(

cburgmer commented 15 years ago

Hi, that's true. Any ideas how to implement that in an "objective" way? Frequency depends on the source, and I don't want to push any particular frequency data. Maybe a schema for different frequency sources could be implemented, similar to CharacterDomains, that users could easily extend with their own data.
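One way such a schema might look, as a rough sketch only: a small registry where each named frequency source supplies its own ranking, so no single data set is privileged. None of these names (`register_frequency_source`, `order_readings`, `my_corpus`) exist in CJKlib; they and the sample ranks are assumptions made up for illustration.

```python
# Hypothetical registry of pluggable frequency sources; not part of CJKlib.
_frequency_sources = {}


def register_frequency_source(name, ranks):
    """Register a source as a mapping {(character, reading): rank}.

    A lower rank means a more frequent reading according to that source.
    """
    _frequency_sources[name] = dict(ranks)


def order_readings(character, readings, source):
    """Sort readings by the chosen source, most frequent first.

    Readings the source does not know sort last; sorted() is stable,
    so they keep their original relative order.
    """
    ranks = _frequency_sources[source]
    return sorted(
        readings,
        key=lambda reading: ranks.get((character, reading), float("inf")),
    )


# Made-up sample ranks standing in for a user's own corpus counts.
register_frequency_source(
    "my_corpus",
    {("长", "chang2"): 1, ("长", "zhang3"): 2},
)

ordered = order_readings("长", ["zhang3", "chang2"], "my_corpus")
```

Because callers name the source explicitly, adding a second data set is just another `register_frequency_source` call, and the choice of which frequency data to trust stays with the user.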