fabd / kanji-koohii

A web application to help Japanese language learners remember the kanji.
https://kanji.koohii.com
GNU Affero General Public License v3.0

Dictionary : fix sorting of results to account for "alternate" readings #126

Open fabd opened 7 years ago

fabd commented 7 years ago

Bug

Dict lookup for 明日 should display the あした reading (currently it shows みょうにち).

Background

The SQL query sorts dictionary lookup results by the "priority" field. Most of the time that works fine, but in a few cases it causes issues.

For 明日, the third, less common reading みょうにち comes first because its priority uses 3 bits (ichi1, news1, nf05), which makes it numerically greater than the priority of the first reading あした, which uses only "ichi1".

<ent_seq>1584660</ent_seq>
<k_ele>
<keb>明日</keb>
<ke_pri>ichi1</ke_pri>
<ke_pri>news1</ke_pri>
<ke_pri>nf05</ke_pri>
</k_ele>
<r_ele>
<reb>あした</reb>
<re_pri>ichi1</re_pri>
</r_ele>
<r_ele>
<reb>あす</reb>
<re_pri>ichi1</re_pri>
</r_ele>
<r_ele>
<reb>みょうにち</reb>
<re_pri>ichi1</re_pri>
<re_pri>news1</re_pri>
<re_pri>nf05</re_pri>
</r_ele>
<sense>
<pos>&n-t;</pos>
<gloss>tomorrow</gloss>
</sense>
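
The ordering problem can be reproduced with a minimal sketch. The bit values in `PRI_BITS` and the `priority` helper are hypothetical, since the issue does not show how the application actually encodes the field. The sketch only assumes that each priority tag sets one bit and that results are sorted by the resulting number, highest first.

```python
# Hypothetical bit flags: the real values used by the application
# are not shown in the issue, only that each tag occupies one bit.
PRI_BITS = {"ichi1": 0x01, "news1": 0x02, "nf05": 0x04}


def priority(tags):
    """Combine priority tags into one integer by OR-ing their (assumed) bits."""
    value = 0
    for tag in tags:
        value |= PRI_BITS[tag]
    return value


# Readings of 明日 in entry order, with the re_pri tags from the entry above.
readings = [
    ("あした", ["ichi1"]),
    ("あす", ["ichi1"]),
    ("みょうにち", ["ichi1", "news1", "nf05"]),
]

# Sorting by the numeric priority (descending) moves the reading with
# the most tags to the front, regardless of its position in the entry.
by_priority = sorted(readings, key=lambda r: priority(r[1]), reverse=True)

for reb, tags in by_priority:
    print(reb, priority(tags))
```

Under these assumptions, any reading whose tags are a superset of another reading's tags always gets the larger number, so みょうにち sorts ahead of あした even though it is listed last in the entry.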

Solutions

fabd commented 7 years ago

Another potential example for 田 :

水田 (nf07) shows up before 田園 (nf11).

I'm not sure about this one. jisho.org doesn't show 水田 when searching for 田, but JMdict's description of the nfxx field indicates that nf07 is a higher frequency of use than nf11:

nfxx: this is an indicator of frequency-of-use ranking in the wordfreq file. "xx" is the number of the set of 500 words in which the entry can be found, with "01" assigned to the first 500, "02" to the second, and so on.
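
To make the nfxx description concrete, here is a small sketch that converts a tag into the rank range it represents. The function name `nf_rank_range` is made up for illustration; the only rule it applies is the one quoted above (sets of 500 words, numbered from "01").

```python
import re


def nf_rank_range(tag, bucket_size=500):
    """Return the (first, last) frequency rank covered by an nfxx tag.

    Hypothetical helper: "nf01" covers ranks 1-500, "nf02" covers
    501-1000, and so on, following the JMdict description of nfxx.
    """
    match = re.fullmatch(r"nf(\d{2})", tag)
    if match is None:
        raise ValueError(f"not an nfxx tag: {tag!r}")
    bucket = int(match.group(1))
    if bucket < 1:
        raise ValueError(f"bucket numbers start at 01: {tag!r}")
    first = (bucket - 1) * bucket_size + 1
    last = bucket * bucket_size
    return first, last


print("水田", nf_rank_range("nf07"))
print("田園", nf_rank_range("nf11"))
```

A lower bucket number means a lower (more frequent) rank, so by this measure alone 水田 (nf07) is more frequent than 田園 (nf11).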