Optimization of search results and result view

io8titan commented 7 years ago

Yomichan's search is too loose and shows a lot of irrelevant search results, which sometimes match part of the pronunciation, but sometimes not even that. For example, with only JMDICT enabled, a search for ビーチバレー yields: ビーチバレー、ビーチ、ビー、B、b、美、微、尾、びる. The first three results still make sense, but everything after them only (partially) matches the pronunciation and should not be displayed. Could you either exclude those irrelevant results or at least provide an option for a "strict" search to reduce the number of search results?
Yomichan's result window is too small and the font it uses is too big, so a lot of scrolling is necessary to see even a few results.
[Screenshot: Yomichan (result is on 5 "pages")]
[Screenshot: Chrome rikaikun (same result can be seen at a glance, no scrolling necessary)]

FooSoft commented 7 years ago

Thanks for the report -- some thoughts:

How do you determine what is an irrelevant result? Yomichan does not do any lexical analysis of text, and as you know, there are no word separators in Japanese. Yomichan grabs a length of text starting from the cursor, and works backwards trying to find matches for all the substrings. Most of the time you are after the longest match, but this is not always the case -- sometimes you really just want to look up the definition for the character that first appears after the cursor. Since many characters have the same reading, you get a lot of results. The shortest result is way at the bottom and the closest match is at the top, so I'm not sure I see what the issue is, considering that you don't have to scroll through all of the results if the one you want is already at the top.
There is a setting in options that allows you to change the size of the Yomichan popup window, and if you are using Chrome you can just resize the popup directly (Firefox does not support this behavior properly). There is currently work in progress to reduce the amount of space used to display definitions; you can check out #84 to see the progress on that.
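
For readers following along, the scanning described above can be sketched roughly as follows. This is only an illustration and not Yomichan's actual code: the `Entry` type, the `to_hiragana` and `scan` helpers, and the toy dictionary are all made up for the example, and "works backwards" is assumed to mean shrinking the text from its longest prefix down to a single character.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical dictionary entry: headword plus its kana reading."""
    expression: str
    reading: str


def to_hiragana(text: str) -> str:
    """Map katakana to hiragana so that readings compare equal."""
    return "".join(
        chr(ord(ch) - 0x60) if "ァ" <= ch <= "ヶ" else ch
        for ch in text
    )


def scan(text: str, entries: list[Entry]) -> list[tuple[str, Entry]]:
    """Try every prefix of `text`, longest first, against the dictionary.

    A prefix matches an entry if it equals the headword or, after kana
    normalization, the reading. No lexical analysis is involved.
    """
    results = []
    for length in range(len(text), 0, -1):
        prefix = text[:length]
        key = to_hiragana(prefix)
        for entry in entries:
            if entry.expression == prefix or to_hiragana(entry.reading) == key:
                results.append((prefix, entry))
    return results


# Toy data for illustration only, not real JMdict contents.
TOY_ENTRIES = [
    Entry("ビーチバレー", "ビーチバレー"),
    Entry("ビーチ", "ビーチ"),
    Entry("ビー", "ビー"),
    Entry("美", "び"),
    Entry("微", "び"),
    Entry("不当", "ふとう"),
]

for prefix, entry in scan("ビーチバレー", TOY_ENTRIES):
    print(prefix, "->", entry.expression)
```

With this toy data the longest match comes first and the single-character prefix ビ pulls in every entry read び, which is the long tail of results the report is about.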

io8titan commented 7 years ago

I would define relevance as follows (for the examples I assume a selected string of "ふ頭"):

Exact match of selected string. Example: Only show results for ふ頭 and nothing else.
Only match selected string but allow substitutions of Katakana/Hiragana with Kanji. Example: Also show results for 埠頭 but not for 不当.
Exact match of substrings. Example: Also show results for ふ.
Match substrings and allow substitutions of Katakana/Hiragana with Kanji. Example: Also show results for ふ、府、不 etc.
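
The four levels above could be expressed as a tier function plus a "strict" filter, roughly like this. It is a sketch under assumptions, not existing Yomichan behavior: `Entry`, `relevance_tier` and `filter_by_tier` are hypothetical names, substrings are assumed to be prefixes of the selection, and the kana-to-kanji check is a simple heuristic (kana in the selection must line up with the candidate's reading, kanji in the selection must appear in the candidate's headword).

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    """Hypothetical dictionary entry: headword plus its kana reading."""
    expression: str
    reading: str


def is_kana(ch: str) -> bool:
    return "ぁ" <= ch <= "ゖ" or "ァ" <= ch <= "ヺ" or ch == "ー"


def to_hiragana(text: str) -> str:
    return "".join(chr(ord(c) - 0x60) if "ァ" <= c <= "ヶ" else c for c in text)


def is_subsequence(needle: str, haystack: str) -> bool:
    it = iter(haystack)
    return all(ch in it for ch in needle)


def reading_pattern(selection: str) -> re.Pattern:
    """Kana stay literal; each run of kanji may stand for any reading."""
    parts: list[str] = []
    for ch in selection:
        if is_kana(ch):
            parts.append(re.escape(to_hiragana(ch)))
        elif not parts or parts[-1] != ".+":
            parts.append(".+")
    return re.compile("".join(parts))


def relevance_tier(selection: str, entry: Entry) -> Optional[int]:
    """Return 1 (most relevant) to 4, or None if the entry is unrelated."""
    reading = to_hiragana(entry.reading)
    if entry.expression == selection:
        return 1  # exact match of the selected string
    kanji = "".join(ch for ch in selection if not is_kana(ch))
    if reading_pattern(selection).fullmatch(reading) and is_subsequence(
        kanji, entry.expression
    ):
        return 2  # whole selection, kana replaced by kanji
    if selection.startswith(entry.expression):
        return 3  # exact match of a shorter prefix
    for length in range(1, len(selection)):
        prefix = selection[:length]
        if all(is_kana(c) for c in prefix) and to_hiragana(prefix) == reading:
            return 4  # shorter prefix, kana replaced by kanji
    return None


def filter_by_tier(selection: str, entries: list[Entry], max_tier: int) -> list[Entry]:
    """Keep entries up to `max_tier`, most relevant first."""
    tiered = [(relevance_tier(selection, e), e) for e in entries]
    kept = [(t, e) for t, e in tiered if t is not None and t <= max_tier]
    return [e for _, e in sorted(kept, key=lambda pair: pair[0])]
```

A strict search would then amount to calling `filter_by_tier(selection, candidates, 1)` or `2`, while the current behavior corresponds more or less to allowing all four tiers.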

While substring matching might be useful on some occasions, I think it mostly takes up screen real estate and/or forces people to scroll through lengthy lists of mostly irrelevant results. Why not give the option to further adjust the selected string (for example using Ctrl+[arrow key]), allowing the user to refine the results if just a substring is to be looked up? In this case substring matching would not need to be enabled, of course.

While testing I also found that the first result for a Hiragana/Katakana word can be a Kanji with the same reading, although the exact match should come first. Example: If の is selected, the first result will be 野 although a definition for の (the particle) would be more suitable as first result.

non-e-moose commented 7 years ago

> While testing I also found that the first result for a Hiragana/Katakana word can be a Kanji with the same reading, although the exact match should come first. Example: If の is selected, the first result will be 野 although a definition for の (the particle) would be more suitable as first result.

I believe that it's in the works (see issue #85):

> This is just something I have implemented in my own project. I've found that when looking up definitions for a word that is written in kana only, the definition in kana only is usually the one I want. It wouldn't take precedence over the length of the term, i.e. you wouldn't get 「か」 first when scanning 「かった」.

I'm also looking forward to it, but we have to be patient.

> I would define relevance as follows (for the examples I assume a selected string of "ふ頭"):
>
> Exact match of selected string. Example: Only show results for ふ頭 and nothing else.
> Only match selected string but allow substitutions of Katakana/Hiragana with Kanji. Example: Also show results for 埠頭 but not for 不当.
> Exact match of substrings. Example: Also show results for ふ.
> Match substrings and allow substitutions of Katakana/Hiragana with Kanji. Example: Also show results for ふ、府、不 etc.

That's how it's handled already, apart from two things: conjugations are also matched, e.g. the masu stem (a very important feature that I hope never gets removed), and you are not shown all ways to kanjify a word (resolved according to #84, it now only needs to get pushed).

Also, results are ordered by substring length. If the word you need is the longest substring (i.e. the string itself), which is true in most cases, then you don't have to scroll down to the shorter substring results, so it's not really a problem that they are there. If you actually need a shorter substring, you can scroll down to it. The time difference between selecting a substring length and scrolling down should be negligible, especially since your hand is already on the mouse after pointing at the word. In my experience you almost always want the longest substring anyway.

seanblue commented 6 years ago

@FooSoft Exact matches aren't prioritized if the word is only in kana. For example, for カイロ the actual definition using katakana shows up 4th. The 3rd definition is also correct, since it is the kanji version and says that it is usually written in kana alone.

When you have a word in kana, shouldn't you prioritize exact matches first, then kana-only entries and kanji entries tagged "uk" (usually written in kana), and then sort by popularity?
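
To make the proposed ordering concrete, here is one way it could be written as a sort key. This is a sketch and not how Yomichan currently sorts: the `Match` type, its `tags` and `score` fields, and all the numbers in the toy data are hypothetical. Following the remark quoted from #85, the length of the matched text is assumed to stay the primary criterion, so the kana preference only reorders results of equal length.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    """Hypothetical search hit: the scanned text and the entry it matched."""
    source: str                         # text that was looked up
    expression: str                     # headword of the matched entry
    tags: frozenset[str] = frozenset()  # e.g. {"uk"} for "usually kana"
    score: int = 0                      # made-up popularity, higher is better


def is_kana_text(text: str) -> bool:
    return all(
        "ぁ" <= ch <= "ゖ" or "ァ" <= ch <= "ヺ" or ch == "ー" for ch in text
    )


def sort_key(match: Match) -> tuple[int, int, int]:
    if match.expression == match.source:
        rank = 0  # exact match of what was scanned
    elif is_kana_text(match.source) and (
        is_kana_text(match.expression) or "uk" in match.tags
    ):
        rank = 1  # kana-only entry, or kanji entry usually written in kana
    else:
        rank = 2  # everything else, ordered by popularity alone
    # Longer matched text first, then rank, then popularity.
    return (-len(match.source), rank, -match.score)


# Toy data for illustration only; the scores are invented.
matches = [
    Match("カイ", "下位", score=99),
    Match("カイロ", "回路", score=90),
    Match("カイロ", "海路", score=50),
    Match("カイロ", "懐炉", frozenset({"uk"}), 10),
    Match("カイロ", "カイロ", score=5),
]

for m in sorted(matches, key=sort_key):
    print(m.source, "->", m.expression)
```

With this key the katakana entry and the "uk"-tagged kanji entry move ahead of the more popular homophones, while the shorter カイ match still comes last regardless of its score.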

FooSoft / yomichan

Optimization of search results and result view #90