Chinese word selection

We assume that when a user selects a word on the page to look up, it is one they don't recognize, so we use the surrounding forward context (currently up to 5 characters) to look up the word.

For example, consider the following four-character sequence:

是故君子

If the user selects 是, we show them the meanings of 是 and 是故. If the user selects 故, we show them the meaning of just 故. If the user selects 君, we show them the meanings of 君 and 君子. If the user selects 子, we show them the meaning of just 子.
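The forward-context lookup described above can be sketched as follows. This is a minimal TypeScript sketch, not the Alpheios implementation: `toyDictionary` and `forwardContextMatches` are hypothetical names, the dictionary is a stand-in for a real lexicon, and the sketch assumes the 5-character limit includes the selected character itself.

```typescript
// Hypothetical toy dictionary; a real lookup would query an actual lexicon.
const toyDictionary: ReadonlySet<string> = new Set([
  "是", "是故", "故", "君", "君子", "子",
]);

/**
 * Returns every dictionary entry that begins at character position `start`
 * of `text` and is at most `maxLength` characters long, shortest first.
 * Positions count Unicode code points, not UTF-16 code units.
 */
function forwardContextMatches(
  text: string,
  start: number,
  dictionary: ReadonlySet<string>,
  maxLength = 5, // assumed to include the selected character
): string[] {
  const chars = Array.from(text); // split into code points
  const matches: string[] = [];
  for (let len = 1; len <= maxLength && start + len <= chars.length; len++) {
    const candidate = chars.slice(start, start + len).join("");
    if (dictionary.has(candidate)) {
      matches.push(candidate);
    }
  }
  return matches;
}

// Print the candidates for each character of the example sequence.
const sample = "是故君子";
Array.from(sample).forEach((ch, i) => {
  console.log(ch, forwardContextMatches(sample, i, toyDictionary));
});
```

Run against 是故君子, it returns `["是", "是故"]` for position 0, `["故"]` for 1, `["君", "君子"]` for 2 and `["子"]` for 3, matching the behaviour described above. Counting code points rather than `string` indices keeps rare characters from the CJK extension blocks, which JavaScript stores as surrogate pairs, from being split in half.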

For other languages, desktop users double-click a word to select it; on mobile devices, they double-tap it.

For Chinese, a more sophisticated approach is needed.

Browsers have their own algorithms for detecting the text under a double-click and, at least for modern Chinese words, appear to expand the selection automatically to the two-character word. For example, in the text above, if I double-click either 是 or 故, the browser assumes I want to select the compound word 是故.

I assume browsers are fairly intelligent about this for modern Chinese and base their algorithms on large amounts of data. In this example, perhaps they have determined that when 是 appears before 故, the only possible reading is 是故.
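To compare our lookup with the browser's own segmentation, the runtime's word segmenter can be queried directly through the standard `Intl.Segmenter` API with `granularity: "word"`. In ICU-based engines such as V8 this is typically backed by ICU's dictionary-based word breaking for Chinese; whether a particular browser's double-click selection uses exactly the same rules is an assumption. `browserWordAt` is a hypothetical helper, and its offsets are UTF-16 code units. No expected output is shown because the segmentation depends on the engine's ICU version and dictionary.

```typescript
interface WordSegment {
  segment: string;      // the text of the segment
  index: number;        // UTF-16 offset where the segment starts
  isWordLike?: boolean; // true for words, false for punctuation and spaces
}

/**
 * Returns the word segment containing UTF-16 offset `offset`, as determined
 * by the runtime's built-in Intl.Segmenter, or undefined if it is missing.
 */
function browserWordAt(text: string, offset: number, locale = "zh"): WordSegment | undefined {
  // Intl.Segmenter is ES2022; accessed loosely so the sketch also compiles
  // against older TypeScript lib settings.
  const Segmenter = (Intl as any).Segmenter;
  if (typeof Segmenter !== "function") {
    return undefined;
  }
  const segmenter = new Segmenter(locale, { granularity: "word" });
  // Segments.prototype.containing() returns the segment covering the offset.
  return segmenter.segment(text).containing(offset);
}

// Show which word the runtime would pick for each character.
const passage = "是故君子";
for (let i = 0; i < passage.length; i++) {
  console.log(passage[i], browserWordAt(passage, i)?.segment);
}
```

Feeding it passages of classical Chinese would be one way to check the concern that browser segmentation is tuned to the modern language.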

So, on the one hand, I hate to presume that we can do better than the browsers at identifying the underlying word. On the other hand, browsers are less likely to handle this properly for ancient Chinese.

So far we have implemented an approach that enables the user to choose mouseover as the selection option for Chinese:

[Image: wordselection]

So far, the application's sensitivity to mouse movement can be controlled with the following two settings:

_MouseMoveDelay defines how many milliseconds to wait after the user stops moving the mouse before proceeding with the lookup. We believe this is needed because otherwise we cannot distinguish an intentional movement over a word the user wants to look up from a movement on the way to that word. However, it makes the interface feel a little unresponsive if it is set too high.
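The delay behaves like a debounce: each mouse movement restarts a timer, and the lookup runs only once the pointer has been still for the configured number of milliseconds. A minimal sketch, assuming a hypothetical `debounce` helper; the 300 ms value is illustrative only and is not the current setting.

```typescript
/**
 * Wraps `lookup` so it runs only after `delayMs` milliseconds have passed
 * without another call; every new call restarts the countdown.
 */
function debounce<A extends unknown[]>(
  lookup: (...args: A) => void,
  delayMs: number,
): (...args: A) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: A) => {
    if (timer !== undefined) {
      clearTimeout(timer); // pointer moved again: cancel the pending lookup
    }
    timer = setTimeout(() => {
      timer = undefined;
      lookup(...args);
    }, delayMs);
  };
}

// Hypothetical wiring; 300 ms is an illustrative value, not the project's setting.
const mouseMoveDelayMs = 300;
const onPointerRest = debounce((x: number, y: number) => {
  console.log(`look up the word under (${x}, ${y})`);
}, mouseMoveDelayMs);
// In a browser:
// document.addEventListener("mousemove", (e) => onPointerRest(e.clientX, e.clientY));
```

Lowering the delay makes the popup feel more responsive at the cost of more lookups triggered while the pointer is merely passing over text.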

MouseMoveAccuracy defines how many pixels to the left and right of the mouse pointer we should look to try to pick up the underlying word. We believe this is needed because users' fine-grained control over the mouse varies, and a user may point near, but not quite on, the word they want to look up. However, the current setting of 10 pixels may be too high; we need to experiment to see whether it works.
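One way to apply MouseMoveAccuracy is to measure the horizontal extent of each character and accept the nearest character whose box lies within the tolerance on either side of the pointer. In a browser the boxes could be measured with `Range.getBoundingClientRect()`; here they are plain numbers so the logic can be tried in isolation. `CharBox` and `charNearPointer` are hypothetical, and the sketch ignores vertical distance, since the setting itself only mentions left and right.

```typescript
/** Horizontal extent of one rendered character, in CSS pixels (hypothetical shape). */
interface CharBox {
  char: string;
  left: number;
  right: number;
}

/**
 * Returns the index of the character box closest to `pointerX`, provided it
 * lies within `accuracyPx` pixels to the left or right; otherwise undefined.
 * A pointer inside a box has distance 0; ties go to the leftmost box.
 */
function charNearPointer(
  boxes: readonly CharBox[],
  pointerX: number,
  accuracyPx = 10,
): number | undefined {
  let best: number | undefined;
  let bestDistance = Infinity;
  for (const [i, box] of boxes.entries()) {
    const distance =
      pointerX < box.left ? box.left - pointerX :
      pointerX > box.right ? pointerX - box.right :
      0;
    if (distance <= accuracyPx && distance < bestDistance) {
      best = i;
      bestDistance = distance;
    }
  }
  return best;
}

// Two 16 px-wide characters; the pointer is 8 px past the end of the second.
const line: CharBox[] = [
  { char: "是", left: 0, right: 16 },
  { char: "故", left: 16, right: 32 },
];
console.log(charNearPointer(line, 40));
```

Because a pointer inside a box has distance 0, the tolerance only matters in the gaps around text, such as margins and line ends, which is where a too-generous value would cause unwanted lookups.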

Another issue with the mousemove handler that we have not yet addressed is that it produces too many false-positive word selections. For example, on a page with both Chinese and non-Chinese text, it makes no sense to show the user a "word not found" popup when the mouse rests over text that is obviously not Chinese. We can certainly do better than we do now.
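A cheap first filter for these false positives is to check that the character under the pointer belongs to the Unicode Han script before doing any lookup, using the `\p{Script=Han}` property escape that JavaScript regular expressions support with the `u` flag. `isHanCharacter` and `shouldLookUp` are hypothetical names. The check cannot tell Chinese from Japanese kanji, and it deliberately rejects CJK punctuation such as 。, which Unicode assigns to the Common script.

```typescript
// Matches one Han ideograph, including those in the CJK extension blocks.
// Built with the RegExp constructor so older TypeScript targets accept it.
const HAN = new RegExp("\\p{Script=Han}", "u");

/** True if `ch` (a single character) is a Han ideograph. */
function isHanCharacter(ch: string): boolean {
  return HAN.test(ch);
}

/**
 * Hypothetical guard for the mousemove handler: only proceed with the lookup
 * (and a possible "word not found" popup) when the pointer is over Chinese text.
 */
function shouldLookUp(charUnderPointer: string | undefined): boolean {
  return charUnderPointer !== undefined && isHanCharacter(charUnderPointer);
}

for (const ch of ["是", "a", "。", "5"]) {
  console.log(ch, shouldLookUp(ch));
}
```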

Other possible approaches would be to have the user highlight the words they want to look up with the mouse, or to respond to a single click on a character.

We believe we need some design input and user feedback on the best approaches to selecting and working with Chinese characters before investing much more development time in this.

alpheios-project / alpheios-core

chinese word selection #149