WICG / handwriting-recognition

Handwriting Recognition Web API Proposal
https://wicg.github.io/handwriting-recognition/

Text direction needs to be taken into account #4

Open r12a opened 3 years ago

r12a commented 3 years ago

Not only will the recogniser need to take the language into account, it will also be unable to decipher the text unless it knows whether the glyphs it recognises proceed right-to-left, left-to-right, or vertically top-to-bottom with lines stacked LTR or RTL.

This includes orthographies that are generally written in one direction, but that have embedded text that runs in the opposite direction, and sometimes embedded text within that.

To some extent the recogniser will be able to apply the Unicode bidi algorithm to reverse-engineer the logical character sequence, but in other bidirectional cases this will not be sufficient. It would also probably be beneficial to indicate to the recogniser the overall scanning direction of the text being entered, for which it may be useful to apply a directional label, in a similar way to how one does this for language.
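To make that concrete, here is a rough sketch of how a directional label could travel alongside the language. The recognizer/drawing shape below is only assumed from the explainer, and the `textDirection` member is entirely hypothetical:

```ts
// Sketch only: minimal declarations standing in for the proposal's API
// (navigator.createHandwritingRecognizer / startDrawing).
// The textDirection hint is hypothetical and not part of the current draft.
type TextDirection = 'ltr' | 'rtl' | 'ttb-lines-ltr' | 'ttb-lines-rtl';

interface HandwritingHints {
  textContext?: string;
  textDirection?: TextDirection;  // hypothetical: overall scanning direction of the ink
}

interface HandwritingDrawing {
  getPrediction(): Promise<Array<{ text: string }>>;
}

interface HandwritingRecognizer {
  startDrawing(hints?: HandwritingHints): HandwritingDrawing;
}

declare function createHandwritingRecognizer(
  constraints: { languages: string[] }
): Promise<HandwritingRecognizer>;

async function recognizeRtlNote(): Promise<string | undefined> {
  // The language narrows the model, but the base direction still has to be
  // stated explicitly so the recogniser scans the line the right way.
  const recognizer = await createHandwritingRecognizer({ languages: ['he'] });
  const drawing = recognizer.startDrawing({ textDirection: 'rtl' });
  const [best] = await drawing.getPrediction();
  return best?.text;
}
```

The exact value space matters less than the fact that the overall scanning direction (and, for vertical text, the line-stacking direction) is stated up front rather than guessed.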

wacky6 commented 3 years ago

Can you give some examples of how these mixed-direction texts are written? Here I mean the actual process of how they are written (e.g. which characters and which strokes are written first).

We didn't expect this to be a problem though. The assumption we made is that handwriting follows the natural flow of speech. In other words, we didn't expect the characters to be written in reverse (relative to their speech / interpretation direction). For example, we didn't expect "hello" to be written in "elloh" order.

r12a commented 3 years ago

I grabbed some examples from Wikipedia home pages.

First example. Unidirectional text, but the recogniser has to scan from right to left.

[Screenshot 2020-11-27 at 11 25 56]

Second example. Numbers and Latin text run LTR within the overall RTL flow. People writing the text tend to leave a gap and write the LTR text from left to right. They don't write the numbers or the Latin text backwards.

[Screenshot 2020-11-27 at 11 26 38]

Note, btw, that in the example just above, the parenthesis on the left is U+0029 RIGHT PARENTHESIS, and the one on the right is U+0028 LEFT PARENTHESIS. These are mirrored characters, whose glyph in typed text is established only when the directional context is known. The recogniser will also need to assign the glyph to a code point depending on the current base direction.
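A tiny sketch of what that assignment might look like (only the parenthesis pair is shown; Unicode defines the full set of mirrored pairs):

```ts
// Sketch: map a recognised paren *shape* to a code point, given the base
// direction. In RTL context mirrored characters are displayed flipped, so the
// shape on the page is not enough on its own to pick the code point.
type Direction = 'ltr' | 'rtl';

// Unicode mirrored pair for parentheses.
const MIRROR: Record<string, string> = {
  '(': ')',  // U+0028 LEFT PARENTHESIS <-> U+0029 RIGHT PARENTHESIS
  ')': '(',
};

function parenCodePoint(shapeOnPage: '(' | ')', baseDirection: Direction): string {
  return baseDirection === 'rtl' ? MIRROR[shapeOnPage] : shapeOnPage;
}

// In the RTL example above: the shape at the left end of the bracketed text
// looks like '(' on the page, but the character to store is U+0029.
console.log(parenCodePoint('(', 'rtl'));  // ')'  (U+0029 RIGHT PARENTHESIS)
```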

Third example. An overall LTR sentence has RTL text with embedded LTR text inside it. I expect that 'W3C' would probably be the last 3 code points written, and stored in memory, once the text has been recognised.

[Screenshot 2020-11-27 at 11 35 52]

To be honest, it can be difficult to know where the boundaries are for the changes in base direction here, though in this example the quote marks help. I don't know how this is done in practice; I'm just flagging up that it will be necessary.

When it comes to speech, there is no flip-flopping of direction involved, and in fact in memory all code points are also arranged in one logical, unidirectional sequence. The changes in direction are only a feature of the written text. Unfortunately for you, that's what you're starting from.
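As a toy illustration of that last point (this is not the Unicode Bidirectional Algorithm, and the uppercase words are just stand-ins for an RTL script), the recogniser scanning the page visually has to undo the display reordering before storing anything:

```ts
// Toy illustration only: rebuild the logical, in-memory character sequence
// from segments scanned visually, left to right, on an overall LTR line.

interface VisualSegment {
  text: string;              // characters as they appear on the page, left to right
  direction: 'ltr' | 'rtl';  // resolved direction of this run
}

function toLogicalOrder(segments: VisualSegment[]): string {
  // On an LTR base line, LTR runs are already in logical order; an RTL run
  // was laid out right to left, so its characters must be reversed back.
  return segments
    .map(s => (s.direction === 'rtl' ? [...s.text].reverse().join('') : s.text))
    .join('');
}

const scanned: VisualSegment[] = [
  { text: 'The sign said "', direction: 'ltr' },
  { text: 'DLROW OLLEH', direction: 'rtl' }, // an RTL phrase, seen reversed when scanned LTR
  { text: '".', direction: 'ltr' },
];

console.log(toLogicalOrder(scanned)); // 'The sign said "HELLO WORLD".'

// Note: an LTR run nested inside the RTL quote (like 'W3C' in the third
// example) would break this naive single reversal; it has to be kept in its
// own segment and the segments within the quote reordered, which is exactly
// the boundary problem described above.
```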

wacky6 commented 2 years ago

Sorry about the delay. I forgot to mention you in https://github.com/w3ctag/design-reviews/issues/591#issuecomment-840366071

@r12a

Let's continue the discussion here.

[image]

WDYT about a direction hint to disambiguate the main direction here? This would help tell "82: Score" and "Score: 28" or "Score: 82" apart (especially for rule-based recognizers).

For distinguishing between "Score: 28" and "Score: 82" (especially for rule-based recognizers), I imagine the recognizer can determine the script of each word and use that script's LTR or RTL direction to decide. In the above case, "Score:" is Hebrew and "82" is Latin. With a direction hint present, the in-memory string starts with "Score:", followed by "82".
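A rough sketch of that rule-based idea (not how the ML recognizers work; the Hebrew word below is just a placeholder for the handwritten "Score:" in the image):

```ts
type Direction = 'ltr' | 'rtl';

// Treat Hebrew and Arabic code points as RTL; everything else as LTR here.
function scriptDirection(word: string): Direction {
  return /[\u0590-\u05FF\u0600-\u06FF]/.test(word) ? 'rtl' : 'ltr';
}

// Base direction of the line: use the hint if the caller provided one,
// otherwise fall back to the script of the majority of the words.
function baseDirection(visualWords: string[], hint?: Direction): Direction {
  if (hint) return hint;
  const rtl = visualWords.filter(w => scriptDirection(w) === 'rtl').length;
  return rtl > visualWords.length / 2 ? 'rtl' : 'ltr';
}

// `visualWords` are the words as they appear on the page, left to right.
// On an RTL line the words are stored right to left, while an LTR word
// such as "82" keeps its own internal character order.
function toInMemoryOrder(visualWords: string[], hint?: Direction): string {
  const dir = baseDirection(visualWords, hint);
  const words = dir === 'rtl' ? [...visualWords].reverse() : visualWords;
  return words.join(' ');
}

// Stand-in for the image: "82" on the left, a Hebrew word (standing in for
// the Hebrew "Score:") on the right.
const visual = ['82', 'שלום'];
console.log(toInMemoryOrder(visual, 'rtl')); // 'שלום 82' — "Score:" first, then "82"
console.log(toInMemoryOrder(visual));        // '82 שלום' — the 1-vs-1 script split
                                             // falls back to LTR, which is why the hint helps
```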

For machine-learning-based recognizers (the ones we currently have), handwritten "Score: 82" is part of their training dataset. The Hebrew recognizer will learn from the dataset and output characters in the correct in-memory order (i.e. a direction hint is unnecessary). As for how it knows the right order, we don't know exactly (hence why it's ML-based).