MartinThoma / hwrt

A toolset for handwriting recognition
MIT License

Structural analysis #24

Open MartinThoma opened 9 years ago

MartinThoma commented 9 years ago

This is called "structural analysis" in expressmatch and "spatial layout of symbols (e.g. adjacent, superscript, etc.)" in CROHME.

Once a segmentation is given and the individual symbols are recognized, it is important to figure out how they are arranged relative to each other. For example, the recognized symbols x and y could form xy, x^y or x_y.
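To make the xy / x^y / x_y distinction concrete, here is a minimal sketch of a naive pairwise rule based on bounding boxes. Everything in it is an assumption for illustration and not part of hwrt: the `Box` type, the `relation` function, screen coordinates with y growing downward, and the arbitrary `threshold` of 0.25. A rule this simple is only a baseline; real expressions need more context than one pair of boxes.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box of a symbol (hypothetical helper type).

    Screen coordinates are assumed: y grows downward.
    """
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    @property
    def height(self) -> float:
        return self.y_max - self.y_min

    @property
    def y_center(self) -> float:
        return (self.y_min + self.y_max) / 2


def relation(base: Box, other: Box, threshold: float = 0.25) -> str:
    """Guess how `other` relates to `base`, which is assumed to be on its left.

    The vertical offset of the two centers is normalized by the height of
    `base` (assumed to be non-zero). The threshold is an illustrative value.
    """
    offset = (other.y_center - base.y_center) / base.height
    if offset < -threshold:  # clearly above the base symbol
        return "superscript"
    if offset > threshold:   # clearly below the base symbol
        return "subscript"
    return "right"           # roughly on the same line


x = Box(0, 0, 10, 10)
print(relation(x, Box(11, -3, 15, 2)))   # raised box
print(relation(x, Box(11, 7, 15, 12)))   # lowered box
print(relation(x, Box(11, 1, 15, 9)))    # box on the same line
```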

All symbols can have a subscript (a_{?}) and / or a superscript (a^{?}).

Some 'symbols' are nested, like

Some symbols modify the meaning of subscript / superscript:

One important decision is how to store the geometry, given a list of symbols, e.g. ['\sum', 'i', '=', '0', 'n', 'i', '2'].

A possible format would be a custom JSON:

{"symbol": 0,
 "bottom": {"symbol": 1,
            "right": {"symbol": 2,
                      "right": {"symbol": 3}}},
 "top": {"symbol": 4},
 "right": {"symbol": 5,
           "superscript": {"symbol": 6}}
}

So we start with the leftmost symbol on the "main line". Each symbol can have the attributes "top", "bottom", "right", "subscript" and "superscript", each of which is again a symbol. The value of "symbol" is an index into the list of recognized symbols.
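As a quick check that such a tree carries enough information, the following sketch walks it recursively and emits a LaTeX string. The function name `to_latex` and the rendering choices are assumptions for illustration, not part of hwrt: "bottom" and "subscript" are both rendered with `_`, "top" and "superscript" both with `^`, and the key names follow the attribute list above.

```python
import json

# Recognized symbols from the example; indices are referenced by the layout.
SYMBOLS = ["\\sum", "i", "=", "0", "n", "i", "2"]

# Layout tree in the proposed custom JSON format.
LAYOUT = json.loads("""
{"symbol": 0,
 "bottom": {"symbol": 1,
            "right": {"symbol": 2,
                      "right": {"symbol": 3}}},
 "top": {"symbol": 4},
 "right": {"symbol": 5,
           "superscript": {"symbol": 6}}
}
""")


def to_latex(node, symbols):
    """Render a layout node and everything attached to it as LaTeX.

    Hypothetical helper: vertical attachments become _{...} or ^{...},
    and the "right" neighbour is appended after a space.
    """
    out = symbols[node["symbol"]]
    for key, operator in (("bottom", "_"), ("subscript", "_"),
                          ("top", "^"), ("superscript", "^")):
        if key in node:
            out += operator + "{" + to_latex(node[key], symbols) + "}"
    if "right" in node:
        out += " " + to_latex(node["right"], symbols)
    return out


print(to_latex(LAYOUT, SYMBOLS))  # prints \sum_{i = 0}^{n} i^{2}
```

Rendering "bottom" like "subscript" loses the distinction between limits placed under a symbol and a true subscript; keeping separate keys in the stored format preserves it even if one particular renderer does not.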

Of course, this does not capture important cases like:

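An alternative would be MathML. However, MathML does not seem to be optimal, because the same rendering can apparently be written as several different MathML expressions (see "Does one rendering have multiple MathML expressions?").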

MartinThoma commented 8 years ago

Figure 4 of "Mathematical expression recognition: a survey" shows that the structural analysis cannot be done locally.