NSoiffer / MathCAT

MathCAT: Math Capable Assistive Technology for generating speech, braille, and navigation.
MIT License

Offer expression as a tree #304

Closed bondolo closed 3 days ago

bondolo commented 1 month ago

It is currently very frustrating to iterate over an expression, visit each of its sub-expressions, and get the speech and braille for each one. I have looked at the demo program, and its mechanism of setting preferences before getting the text is cumbersome. The current navigation API doesn't really work with an unknown expression because it doesn't expose which actions are available at the current position.

Presenting a tree of the navigation nodes would make traversal easier.

My ultimate goal is, given a valid MathML expression, to build a tree of all navigation nodes and to have braille and speech pre-computed for each node in this tree. In my usage it is not practical to generate the speech and braille on the fly; I need to capture it all at once.

NSoiffer commented 1 month ago

I might be misunderstanding what you want, but isn't the MathML tree what you are asking for? Given that tree, you can iterate through it asking for the speech and braille.

Specifically, using the interface:

  1. Set the rules directory
  2. Send the MathML and get back the cleaned-up MathML. This will have an id for each element in the tree.
  3. For each node in the cleaned-up tree:
     a. call set_navigation_node with the id of the node
     b. get the speech by calling do_navigate_command("ReadCurrent")
     c. get the braille by calling get_navigation_braille()

These are the Rust function names. For Python, they are CamelCase (e.g., SetNavigationNode).
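
The loop in step 3 could be sketched in Python roughly as follows. This is only an illustration, not MathCAT code: `collect_speech_and_braille` is a hypothetical helper, and the three callables are stand-ins you would wire to the real binding (presumably wrappers around SetNavigationNode, the CamelCase form of do_navigate_command("ReadCurrent"), and the CamelCase form of get_navigation_braille). It also assumes the cleaned-up MathML is well-formed XML with an `id` attribute on each element.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Tuple


def collect_speech_and_braille(
    cleaned_mathml: str,
    set_node: Callable[[str], None],   # stand-in for set_navigation_node
    read_current: Callable[[], str],   # stand-in for do_navigate_command("ReadCurrent")
    get_braille: Callable[[], str],    # stand-in for get_navigation_braille
) -> Dict[str, Tuple[str, str]]:
    """Map each element id to its (speech, braille) pair, in document order."""
    results: Dict[str, Tuple[str, str]] = {}
    for elem in ET.fromstring(cleaned_mathml).iter():
        node_id = elem.get("id")
        if node_id is None:
            continue  # nothing to navigate to without an id
        set_node(node_id)
        results[node_id] = (read_current(), get_braille())
    return results
```

Passing the calls in as parameters keeps the sketch independent of the exact binding signatures, which should be checked against the MathCAT documentation.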

You may want to skip some nodes such as mtd (math table data), as there isn't any difference between speaking them and speaking their content. Also, depending on how you navigate, you may want to ignore mrows. In MathCAT's "Enhanced" mode, navigation follows the expression tree, but in "Simple" and "Character" modes, mrows are essentially flattened and structure is ignored. For Simple mode, only 2D structures such as mfrac and msub along with the leaves are navigation points.
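
As a rough illustration of that filtering, a pre-pass over the cleaned-up MathML could pick which ids to visit. Everything here is an assumption made for the sketch: `navigation_ids` is a hypothetical helper, the `TWO_D_TAGS` set is a guess that extends "mfrac and msub" with a few similar elements and is surely incomplete, and the exact set of navigation points MathCAT uses in each mode may differ.

```python
import xml.etree.ElementTree as ET
from typing import List

# Assumption: an illustrative, incomplete list of "2D" MathML elements.
TWO_D_TAGS = {"mfrac", "msub", "msup", "msubsup", "msqrt", "mroot"}
# MathML token (leaf) elements.
LEAF_TAGS = {"mi", "mn", "mo", "mtext", "ms"}


def navigation_ids(cleaned_mathml: str, mode: str = "Enhanced") -> List[str]:
    """Return element ids worth visiting, in document order."""
    ids: List[str] = []
    for elem in ET.fromstring(cleaned_mathml).iter():
        tag = elem.tag.rsplit("}", 1)[-1]  # drop an XML namespace, if present
        node_id = elem.get("id")
        if node_id is None or tag == "mtd":
            continue  # mtd speaks the same as its content
        if mode == "Simple" and tag not in TWO_D_TAGS | LEAF_TAGS:
            continue  # mrow and other structure is flattened away
        ids.append(node_id)
    return ids
```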

NSoiffer commented 1 month ago

I should probably add that do_navigate_command accepts many different strings. Probably the relevant ones to you are the following (copied from the code comments):

/// * Standard move commands:
///   `MovePrevious`, `MoveNext`, `MoveStart`, `MoveEnd`, `MoveLineStart`, `MoveLineEnd`
/// * Movement in a table or elementary math:
///   `MoveCellPrevious`, `MoveCellNext`, `MoveCellUp`, `MoveCellDown`, `MoveColumnStart`, `MoveColumnEnd`
/// * Moving into children or out to parents:
///   `ZoomIn`, `ZoomOut`, `ZoomOutAll`, `ZoomInAll`
/// * Undo the last movement command:
///   `MoveLastLocation`
/// * Read commands (standard speech):
///   `ReadPrevious`, `ReadNext`, `ReadCurrent`, `ReadCellCurrent`, `ReadStart`, `ReadEnd`, `ReadLineStart`, `ReadLineEnd`
/// * Describe commands (overview):
///   `DescribePrevious`, `DescribeNext`, `DescribeCurrent`

Moving left/right is only feasible if there are left/right siblings. You can't currently move inside a leaf element (something I still need to implement if there are multiple chars such as "1234" inside the leaf element). So ZoomIn won't work there, and ZoomOut won't work if the parent is math (the root of the tree). Similar ideas apply to moving by cells.

This provides another means of moving around. You can get the node id after a move and compare it to the starting node id; if it didn't change, that's another way to see that the move command didn't do anything.
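
A minimal sketch of that id comparison, used to probe which move commands do something at the current position. `available_moves` is a hypothetical helper, and the two callables are stand-ins for the real binding: one issues a navigation command, the other returns the current navigation position (for example the node id). It assumes `MoveLastLocation` returns exactly to the position before the probe, which is worth verifying.

```python
from typing import Callable, Hashable, Iterable, List


def available_moves(
    commands: Iterable[str],
    do_command: Callable[[str], object],       # stand-in for do_navigate_command
    current_position: Callable[[], Hashable],  # stand-in for reading the current node id
) -> List[str]:
    """Return the commands that change the navigation position from here."""
    usable: List[str] = []
    for command in commands:
        before = current_position()
        do_command(command)
        if current_position() != before:
            usable.append(command)
            # Assumption: this undoes the probe and restores the start position.
            do_command("MoveLastLocation")
    return usable
```

For example, calling it with `["MovePrevious", "MoveNext", "ZoomIn", "ZoomOut"]` at each node would give the set of moves to offer there.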

bondolo commented 2 weeks ago

Thank you. The approach of parsing the cleaned-up MathML and then visiting every node is working. It does seem a little odd, though, and the reading is not quite the same as when using the traversal APIs. Some of the context, such as "in numerator", is not included. I might work around this by generating the traversals for each node rather than calling `set_navigation_node`, unless you can suggest a simpler alternative.

NSoiffer commented 2 weeks ago

The speech string should include "in numerator", etc. There is a preference NavVerbosity that controls echoing of the command (e.g., "zoom in"), but it doesn't affect location information such as being in the numerator.

If you haven't already seen this, I have a MathCAT demo site. After generating speech from the input, clicking in the "Displayed Math" box allows you to arrow around. Although the demo is just passing keystrokes to MathCAT, those get mapped to the commands and so should give you the same results as if you called the API.

NSoiffer commented 3 days ago

Closing due to lack of activity. Please re-open if there is still a problem.