NSoiffer / MathCAT

MathCAT: Math Capable Assistive Technology for generating speech, braille, and navigation.
MIT License

Add Dutch math notation #231

Open dkager opened 8 months ago

dkager commented 8 months ago

In the Netherlands the math notation is designed by Dedicon. Dedicon is the only national organization that adapts educational books to be accessible to blind and visually impaired readers. The math notation is very similar to AsciiMath. Hence, it is neither a true braille code nor a true spoken notation. Probably more of the former than the latter, actually. Because it is the dominant notation in the Netherlands, I think it is worth implementing. The question is: how? Although the notation's documentation is in Dutch, you can see some typical examples here. Besides AsciiMath-like rules there are also symbol replacements, e.g. inf for infinity, ~a for Greek letter alpha, etc. A key point is that the notation defines text and not braille dots, e.g. inf and not ⠊⠝⠋. In schools, students will most likely be reading the notation in their preferred braille table (mostly US/American, German or the more recent Dutch 8-dot standard). While some people have created speech rules, e.g. speak / as divided by, this is not how Dedicon intended the notation.

For implementation in MathCAT I see two possible routes:

  1. Add multiple braille rule files, each based on its own braille table.
  2. Add braille rules using plain text as opposed to Unicode braille.

I would be interested to hear your thoughts on this subject.

NSoiffer commented 8 months ago

Good timing. Coincidentally, I was exploring a site with links to braille codes and was in the process of writing up an issue (#232). The link to the Dutch site doesn't include the math spec, so I appreciate your link.

To make sure I understand the situation, the Dutch rules for brailling a notation are the same (generate some ASCIIMath-like notation), but the difference is that sometimes a Latin (extended) alphabet is used (a, b, /, maybe ü, etc), sometimes 6-dot braille is used, and sometimes 8-dot braille is used. Is that correct? If so, then I think adding an option for the user to choose (along with choosing a braille code) is best. The unicode.yaml file would then test the option and include an appropriate translation file. I'm not 100% certain this works, but if it doesn't, I can fix things to make sure it does work. This allows the implementation to share the common translation rules but have different character translation rules.
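As a rough illustration of the idea of sharing the translation rules while switching the character table through a user option, here is a minimal Python sketch. It is not MathCAT code: `SHARED_RULES`, `CHAR_TABLES`, `translate`, and the option names are all hypothetical, and the braille table lists only the three letters needed for the `inf` example from this thread.

```python
# Shared rules: symbol -> ASCII text. Both entries come from this thread.
SHARED_RULES = {"∞": "inf", "α": "~a"}

# Hypothetical per-option character tables. An empty table means
# "leave the ASCII text unchanged". The braille table is deliberately
# incomplete: it covers only the letters used in the example.
CHAR_TABLES = {
    "ascii": {},
    "unicode-braille": {"i": "⠊", "n": "⠝", "f": "⠋"},
}


def translate(symbol: str, output: str = "ascii") -> str:
    """Apply the shared rule first, then the character table for `output`."""
    text = SHARED_RULES.get(symbol, symbol)
    table = CHAR_TABLES[output]
    return "".join(table.get(ch, ch) for ch in text)


print(translate("∞"))                     # inf
print(translate("∞", "unicode-braille"))  # ⠊⠝⠋
```

Only the character table changes between the two calls; the rule that turns the symbol into `inf` is written once.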

dkager commented 8 months ago

The situation in the Netherlands is that Dedicon produces the accessible text. For math it is only read digitally, not embossed as braille on paper. So the Dutch math notation is meant for digital use, where the screen reader translates the ASCII into braille and/or speech (mostly braille as I wrote earlier). So ideally MathCAT should translate MathML --> ASCII. However, in the Nemeth rules and Unicode replacement files I see only braille dots (Unicode braille). If unicode.yaml also accepts plain ASCII that would be a good solution I think. As for embossed braille, Dedicon only produces this for primary education, i.e. very basic math. Unfortunately, there is no formal specification for math in embossed braille in the Netherlands. I suppose that what Dedicon needs for the digital math translation is more akin to MathJAX. However, since MathCAT is appearing in screen readers and EPUB/DAISY readers there is a very good case for implementing it there as well. I might be able to work on that in 2024.

NSoiffer commented 8 months ago

Implementation is definitely simpler if you only want ASCII output. If you look at any of the speech files, you'll see that plain text is what the unicode.yaml files contain. For example, "⊆" will turn into something like "subset of or equal to". MathCAT uses the same basic system for generating speech and braille, although braille has an extra post-processing step that does some cleanup.

I'm happy to work with you on the braille math translation. Writing the Unicode file should be straightforward for you, but the rule file might not be as obvious.

> I suppose that what Dedicon needs for the digital math translation is more akin to MathJAX

Apologies for drawing this out, but I am not following your meaning. What is it that MathJax does that you can't do with MathCAT?

I suspect that if you want to also have a Dutch speech translation, then you may want to support two speech styles: one that generates "normal" speech and one that generates "Dutch braille" speech. Taking the first integral as an example, the speech for each might be (in English):

Normal: integral from 1 half pi to pi, sine x d x

Dutch braille: I n t g open brace 1 slash 2 space p i dot dot p i close brace s i n x d x

Maybe the latter doesn't make sense. The MathTalk speech style (which I haven't yet implemented) is basically of the latter form: each word corresponds to a Nemeth braille character. It's not the most natural style of speech. For example: "x superscript 2 baseline plus 1". It is supposedly useful for people who know Nemeth.
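To make the "Dutch braille" speech style concrete, a character-by-character reading could be sketched as below. This is only an illustration: the `spell` function and the `NAMES` table are hypothetical, and the input string is reconstructed from the spoken example in this comment rather than taken from Dedicon's specification.

```python
# Spoken names for the non-alphanumeric characters in the example.
# Letters and digits are spoken as themselves.
NAMES = {
    "{": "open brace",
    "}": "close brace",
    "/": "slash",
    " ": "space",
    ".": "dot",
}


def spell(linear: str) -> str:
    """Read a linear-notation string one character at a time."""
    return " ".join(NAMES.get(ch, ch) for ch in linear)


# Input reconstructed from the spoken example; the real Dedicon
# notation for this integral may use different spacing.
print(spell("Intg{1/2 pi..pi}sinxdx"))
```

A "normal" speech style would instead need rules that recognize the integral and its bounds, which is why the two styles would be separate rule sets rather than one post-processing step.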

dkager commented 8 months ago

Thanks for the elaborate reply, lots of interesting info in there! Let me clarify the MathJAX comment. Dedicon mainly delivers two digital formats:

  1. EPUB through VitalSource Bookshelf, which incorporates MathJAX. I was thinking that it would make sense to have MathJAX be able to output Dutch math (commonly referred to as linear notation) through the accessibility extension, but looking more closely at it now that may not be true. I'm reconsidering and thinking that it would be better/easier to send MathML through Bookshelf to the screen reader, which in turn uses MathCAT.
  2. Digital plain text file formats. In this case Dedicon needs a way to produce the linear notation during automated XML --> text conversion. While this is not quite 'math capable assistive technology', I think that MathCAT will be useful in this scenario (especially if it gains a Java API).
  3. Bonus: there may eventually also be digital talking books complemented by math using text-to-speech. This would require speech rules.

So to summarize, if there is no silent rule that braille rule files should only contain Unicode braille, then I am confident that we can make Dutch math work. You are correct that we should also incorporate proper speech rules, but in contrast to Sweden and Finland I think that that will be secondary for Dedicon.

For some background on the linear notation, my colleague Dorine and I wrote an article about it for DEIMS in 2016.

NSoiffer commented 8 months ago

Thanks for the clarification.

MathCAT does have a Java interface. I believe it is being used with BrailleBlaster.

MathCAT is pretty efficient: about 3-4 ms for generating either speech or braille for a medium-sized MathML expression on my 6-year-old Core i7 machine. I don't think dealing with math will be much of a problem time-wise if you incorporate it into your production system.

I believe I read your paper around the time of the conference (it seems very familiar). But it wasn't in my working memory.

danghoaiphuc commented 8 months ago

I am not sure whether our software, Sao Mai Braille, would address your requests. It is a free word editor and braille translator. We use MathCAT for speech and braille output of math equations. It supports output in both Unicode and ASCII braille. For ASCII braille, it has several different options such as German, UK, Euro Braille, etc. For math notation, it works much like entering equations in MS Word. Hope it helps.

NSoiffer commented 7 months ago

@dkager: Is this still something you want added to MathCAT or should the issue be closed?

dkager commented 7 months ago

Yes, I still want this to be added to MathCAT. I'm just not sure who will do it and when. Maybe Dedicon, as the author of the notation, can help with this. But that is tentative for now.

NSoiffer commented 7 months ago

If there's a spec, I can do the implementation, but I need someone to write tests (typically based on the examples in the spec).

dkager commented 7 months ago

The spec is currently rather informal. We will be working on that in the coming weeks. I will get back to you when we have something more formalized.

NSoiffer commented 5 months ago

FYI: @dkager: the 0.5.0 build includes ASCIIMath output as a braille format. It makes use of the current output braille table, so it will work with either Dutch 6 dot or 8 dot braille. The user needs to set the braille table. It also supports LaTeX output (using the macro names and spacing "specified" by German LaTeX braille) in the same manner.

dkager commented 5 months ago

Thanks! I may go and convert/extend this to our Dutch notation at some point. Did you hear anything from Vispero regarding this yet?

NSoiffer commented 5 months ago

I'm not sure what you think I will hear from Vispero. I assume that when they pick up the latest version of MathCAT, they will include the new braille codes, but I don't know what their plans are. I haven't played around with MathCAT in JAWS yet and am uncertain about what features they have surfaced. If you have a specific question/ask, I can ask my contact there.

FYI: if you convert/extend the ASCIIMath implementation, you can use "include:" and then just modify whatever rules you care about. There is a little spacing cleanup in the Rust code, so you probably want to talk with me about spacing cleanups.

For both LaTeX and ASCIIMath, I've been considering a Spacing option with values Loose, Medium, and Tight. I would say that for my implementation of ASCIIMath, it tended toward medium loose. For example, you could have hat x + bar y, hat x+bar y, or hatx+bary as loose/medium/tight output. I think MathCAT is currently doing the middle one (I didn't test it out).
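The three spacing levels can be illustrated with a small Python sketch. This is not how MathCAT implements spacing: the `Spacing` enum and `render` function are hypothetical, and the sketch only handles accented terms joined by a single operator, which is enough to reproduce the three outputs above.

```python
from enum import Enum


class Spacing(Enum):
    LOOSE = "Loose"
    MEDIUM = "Medium"
    TIGHT = "Tight"


def render(terms, operator, spacing):
    """Join (accent, base) terms with `operator` at the given spacing level.

    Loose:  space after the accent and around the operator.
    Medium: space after the accent only.
    Tight:  no spaces at all.
    """
    accent_sep = "" if spacing is Spacing.TIGHT else " "
    op_sep = " " if spacing is Spacing.LOOSE else ""
    parts = [f"{accent}{accent_sep}{base}" for accent, base in terms]
    return f"{op_sep}{operator}{op_sep}".join(parts)


terms = [("hat", "x"), ("bar", "y")]
for level in Spacing:
    print(level.value, "->", render(terms, "+", level))
```

Note that the tight form `hatx+bary` can only be read back unambiguously if the reader knows the accent names, which is one reason a user-selectable option may be preferable to a fixed choice.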

dkager commented 5 months ago

To clarify, I believe Vispero is intending to add Dedicon's notation, seeing that that is what is being used in The Netherlands.

NSoiffer commented 5 months ago

Thanks for the clarification. By "Dedicon's notation", do you mean its math notation or literary braille notation? If they are doing the math notation, I'll see if it is something I can port into MathCAT. Hopefully they won't be too protective of their work given they get all my MathCAT work for free.

dkager commented 5 months ago

I mean the ASCII-like notation as discussed in this GitHub issue and listed on wiskunde.dedicon.nl. The Netherlands has no true braille math notation.

NSoiffer commented 5 months ago

So, for example: ⊇ should turn into "omvatOf="?

I haven't heard, but haven't asked, whether Vispero is working on Dedicon's notation.

If you could gather all the symbols and their ASCII equivalents into a single file with lines like (symbol, ascii_text), along with telling me what grouping symbols to use when a numerator or superscript isn't simple (parens, braces, ...), I could add it to MathCAT relatively easily. I'd need you to send me examples of MathML in and your notation out so I can know whether the code works as you intend. For example, what happens with an arrow over a letter (a typical vector notation)? In ASCIIMath, it is vec x. Does Dedicon's notation include things like this? If so, where do I find the info for that? If not, what should happen if MathML has that notation in it?

If Dedicon's notation is ASCIIMath with just a change in the name of the symbols (e.g. "omvatOf=" instead of "supe"), that's just a few hours to implement if you give me the symbol list.
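The two pieces of information requested here, a symbol list and a grouping rule, could be sketched in Python as follows. Everything in this block is an assumption for illustration: the comma-separated file layout, the function names, the "simple means alphanumeric only" test, and the choice of parentheses for grouping (which is exactly the open question). Only the two symbol mappings come from this thread.

```python
import csv
import io

# Hypothetical symbol list, one "symbol,ascii_text" pair per line.
SYMBOL_LIST = "∞,inf\nα,~a\n"


def load_symbols(text):
    """Parse 'symbol,ascii_text' lines into a dictionary."""
    return {symbol: ascii_text for symbol, ascii_text in csv.reader(io.StringIO(text))}


def replace_symbols(text, symbols):
    """Replace each listed symbol with its ASCII equivalent."""
    return "".join(symbols.get(ch, ch) for ch in text)


def group(expr, open_="(", close=")"):
    """Wrap expr in grouping symbols unless it is simple.

    Assumption: "simple" means letters and digits only, and
    parentheses are the grouping symbols.
    """
    return expr if expr.isalnum() else f"{open_}{expr}{close}"


def fraction(numerator, denominator):
    """Linear form of a fraction, grouping non-simple parts."""
    return f"{group(numerator)}/{group(denominator)}"


symbols = load_symbols(SYMBOL_LIST)
print(replace_symbols("α+∞", symbols))
print(fraction("1", "2"))
print(fraction("x+1", "2"))
```

Pairs of MathML input and expected linear output, as asked for above, would then serve directly as test cases for functions like these.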

dkager commented 5 months ago

This is what we discussed a little bit when I first opened the issue. Unfortunately Dedicon cannot prioritize the implementation in MathCAT right now. That includes me. :-) That is why I am very eager to learn what Vispero (or any other open source contributor!) will make of it. In good news, I will be working on a somewhat more formal specification of the notation soon. That's something I can then share with you.