Figure out the best way to do math for speed, accessibility, and interoperability

For displaying math in the arXiv project we need to balance the needs of three kinds of readers:

- People reading normally in a web browser
- People reading with a screen reader
- Machines reading the page for whatever reason

Two broad approaches, I think:

- Rendering the math server-side with MathJax and putting MathJax's output in the HTML. This is the current approach on arxiv-vanity.com. The advantage is that it is very fast, so it is the best option for sighted users reading in a web browser. The disadvantage is that it is not good for screen readers without MathJax's accessibility extensions (I think), and the output is MathJax-specific HTML rather than a standard format.
- Putting MathML in the HTML and rendering it client-side with MathJax (either directly from the MathML or from a TeX annotation). This is best for interoperability and accessibility (I think?). The downside is that it is really slow.

I think there might be a third approach, a mix of the two. Perhaps we could include both the MathML and the pre-rendered MathJax output in the HTML, then do something clever client-side to display the right one depending on how the user is reading it (e.g. display: none on the MathML so it is just for machines).
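To make the mixed approach concrete, here is a minimal sketch of how a server-side step might emit both representations for one formula. It is only an illustration: the class names `math-hybrid`, `math-rendered`, and `math-source` and the `data-tex` attribute are hypothetical, and the CSS or script that decides which copy is shown is left out. One thing worth keeping in mind is that `display: none` also hides content from screen readers, so the MathML copy would need a different hiding technique if it is meant for them too.

```python
from html import escape


def hybrid_math(prerendered_html: str, mathml: str, tex: str) -> str:
    """Wrap one formula in a container carrying both representations.

    All class names and the data-tex attribute are hypothetical,
    chosen only for this sketch.
    """
    return (
        f'<span class="math-hybrid" data-tex="{escape(tex, quote=True)}">'
        # Visual copy: marked aria-hidden so assistive technology skips it.
        f'<span class="math-rendered" aria-hidden="true">{prerendered_html}</span>'
        # Machine / screen-reader copy: the page's CSS decides how to hide it.
        f'<span class="math-source">{mathml}</span>'
        "</span>"
    )


html_out = hybrid_math(
    "<span>1+1=2</span>",                      # stand-in for MathJax output
    "<math><mn>1</mn><mo>+</mo><mn>1</mn><mo>=</mo><mn>2</mn></math>",
    "1+1=2",
)
```

The pre-rendered string is passed through untouched, so whatever MathJax produced server-side stays intact; only the TeX source is escaped, because it ends up inside an attribute.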

@dginev, @kohlhase, @pkra - thoughts / opinions?

Background reading

https://github.com/mathjax/MathJax/issues/938
some details on current state of mathml: https://github.com/w3ctag/design-reviews/issues/313
https://www.bersling.com/2016/05/10/displaying-math-on-the-web/

Let me preface this by saying that the good news is we can switch between these options with relative ease, and nothing has to remain set in stone for decades to come. There may be upcoming swings in MathML support (one way or another, be it Chrome adding support or the next HTML version dropping it), so this discussion likely won't mark a final outcome as far as arXiv is concerned.

As another tech preliminary: There is a lot of advanced latex used in arxiv's equations (as one would expect), so I'm assuming the implicit basis for Ben's approaches is a run with latexml that spits out either the fully expanded TeX source or latexml's idea of a MathML tree (for which there is a performance penalty, but you only need to run it once server-side).

Next, you get the usual server-side vs client-side trade-offs, complicated by the politics of the MathML situation. We're bound to disagree on a "best practice" answer here, given all the turmoil.

Some thoughts:

- There has long been regret over the shortage of Content MathML in the wild, which is needed to bootstrap a development effort. CMML was created with the original promise of aiding both accessibility and machine-readability for an open-ended set of applications. I've returned to working with @brucemiller on improving the latexml-generated Content MathML dialect this year, and we're making good headway on formalizing more of the DLMF equations. Most of these improvements also translate into better CMML for arXiv. More importantly, latexml has been run against the entirety of arXiv on an annual basis, generating cross-referenced presentation and content MathML. We've always hoped to see those formula results published live "officially", at least as an experiment. Given the importance of arXiv, it may be a helpful push for the community. I'm sure at least the "Math search engine" teams will be delighted.
- On the "client-side" front, if you use Firefox today, you can preview an example arXiv article with MathML without ever having to run MathJax, and the rendering tends to be blazing fast. In any of the other major browsers, you'll certainly need to wait for your polyfill of choice to render the page. If Chrome moves forward and re-adds MathML rendering, 2019 may be the year of fast client-side MathML.

Back to 2018: I've seen various takes (in fact too wide a variety of them) on using two parallel representations, one for rendering performance and the other for accessibility. I've even seen that trade-off made by using MathJax vs KaTeX in different page contexts of the same site. It's fairly easy to make the client-side experience glitchy, or to make the accessible representation so custom that you're forced to write custom code for the arXiv pages. The "sane" solutions I've seen seem to fall into:
- Prepare multiple server-side versions of the article and serve the preferred one to each user (either by browser detection, or better - as a user preference for the site)
- KISS and sacrifice the less relevant dimension to some extent
- Of the custom approaches, I don't really love any, but my least disliked is serving the performance-optimized page and adding custom markup to each equation that links to a remotely hosted accessible version, much like the src attribute of an img tag. It's an annoying automated workflow, but it keeps the common usage pattern (which is currently the vast majority of visits?) simple and efficient.
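As a rough illustration of the first option (several server-side versions, one served per user), the selection logic might look like the sketch below. The variant names and the `choose_variant` helper are hypothetical, and the User-Agent check is a deliberately crude stand-in for real browser detection; an explicit site preference always wins over it.

```python
from typing import Optional

# Hypothetical names for the pre-built versions of each article.
VARIANTS = ("prerendered", "mathml")


def choose_variant(preference: Optional[str], user_agent: str) -> str:
    """Pick which pre-built version of the article to serve.

    An explicit, valid user preference takes priority. Otherwise fall
    back to a crude User-Agent heuristic (an assumption of this sketch).
    """
    if preference in VARIANTS:
        return preference
    # Firefox renders MathML natively, so it can get the MathML version.
    if "Firefox/" in user_agent:
        return "mathml"
    # Everyone else gets the performance-optimized, pre-rendered page.
    return "prerendered"


firefox_ua = "Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0"
print(choose_variant(None, firefox_ua))           # no preference stored
print(choose_variant("prerendered", firefox_ua))  # preference overrides detection
```

Keeping the preference check first means a screen-reader user can opt into the MathML version from any browser, which is why a site preference is the better of the two mechanisms mentioned above.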

Since I have skin in the game, I would definitely prefer picking an approach that makes serving Content MathML easy, if not included by default. Will leave it at this for now.

This is very useful, thank you. As you say, this is a moving target, but it would be nice if we can aim as well as we can at its current position. :)

As long as there are tools that can produce it, I see no disadvantage to including MathML of some kind, so I am on board with that.

A few additional questions:

1. Is there any tradeoff between Content and Presentation MathML I should be aware of? Is there stuff that can't be represented in one or the other? Less support for rendering? I have tried Googling but come up with no practical information.
2. Have you got any examples of your least disliked option? This mathjax ticket has some examples of how to include MathML for accessibility/machine readability, which seem like a decent approach.

I wonder if we could find some blind physicists/mathematicians to talk to. I'm sure they have strong opinions about the best way to render math for screen readers. ;)

As to 1), you can think of Content MathML as completely separate from the page rendering. It's an add-on to the presentation facet (for which you pay in page size and conversion time), and is only usable for auxiliary applications such as screen readers, math search, and (in theory) interoperating with advanced math applications. In practice you'd see it tucked into the main math element as an annotation. Here is a snippet with the (cross-referenced) annotations latexml can give you today for an example formula, including a CMML tree.

<math id="p1.1.m1.1" class="ltx_Math" alttext="1+1=2" display="inline">
    <semantics id="p1.1.m1.1a">
        <mrow id="p1.1.m1.1.6" xref="p1.1.m1.1.6.cmml">
            <mrow id="p1.1.m1.1.6.1" xref="p1.1.m1.1.6.1.cmml">
                <mn id="p1.1.m1.1.1" xref="p1.1.m1.1.1.cmml">1</mn>
                <mo id="p1.1.m1.1.2" xref="p1.1.m1.1.2.cmml">+</mo>
                <mn id="p1.1.m1.1.3" xref="p1.1.m1.1.3.cmml">1</mn>
            </mrow>
            <mo id="p1.1.m1.1.4" xref="p1.1.m1.1.4.cmml">=</mo>
            <mn id="p1.1.m1.1.5" xref="p1.1.m1.1.5.cmml">2</mn>
        </mrow>
        <annotation-xml encoding="MathML-Content" id="p1.1.m1.1b">
            <apply id="p1.1.m1.1.6.cmml" xref="p1.1.m1.1.6">
                <eq id="p1.1.m1.1.4.cmml" xref="p1.1.m1.1.4"></eq>
                <apply id="p1.1.m1.1.6.1.cmml" xref="p1.1.m1.1.6.1">
                    <plus id="p1.1.m1.1.2.cmml" xref="p1.1.m1.1.2"></plus>
                    <cn type="integer" id="p1.1.m1.1.1.cmml" xref="p1.1.m1.1.1">1</cn>
                    <cn type="integer" id="p1.1.m1.1.3.cmml" xref="p1.1.m1.1.3">1</cn>
                </apply>
                <cn type="integer" id="p1.1.m1.1.5.cmml" xref="p1.1.m1.1.5">2</cn>
            </apply>
        </annotation-xml>
        <annotation encoding="application/x-tex" id="p1.1.m1.1c">1+1=2</annotation>
        <annotation encoding="application/x-llamapun" id="p1.1.m1.1d">NUMBER:1 ADDOP:plus NUMBER:1 RELOP:equals NUMBER:2</annotation>
    </semantics>
</math>
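
For anyone wanting to consume markup like the above from a script, here is a small sketch using Python's standard `xml.etree.ElementTree`. It collects the annotations by their `encoding` attribute and checks that every `xref` points at an `id` inside the same formula. The sample is an abbreviated version of the snippet above with shortened, made-up ids, and the helper names `annotations` and `dangling_xrefs` are my own.

```python
import xml.etree.ElementTree as ET

# Abbreviated version of the formula above; the ids are shortened for the sketch.
SAMPLE = """
<math alttext="1+1=2" display="inline">
  <semantics>
    <mrow id="m.6" xref="m.6.cmml">
      <mrow id="m.5" xref="m.5.cmml">
        <mn id="m.1" xref="m.1.cmml">1</mn>
        <mo id="m.2" xref="m.2.cmml">+</mo>
        <mn id="m.3" xref="m.3.cmml">1</mn>
      </mrow>
      <mo id="m.4" xref="m.4.cmml">=</mo>
      <mn id="m.7" xref="m.7.cmml">2</mn>
    </mrow>
    <annotation-xml encoding="MathML-Content">
      <apply id="m.6.cmml" xref="m.6">
        <eq id="m.4.cmml" xref="m.4"/>
        <apply id="m.5.cmml" xref="m.5">
          <plus id="m.2.cmml" xref="m.2"/>
          <cn type="integer" id="m.1.cmml" xref="m.1">1</cn>
          <cn type="integer" id="m.3.cmml" xref="m.3">1</cn>
        </apply>
        <cn type="integer" id="m.7.cmml" xref="m.7">2</cn>
      </apply>
    </annotation-xml>
    <annotation encoding="application/x-tex">1+1=2</annotation>
  </semantics>
</math>
"""


def annotations(math: ET.Element) -> dict:
    """Map each annotation's encoding attribute to its element."""
    return {
        el.get("encoding"): el
        for el in math.iter()
        if el.tag in ("annotation", "annotation-xml")
    }


def dangling_xrefs(math: ET.Element) -> list:
    """Return xref values that match no id within the same <math> element."""
    ids = {el.get("id") for el in math.iter() if el.get("id")}
    return [
        el.get("xref")
        for el in math.iter()
        if el.get("xref") and el.get("xref") not in ids
    ]


math = ET.fromstring(SAMPLE.strip())
found = annotations(math)
tex_source = found["application/x-tex"].text    # the TeX annotation
content_root = found["MathML-Content"][0]       # top-level <apply> of the CMML tree
unresolved = dangling_xrefs(math)               # empty when cross-references are consistent
```

This assumes the `<math>` element carries no XML namespace, as in the snippet above; MathML served as XML with a namespace declaration would need namespace-qualified tag names.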

As to 2), each of these annotations could also be hosted as separate resources outside of the main page, which is sadly non-standard, but keeps the simplicity and performance of the mainstream use pattern. The best example I have for such a setup comes from the DLMF. If you take an arbitrary equation and click on the circled i icon on the right-hand side, you will see an "Encodings" heading with different versions of the equation. That's my least disliked option for performance, but it isn't fully fleshed out yet for accessibility. There needs to be some machine-readable annotation to connect the remote resources etc.

I think the MathJax approach is quite reasonable, and they've had rather cool results. It's more pragmatic than "waiting for Content MathML" as they try to make the presentation accessible already, which is available at all times. But to use this approach you'd need to serve the presentation MathML in the main page, correct? And then you need to figure out how to serve a pre-rendered HTML version of the equation (that's the part that could end up glitchy, but maybe it's easier than I suspect) or decide on doing client-side rendering with mathjax of all formulas, which could be slowish. I'd need to try my hand at it or see a demo to understand it better.

Also, it's definitely a great suggestion to get mathematicians from the accessibility community on board early, and to make sure we're building usable solutions for them. @kohlhase may have a good pointer in that direction.

arxiv-vanity / engrafo

Figure out the best way to do math for speed, accessibility, and interoperability #519
