gerby-project / plastex

Python package to convert LaTeX markup to DOM

Unhandled mathematics in 0BM6: handle accents inside text mode inside math #4

Closed pbelmans closed 7 years ago

pbelmans commented 7 years ago

There is unhandled mathematics in tag 0BM6: accents in mathematics might be wreaking some havoc.

pbelmans commented 7 years ago

As explained here the problem is that MathJax doesn't parse accents inside \text-like environments, because it's not math mode (and in theory, lots of things could indeed go there).

So we need to apply a TeX-to-Unicode (or TeX-to-HTML-entity) conversion inside our renderer to fix this, as accented (or escaped) letters do work inside MathJax.
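
A minimal sketch of what such a TeX-to-Unicode conversion could look like, using only the Python standard library. The function name `replace_common_accents`, the regular expression, and the set of accents covered are assumptions for illustration; none of this is plasTeX or MathJax API. It only handles an accent command followed by a single ASCII letter, optionally in braces.

```python
import re
import unicodedata

# Assumed mapping from TeX accent commands to Unicode combining marks.
COMBINING = {
    "'": "\u0301",  # acute
    "`": "\u0300",  # grave
    "^": "\u0302",  # circumflex
    '"': "\u0308",  # diaeresis
    "~": "\u0303",  # tilde
}

# Matches \'e, \'{e}, \"o, \"{o}, and so on (one ASCII letter only).
ACCENT_RE = re.compile(r"""\\(['`^"~])\s*(?:\{([A-Za-z])\}|([A-Za-z]))""")


def replace_common_accents(tex_source):
    """Replace simple TeX accent commands by precomposed Unicode letters."""
    def repl(match):
        letter = match.group(2) or match.group(3)
        # Compose letter + combining mark into a single code point where possible.
        return unicodedata.normalize("NFC", letter + COMBINING[match.group(1)])
    return ACCENT_RE.sub(repl, tex_source)
```

For example, `replace_common_accents(r"\text{\'etale}")` turns the accent command into the single letter é and leaves the surrounding `\text{...}` alone. The pattern is deliberately naive: it does not know whether it is inside text mode, so it would need to be restricted to `\text`-like arguments before being used on real input.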

pbelmans commented 7 years ago

Turns out that we cannot apply a conversion, because we just dump the source code of an equation to let MathJax handle it.

I'm afraid we'll have to choose the least ugly solution out of

  1. a Jinja2 function that tries to replace the most common accents

  2. whenever importing the plasTeX output in the database, try to replace the most common accents

  3. implement accent handling in MathJax text mode: this is what most people would benefit from, but it'd be as ugly as the previous pseudo-solutions

  4. run through the entire tree, whenever there is text mode inside math mode, flatten the tree by applying the render method to the text mode and replacing the subtree

Please suggest something if you have a better idea. The fourth method seems best for now.
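
To make the fourth option concrete, here is a rough sketch of the tree walk on a toy document tree. The `Node` class, its `kind` and `rendered` fields, and both functions are hypothetical stand-ins, not plasTeX's actual DOM or renderer; the point is only the shape of the traversal.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node; children are Node objects or plain strings."""
    kind: str                      # e.g. "math", "text", "accent", "par"
    children: list = field(default_factory=list)
    rendered: str = ""             # for "accent" nodes: what a renderer would emit


def render(node):
    """Toy renderer: accents become their Unicode character, the rest is joined."""
    if isinstance(node, str):
        return node
    if node.kind == "accent":
        return node.rendered
    return "".join(render(child) for child in node.children)


def flatten_text_in_math(node, in_math=False):
    """Replace the subtree of every text node inside math by its rendered string."""
    in_math = in_math or node.kind == "math"
    for child in node.children:
        if isinstance(child, str):
            continue
        if in_math and child.kind == "text":
            # Flatten: the text node keeps a single, already rendered string child.
            child.children = [render(child)]
        else:
            flatten_text_in_math(child, in_math)
```

With a math node containing a text node whose children are an accent node (rendered as é) and the string `"t"`, the text node ends up with the single child `"ét"`, while text nodes outside math are left untouched. This sketch does not handle math nested inside text inside math.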

chngr commented 7 years ago

Here is a potential solution: we could simply change the source property of the Accent commands. That is, we could simply add

class Accent(Command):

    ...

    @property
    def source(self):
        return type(self).chars.get(self.textContent.strip(), None)
    ...

at the beginning of the Accent class, so that accessing the source of an accent command returns the Unicode character corresponding to the accented letter.

I think this is not so bad: after all, if I put an accent on a character in a LaTeX document, I secretly just want the character to be the accented character, except that maybe I cannot type Unicode.
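
For illustration, the effect of overriding `source` can be reproduced on stand-in classes. Everything below is hypothetical: `Command` and `Acute` here are toy classes, not plasTeX's real ones, and the `chars` table is a two-entry example. Unlike the snippet above, which returns `None` for an unknown letter, this sketch falls back to the raw TeX.

```python
class Command:
    """Hypothetical stand-in for a macro base class (not plasTeX's Command)."""
    macro = ""

    def __init__(self, text_content):
        self.textContent = text_content

    @property
    def source(self):
        # Default behaviour in this sketch: reproduce the raw TeX.
        return "\\%s{%s}" % (self.macro, self.textContent)


class Acute(Command):
    """Hypothetical accent command with a small lookup table."""
    macro = "'"
    chars = {"e": "\u00e9", "a": "\u00e1"}  # é, á

    @property
    def source(self):
        # Look up the accented character; fall back to the raw TeX if unknown.
        char = type(self).chars.get(self.textContent.strip())
        return char if char is not None else super().source
```

Here `Acute("e").source` gives the single character é, so anything that dumps `source` into a MathJax equation receives the Unicode letter rather than the accent command. A letter missing from the table keeps its original TeX form.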

pbelmans commented 7 years ago

I don't see why this change would be a bad idea, but I also don't want to impose this upstream.

I checked it, and the mathematics in 0BM6 is now good.