KaTeX / KaTeX

Fast math typesetting for the web.
https://katex.org
MIT License

Katex does not parse but MathJax does #1676

Closed CurtisHumphrey closed 5 years ago

CurtisHumphrey commented 6 years ago

We have an equation with underscores (_) in the names, like:

\text{Score} = 2\times\text{(Protein_(g))}  -  0.75\times\text{(Lipid_Tot_(g))}

In MathJax it renders just fine. However, on https://khan.github.io/KaTeX/, KaTeX yields this error:

KaTeX parse error: Expected '}', got '_' at position 37: …s\text{(Protein_̲(g))} - 0.75\…

Do we know why? And if so, should we fix KaTeX, or is MathJax wrong?

edemaine commented 6 years ago

quicklatex.com confirms that LaTeX behaves the same as KaTeX here, which is generally KaTeX's goal. The correct way to write that example (from a LaTeX perspective) is

\text{Score} = 2\times\text{(Protein\_(g))}  -  0.75\times\text{(Lipid\_Tot\_(g))}

Basically, _ is a special character in LaTeX that is allowed only in math mode, for producing subscripts. In \verb, it will be treated as an underscore character, but in \text, it needs to be escaped as \_.

I could see an argument for allowing bare _ in text mode in KaTeX's nonstrict mode, but hopefully it's easy enough to change the source code.

CurtisHumphrey commented 6 years ago

@edemaine Yeah, I think KaTeX is following pure LaTeX well. However, so many other systems use MathJax that we have an issue where the two will not render the same. If I use \_ inside the \text{} block in MathJax, both characters are displayed, not just _. So maybe fewer processing rules are needed in \text?

edemaine commented 6 years ago

Wow, that's a pretty bad implementation of \text in MathJax. It seems that whatever is passed in is treated verbatim; no macros work inside, although embedded $...$ math expressions work. I also tested on https://www.mathjax.org/#demo

If the font difference isn't a problem, you can do the following on both MathJax and KaTeX:

\verb|Score| = 2\times\verb|(Protein_(g))|  -  0.75\times\verb|(Lipid_Tot_(g))|

Do you have a situation where you need to support both MathJax and KaTeX? It seems like this is mainly a bug with MathJax, though again KaTeX could consider supporting raw _ in nonstrict text mode.
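For anyone who wants to apply the \verb workaround automatically, here is a minimal sketch of the rewrite. The helper name textToVerb is hypothetical, and it assumes each \text{...} group contains no nested braces and no | character (which would end the \verb argument early).

```javascript
// Hypothetical helper: rewrite each \text{...} group as \verb|...|.
// Assumes the group has no nested braces and no '|' character.
function textToVerb(tex) {
  return tex.replace(/\\text\{([^{}|]*)\}/g, (_match, body) => `\\verb|${body}|`);
}

const source = '\\text{Score} = 2\\times\\text{(Protein_(g))}';
console.log(textToVerb(source));
// \verb|Score| = 2\times\verb|(Protein_(g))|
```

Groups that do contain braces or | are left untouched by this sketch, so they would still need manual attention.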

CurtisHumphrey commented 6 years ago

Yes, I do. I have users who author LaTeX equations in Jupyter notebooks (which use MathJax in preview mode), and we then render that LaTeX in an HTML page using KaTeX. So I was trying to get both to match. The font isn't an issue in our case. Good find with \verb! I had not thought of that. At the moment I pre-process the input to replace _ with \_ in the \text{} blocks only, before I give it to KaTeX, and that seems to do the trick.

Example code in case it helps others:

const example = '\\text{Score} = 2\\times\\text{(Protein_(g))}  -  0.75\\times\\text{(Lipid_Tot_(g))}'

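// Match each \text{...} group (assumes no nested braces)
const regex_text_group = /\\text\{([^}]*)\}/g
// Escape bare underscores, leaving an already-escaped \_ alone
const fix_underscore = (match) => match.replace(/(?<!\\)_/g, '\\_')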

const fixed_underscores = example.replace(regex_text_group, fix_underscore)
katex.renderToString(fixed_underscores, {displayMode: true})

kevinbarabash commented 6 years ago

The unfortunate thing about some of the MathJax specific behaviours is that we can't just pre-process the MathJax variant and we can't just post-process the parse tree. :( I'm kind of surprised that MathJax is so non-compliant with LaTeX given the organizations that are supporting it.

kevinbarabash commented 6 years ago

~I feel like we should be opening bugs against MathJax for things that are not compliant with LaTeX.~ It looks like opening issues against MathJax will not help in this situation, as this bug has already been filed and rejected; see https://github.com/mathjax/MathJax/issues/1770.

kevinbarabash commented 6 years ago

The rationale for this behaviour is:

Another source of difficulty is when MathJax is used in content management systems that have their own document processing commands that are interpreted before the HTML page is created. For example, many blogs and wikis use formats like Markdown to allow you to create the content of your pages. In Markdown, the underscore is used to indicate italics, and this usage will conflict with MathJax’s use of the underscore to indicate a subscript. Since Markdown is applied to the page first, it will convert your subscript markers into italics (inserting <i> or <em> tags into your mathematics, which will cause MathJax to ignore the math).

I'm still a little foggy on why _ is okay but \_ is not okay in this scenario.

edemaine commented 6 years ago

While we maybe can't change MathJax's definition of \text{_}, I think we could file a bug that macros like \_ or \textunderscore don't work in \text{...}. Also, for example, text accents (\text{\'e}), text font commands (\text{\emph{hello}}, \text{\textit{italic}}) won't work. Everything inside \text, except $, seems to be treated verbatim.

This is the first big feature I've noticed missing in MathJax -- a good argument for using KaTeX.

CurtisHumphrey commented 6 years ago

Yes @kevinbarabash, I was surprised too that MathJax wasn't compliant here and that they had rejected the request to make it compliant. Should we add a section to KaTeX's readme about this difference between the two libraries, noting that KaTeX is doing it correctly per spec?

pkra commented 6 years ago

A relevant issue would be https://github.com/mathjax/mathjax-v3/issues/135.

Before going all xkcd#386, you might consider that there are valid reasons to avoid text mode (e.g., it is essentially a different layout system, also incompatible with CSS). Some are mentioned at https://docs.mathjax.org/en/latest/tex.html#differences.

If you are interested in thinking about the larger community, you might consider starting a discussion on the W3C MathOnWeb community group to work towards better compatibility across TeX-like conversion tools. (Disclaimer: I co-chair the CG.)

kevinbarabash commented 6 years ago

@pkra thanks for linking to the relevant MathJax issue. I'm glad to hear that you're considering making this configurable in v3. In the interest of improving interoperability, KaTeX could add the current MathJax behaviour and make it configurable as well. For KaTeX, though, I think we'd probably want to keep our current behaviour as the default. The rationale is that there's probably more TeX code in the world that relies on the TeX behaviour than on the current MathJax behaviour.

pkra commented 6 years ago

> I'm glad to hear that you're considering making this configurable in v3

Just to clarify: I'm merely a community member nowadays, so I can't predict what the MathJax team will decide.

dpvc commented 6 years ago

As has been pointed out, this has been an issue for MathJax users, and so I have capitulated and put together an extension to allow processing (some) text-mode macros in \text{} and other text-mode settings. Perhaps that will make things more compatible.

nikilarigela commented 6 years ago

In MathJax it renders fine, but in KaTeX it shows ( KaTeX parse error: Expected 'EOF', got '[' at position 2: \̲[̲f\left( x \righ… )

 \[f\left( x \right)\, = \,\,\left\{ {\begin{array}{*{20}{c}}
  {\frac{{ - \sin \,\left\{ {\cos \,x} \right\}}}{{x - \pi /2}}}&{x\, \ne \,\frac{\pi }{2}} \\ 
  2&{x\, = \,\frac{\pi }{2}} 
\end{array}\,,} \right.\]

edemaine commented 6 years ago

@nikilarigela This would be best put in a new issue. The problem you seem to be having is giving the \[... \] wrapper to KaTeX. You should be giving the insides to KaTeX, without the \[ and \].
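As a rough sketch of that step, the wrapper can be stripped before the string is handed to KaTeX. The helper name stripDisplayDelimiters is hypothetical, and it only handles a single \[ ... \] pair around the whole input.

```javascript
// Hypothetical helper: remove a surrounding \[ ... \] wrapper, if present.
function stripDisplayDelimiters(tex) {
  const trimmed = tex.trim();
  const match = /^\\\[([\s\S]*)\\\]$/.exec(trimmed);
  return match ? match[1].trim() : trimmed;
}

const wrapped = '\\[ f(x) = x^2 \\]';
const inner = stripDisplayDelimiters(wrapped);
console.log(inner); // f(x) = x^2

// With KaTeX loaded, the inner string would then be rendered in display mode:
// katex.renderToString(inner, {displayMode: true});
```

Since \[ ... \] marks display math, passing displayMode: true keeps the intended layout once the delimiters are gone.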

P. S. Also, KaTeX's array environment doesn't support * yet. But that's another issue.
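One possible stopgap for the * limitation is to expand the repetition yourself before rendering. The sketch below uses a hypothetical helper, expandArrayStars, and assumes the repeated part of the column specification contains no nested braces. It has not been checked against every KaTeX version, so treat it as a starting point.

```javascript
// Hypothetical helper: expand LaTeX's *{n}{cols} repetition inside the column
// specification of \begin{array}{...}, e.g. *{3}{c} becomes ccc.
// Assumes the repeated part contains no nested braces.
function expandArrayStars(tex) {
  const arraySpec = /(\\begin\{array\})\{((?:[^{}]|\{[^{}]*\})*)\}/g;
  return tex.replace(arraySpec, (_match, begin, spec) => {
    const expanded = spec.replace(
      /\*\{(\d+)\}\{([^{}]*)\}/g,
      (_m, count, cols) => cols.repeat(Number(count))
    );
    return `${begin}{${expanded}}`;
  });
}

console.log(expandArrayStars('\\begin{array}{*{3}{c}} a & b & c \\end{array}'));
// \begin{array}{ccc} a & b & c \end{array}
```

For the example above, *{20}{c} would expand to twenty c columns even though the array only uses two, so writing {cc} by hand is the simpler fix there.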

nikilarigela commented 6 years ago

Thanks @edemaine, I have tried it without the \[ and \], but it didn't work. The issue is because of the *. Is there any workaround for that?

ylemkimon commented 5 years ago

Closing in favor of #1736.