fred-wang / TeXZilla

LALR Javascript LaTeX-to-MathML converter compatible with Unicode
http://fred-wang.github.io/TeXZilla/

prime and superscript on same character enabled #59

Open 70ray opened 5 years ago

70ray commented 5 years ago

A new symbol 'superScript' is defined and used in compoundTerm. New unit tests have been added for this.

fred-wang commented 5 years ago

Thanks for the PR. Have you checked consistency with itex2MML? And whether all the unit tests pass?

In any case, the MathML output seems incorrect.

a'^b should be semantically equivalent to {a'}^b, i.e.

<msup>
  <msup>
    <mi>a</mi>
    <mo>′</mo>
  </msup>
  <mi>b</mi>
</msup>

but your unit tests make it equivalent to a^{'b}. If you want to put all the scripts on the same line, the correct MathML output should probably be

<mmultiscripts>
  <mi>a</mi>
  <none/>
  <mo>′</mo>
  <none/>
  <mi>b</mi>
</mmultiscripts>

cc @davidcarlisle @distler
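
To make the two candidate outputs above easier to compare, here is a minimal sketch that builds both as strings. The `el` helper is hypothetical and written only for this illustration; it is not part of TeXZilla or itex2MML, and the sketch does not parse any TeX.

```javascript
// Hypothetical helper for illustration only (not a TeXZilla API):
// serializes a tag with its children, or a self-closing tag if there are none.
function el(tag, ...children) {
  return children.length === 0
    ? `<${tag}/>`
    : `<${tag}>${children.join("")}</${tag}>`;
}

const a = el("mi", "a");
const b = el("mi", "b");
const prime = el("mo", "\u2032"); // U+2032 PRIME

// Reading 1: a'^b grouped as {a'}^b, so the prime attaches to the base first.
const nested = el("msup", el("msup", a, prime), b);

// Reading 2: both scripts on one line with <mmultiscripts>.
// After the base, children come in (subscript, superscript) pairs,
// so each <none/> fills an empty subscript slot.
const flat = el("mmultiscripts", a, el("none"), prime, el("none"), b);

console.log(nested);
console.log(flat);
```

The nested form records that the prime belongs to `a` before `b` is applied; the flat form keeps both scripts as siblings of the base.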

distler commented 5 years ago

Arguably, itex2MML does it wrong (or, at least, differently): a' is rendered as

<mi>a</mi><mo>′</mo>

(that is, the "′" is treated as an accent/modifier, rather than as a superscript). And (alas), you do need to write {a'}^b if you want to get

<msup>
  <mrow><mi>a</mi><mo>′</mo></mrow>
  <mi>b</mi>
</msup>

distler commented 5 years ago

(And, of course, TeXZilla currently follows itex2MML in this regard.)

70ray commented 5 years ago

I had found that MathJax and MiKTeX both handle a'^b. But arguably {a'}^b, as distler suggested, would be a better way of writing it, so I am inclined to withdraw the merge request.

70ray commented 5 years ago

On second thoughts ... Should TeXZilla handle a'^b at all? I haven't been able to find any definition of what is or is not valid TeX input.

Regarding how a' is rendered: in fact TeXZilla produces <msup><mi>a</mi><mo>′</mo></msup>

MathJax renders a'^b as <msup><mi>a</mi><mrow><mo class="MJX-variant">′</mo><mi>b</mi></mrow></msup>

I agree it should be semantically equivalent to {a'}^b, but the presentation is much the same. Another case is a_b', which is semantically {a_b}' and so renders with the prime to the right of the subscript. While this is in a sense "more correct", it is not how it is usually typeset. It's more complicated than I thought at first.

distler commented 5 years ago

> It's more complicated than I thought at first.

Welcome to my world.

Trying to infer correct semantics from LaTeX input is a fiendishly difficult task. And it's not just primes... Consider (a+b)^2. That should be equivalent to {(a+b)}^2, not (a+b{)}^2, but good luck finding a parser (itex2MML, TeXZilla, MathJax, ...) that does that correctly.

Since (La)TeX doesn't enforce any sort of semantics on its input (in this case, parentheses don't have to be balanced), there is, except in rare circumstances, no way to infer correct semantics from (La)TeX input. If you care about such things (e.g., if you hope that a screen reader might read your MathML correctly), then you should probably start typing in the braces yourself -- {(a+b)}^2 -- rather than hoping that TeXZilla (or itex2MML or MathJax) infers them correctly for you.
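
The effect of typing the braces yourself can be illustrated with a small sketch. `superscriptBase` is a hypothetical function written for this example; it is not how TeXZilla, itex2MML, or MathJax are implemented. It assumes a deliberately naive rule: the base of `^` is either the brace group that ends immediately before it or, failing that, the single preceding character. Whitespace, control sequences such as `\alpha`, and escaped braces are ignored.

```javascript
// Hypothetical helper for illustration only: returns the text that a naive,
// character-based parser would treat as the base of the first "^" in `tex`.
function superscriptBase(tex) {
  const caret = tex.indexOf("^");
  if (caret <= 0) return null; // no "^", or nothing before it

  const last = caret - 1;
  if (tex[last] !== "}") return tex[last]; // base is a single character

  // Walk left to find the "{" that matches the closing brace.
  let depth = 0;
  for (let i = last; i >= 0; i--) {
    if (tex[i] === "}") {
      depth++;
    } else if (tex[i] === "{") {
      depth--;
      if (depth === 0) return tex.slice(i, last + 1);
    }
  }
  return null; // unbalanced braces
}

console.log(superscriptBase("(a+b)^2"));   // base is only the closing parenthesis
console.log(superscriptBase("{(a+b)}^2")); // base is the whole braced group
```

Under this rule the unbraced input attaches the exponent to `)` alone, while the braced input attaches it to `{(a+b)}`, which is the grouping the author intended.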