brucemiller / LaTeXML

LaTeXML: a TeX and LaTeX to XML/HTML/ePub/MathML translator.
http://dlmf.nist.gov/LaTeXML/

\lnot ¬ should be a prefix operator #2163

Closed. rzach closed this issue 1 year ago

rzach commented 1 year ago

`$\lnot A$` generates `<mi mathvariant="normal">¬</mi><mo>⁢</mo><mi>A</mi>` (the `<mo>` holds an invisible U+2062 INVISIBLE TIMES), but ¬ is a prefix operator, not an identifier.

In general, is there a way to help LaTeXML generate the right MathML for prefix operators? For example, if I want to take a relation (say, \sim) and turn it into a prefix operator (i.e., avoid the spacing LaTeX puts around relations), I'd write \mathord{\sim}, but that turns it into an `<mi>`. My screen reader, for instance, reads `$\mathord{\sim}A$` as "tilde times upper A".

dginev commented 1 year ago

Interesting: at the moment it is declared with the FUNCTION role, as is \neg:

https://github.com/brucemiller/LaTeXML/blob/2dde370c97d94238e2f987e0ef5fe50acc67cb58/lib/LaTeXML/Package/TeX.pool.ltxml#L5912

We have a POSTFIX role for constructs such as the factorial `!`, so maybe we need a very simple PREFIX role alongside it for prefix symbols meant to become `<mo>` elements.

dginev commented 1 year ago

@rzach, for now you can mark a prefix operator with \mathop, although internally it uses the BIGOP role, which assumes a bit too much (or at least its name suggests larger operators such as \forall and \exists).

Here's a quick test:

$ latexmlc --whatsin=math 'literal:\mathop{\lnot} a'
$ latexmlc --pmml --whatsin=math 'literal:\mathop{\lnot} a'
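To compare more than a one-line fragment, the same test can be put in a small standalone file. This is a minimal sketch, not taken from the thread: the file name `lnot-test.tex` is hypothetical, and the conversion command in the comments assumes latexmlc's `--pmml` and `--dest` options. The element each line produces reflects the behaviour reported in this issue at the time, not necessarily the current release.

```latex
% lnot-test.tex (hypothetical file name): compare how LaTeXML marks up negation.
% Convert with presentation MathML enabled, e.g.:
%   latexmlc --pmml --dest=lnot-test.html lnot-test.tex
\documentclass{article}
\begin{document}
% Default: \lnot was declared with the FUNCTION role (reported to emit <mi>).
$\lnot A$

% Workaround from this thread: \mathop uses the BIGOP role (emits <mo>).
$\mathop{\lnot} A$

% Not a prefix operator: \mathord makes an "ordinary" symbol (emits <mi>).
$\mathord{\sim} A$
\end{document}
```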

Edit: I should also disclose that we are working on replacing the mathematical grammar framework with one that is better at ambiguous parsing, and in the process we also mean to improve the tokenization stage. This is part of a larger project that migrates to a lower-level programming language, so we are still some way from having all the pieces in place. To clarify: small-to-medium upgrades to math parsing should already be possible in the current code, but larger upgrades may have to wait until the switchover.

rzach commented 1 year ago

Thanks! While we're on the subject of ¬: why does it have the `mathvariant="normal"` attribute? It makes screen readers read it as "normal not sign" instead of "not sign", which gets tedious after a while (at least when run through MathJax).

dginev commented 1 year ago

@rzach, the mathvariant detail is beside the point, since the emitted markup is incorrect anyway, as discussed above.

In general, "normal" is now the only mathvariant value accepted in MathML Core; it is used to suppress the default italic rendering of single-character `<mi>` elements. I believe LaTeXML sets it explicitly for content it isn't convinced should be italic.

So indeed, we don't need that attribute, but the root cause is that we are treating an operator the same way we would a function name.

brucemiller commented 1 year ago

I'm not convinced the original markup is wrong, but it could be better. We probably should have declared \not and \lnot with role='OPERATOR' rather than FUNCTION. I think that will produce an `mo` rather than an `mi`, so the mathvariant would no longer be needed (even though it isn't wrong either). Pronouncing it does seem wrong, though. \mathord gives you an "ordinary" symbol, so it is not the way to mark up a prefix operator.

rzach commented 1 year ago

I see there are commands in https://github.com/brucemiller/LaTeXML/blob/master/lib/LaTeXML/texmf/latexml.sty that look like they should let me (re)define \lnot (and other symbols) in a way that tells LaTeXML what MathML to map them to. But as far as I can tell they're not documented there or in the manual...

brucemiller commented 1 year ago

I'm not quite prepared to dive into documentation at the moment, but perhaps a fix will help: #2170. Thanks for the report!

rzach commented 1 year ago

Aside: BIGOP is better, since it turns ¬ into an `<mo>`, but LaTeXML adds an explicit `rspace="0.167em"`.

LaTeXML also has a PREFIX role (according to https://math.nist.gov/~BMiller/LaTeXML/manual/math/details/roles.html), which as far as I can tell isn't actually used anywhere. I found that I can get a plain `<mo>` without the extra space using `\lxMathTweak{role={POSTFIX}}{\lnot}`, although it did give me a warning:

Warning:not_parsed:>POSTFIX MathParser failed to match rule 'Anything'...
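To reuse that workaround throughout a document, it can be wrapped in a macro. This is a sketch under stated assumptions: the macro name `\pnot` is hypothetical, it assumes `\lxMathTweak` becomes available by loading LaTeXML's `latexml.sty` (the file linked above), and the `\providecommand` fallback is an assumption for plain LaTeX runs in case that style file does not define it. It only reproduces the POSTFIX trick reported here, warning included; it is not a real PREFIX role.

```latex
\documentclass{article}
% Assumption: \lxMathTweak is provided via LaTeXML's latexml.sty.
\usepackage{latexml}
% Assumed fallback for plain LaTeX runs if \lxMathTweak is undefined there:
% simply typeset the math argument unchanged.
\providecommand{\lxMathTweak}[2]{#2}
% Hypothetical macro: a bare <mo>¬</mo> via the POSTFIX-role workaround
% (LaTeXML is expected to emit the not_parsed warning quoted above).
\newcommand{\pnot}{\lxMathTweak{role={POSTFIX}}{\lnot}}
\begin{document}
$\pnot A$ and $\pnot (A \land B)$
\end{document}
```

Keeping the tweak inside one macro means that if a proper PREFIX role for \lnot lands later, only the `\newcommand` line has to change.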

Perhaps the clean approach is to have LaTeXML actually do something with the PREFIX role (e.g., turn it into `<mo form="prefix">`) and then give \lnot that role. That's probably what @dginev meant in the first comment.

Anyway, thanks!