PreTeXtBook / pretext

PreTeXt: an authoring and publishing system for scholarly documents
https://pretextbook.org

Write mathematics as HTML script tags? #611

Closed rbeezer closed 2 years ago

rbeezer commented 6 years ago

With paragraph bust-up in place at #515 we could consider writing math/LaTeX/mathjax as script elements:

http://docs.mathjax.org/en/latest/advanced/model.html#mathjax-script-tags

This might greatly simplify cross-references to equations (HTML id rather than MathJax LaTeX \label{} mechanism). Maybe it would remove a step from page loading and give a speed-up? Identifying mathematics with proper elements might have other benefits?

Not sure I see a downside, other than some ugliness to accommodate IE quirks. Benefits may be marginal.

Reactions?

davidfarmer commented 6 years ago

My reaction is that it is crazy to consider doing this in the next year. Maybe maybe it speeds things up (but there are other things under discussion that could improve the page rendering speed), at the cost of abandoning the simplicity and naturalness of letting MathJax work as it was originally intended.

I also do not understand the point being made by \label{}. This was made worse by my inability to find an example of a reference to a numbered equation in the sample article.


davidfarmer commented 6 years ago

Okay, I found an example of a referenced numbered equation. It looks like there is a \label in the HTML source, but it is ignored because the HTML has a hard-coded \tag, and the reference also has that tag hard-coded.

Since there is no way to replicate the numbering in the PDF unless the numbers are directly written into the HTML, I don't get what the \label is doing, or how an HTML id could be of any use.

rbeezer commented 6 years ago

On 07/05/2017 06:45 PM, davidfarmer wrote:

My reaction is that it is crazy to consider doing this in the next year. Maybe maybe it speeds things up (but there are other things under discussion that could improve the page rendering speed), at the cost of abandoning the simplicity and naturalness of letting MathJax work as it was originally intended.

My understanding is that the first thing MathJax does is cruise the page and translate $, \(, etc. into these script tags. So we would just be saving a step. No idea how much savings that would be. The page would be easier for others to parse, but I don't know why somebody would do that. ;-)

I also do not understand the point being made by \label{}. This was made worse by my inability to find an example of a reference to a numbered equation in the sample article.

I think you are right. Now that we open a knowl for a cross-reference to a displayed equation, the point is moot. In the very early days, it was a real fiddle to hyperlink to a displayed equation when it was identified by a "\label". See the MathJax config at the top of a page with bits like "useLabelIDs". (It hooks up PTX ids with MathJax-generated ids manufactured from labels. IIRC, which may be in doubt.)

This will be necessary for HTML output where there are not knowls, so not for naught. But we could ban it from straight HTML output. Nice catch. ;-)

https://github.com/rbeezer/mathbook/issues/612

Rob

rbeezer commented 6 years ago

Right. ;-) Very good.

On 07/05/2017 06:55 PM, davidfarmer wrote:

Okay, I found an example of a referenced numbered equation. It looks like there is a \label in the HTML source, but it is ignored because the HTML has a hard-coded \tag, and the reference also has that tag hard-coded.

Since there is no way to replicate the numbering in the PDF unless the numbers are directly written into the HTML, I don't get what the \label is doing, or how an HTML id could be of any use.


rbeezer commented 2 years ago

My reaction is that it is crazy to consider doing this in the next year.

It has been four years now.

Alex-Jordan commented 2 years ago

In the CAT example I saw earlier today, the page used MathJax 2.

My weak understanding is that MJ2 finds all the \(...\) and turns them in the DOM into <script type="math/tex"> tags. My total speculation is that the CAT is looking for these script tags to understand where math starts and stops.

My equally weak understanding of MathJax3 is that it does not work this way. It does not build <script type="math/tex"> tags from \(...\). If all of the above is right, the CAT may have an issue with MJ3.

However, MJ3 can be configured to look for <script type="math/tex"> tags. (Either in addition to \(...\), or instead of.) If PreTeXt HTML exclusively used <script type="math/tex"> tags for math, it may be a good thing for the CAT. That is, if my weak understandings and assumptions are anywhere close to correct.

A bonus effect of using <script type="math/tex"> tags is that you can directly write < characters inside them without the web browser thinking that a new tag is starting. So from the perspective of an author using the CAT who is oblivious to the need for \lt, it's another reason to use <script type="math/tex"> tags. (Although for translation to PTX, you would still need to turn < into \lt. Or at least into &lt;.)
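The last step mentioned above, turning a bare < into \lt when translating back to PTX, could be sketched roughly as follows. The function name `texForPtx` is invented for this illustration, and only the `<` case named in the comment is handled.

```javascript
// Hypothetical helper (not part of PreTeXt or the CAT): make TeX that was
// written inside a <script type="math/tex"> tag safe to store in PTX source.
function texForPtx(tex) {
  // The trailing space keeps "\lt" from fusing with a following letter,
  // so "a<b" becomes "a\lt b" rather than "a\ltb".
  return tex.replace(/</g, "\\lt ");
}

console.log(texForPtx("a<b")); // prints: a\lt b
```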

davidfarmer commented 2 years ago

The CAT does not use the typeset inline math. For display math the HTML has a wrapper which is used. So, this issue is not relevant to the CAT.

Also, the CAT is fine with "<" in the input source, because that is intercepted and converted. (Is, or will be in a few days).

I am not seeing the benefit of this idea.

rbeezer commented 2 years ago

I am not seeing the benefit of this idea.

100% unambiguous. Math/LaTeX delimited by HTML/XHTML/XML syntax. Not a convenient syntax for people authoring one-off web pages outside of PreTeXt.

Suppose somebody "accidentally" authors \(foo\) inside a paragraph. It'll get rendered as LaTeX in HTML output. Except if you try the experiment, it won't happen as I just said. Because we scan all text nodes and any instance of \( becomes \[unicode-no-width-space-here](. We could forgo that kludge.

Alex is getting this script version back from WW servers. I had to put this in bare HTML so an author's interactive could have our MJ run over some LaTeX in a JavaScript widget showing slopes (which is broken now with MJ3, but I know how to fix).
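For readers unfamiliar with the text-node kludge described in this comment, here is a minimal sketch of the idea. It assumes the "no-width space" is U+200B ZERO WIDTH SPACE, and the function name is made up; the real conversion happens in PreTeXt's own stylesheets, not in JavaScript.

```javascript
// Assumption: the invisible character is U+200B ZERO WIDTH SPACE.
const ZERO_WIDTH_SPACE = "\u200B";

// Hypothetical stand-in for the scan over text nodes: split every "\(" with
// an invisible character so MathJax no longer sees an opening delimiter.
function defuseInlineDelimiter(text) {
  return text.replace(/\\\(/g, "\\" + ZERO_WIDTH_SPACE + "(");
}

const defused = defuseInlineDelimiter("An accidental \\(foo\\) in a paragraph.");
console.log(defused.includes("\\(")); // false: the delimiter is broken up
```

With math carried in script tags, and MathJax told not to look for \( at all, a step like this would have nothing left to guard against.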

Alex-Jordan commented 2 years ago

Well, I was wrong about the CAT then.

I think it's a good thing if we really don't care how human readable the HTML is. Rob assures me we do not. But maybe you feel differently?

Next consideration is how bad do we want to prevent adventurous hackers from doing something off book? Like I just tried this:

            <p>
                A hack: <m>x+y\) equals \(z</m>
            </p>

And the result looks fine, like:

[screenshot of the rendered paragraph: Screen Shot 2021-05-21 at 9 49 17 PM]

Moving to script tags would partly kill such attempts, since to do the same thing you would use closing/opening span tags, which then would ruin your PDF output.
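One way to see what the hack relies on is a check for delimiters smuggled into math content. The sketch below is only an illustration with an invented function name; nothing in the thread says PreTeXt performs such a check.

```javascript
// Hypothetical check: does the text inside an <m> element contain a literal
// "\(" or "\)"? With delimiter-based output, such content closes or reopens
// math early. Caveat: a TeX line break "\\" directly followed by a
// parenthesis would also match.
function hasStrayInlineDelimiter(mathContent) {
  return /\\[()]/.test(mathContent);
}

console.log(hasStrayInlineDelimiter("x+y\\) equals \\(z")); // true
console.log(hasStrayInlineDelimiter("\\left( x \\right)")); // false
```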

Alex-Jordan commented 2 years ago

Actually that hack is worse than I thought. At present we can do:


            <p>
                A hack: <m>\) Anything can go here, and no tomfoolery will be prevented,
                because we allow text nodes inside math to pass through unaltered.   \(</m>
            </p>

davidfarmer commented 2 years ago

If the only change to inline math is replacing \( and \) by opening and closing script tags, then that is fine with me and I don't see any headaches for CSS or the CAT.

The MathJax setup for the page should change so that \( is not interpreted as an opening math delimiter. Would that eliminate the need to do \[unicode-no-width-space-here]( ?

Will someone be allowed to define \( as a macro for \left(?

Are there any changes to display math? If div.displaymath is still the wrapper, and any changes are inside that div, that is okay with me.

Alex-Jordan commented 2 years ago

Would that eliminate the need to do \[unicode-no-width-space-here]( ?

Yes.

Will someone be allowed to define \( as a macro for \left(?

No. Print LaTeX would still recognize and use \( as its math opening delimiter.

Are there any changes to display math?

It would also replace \[ with the same script tag, with another attribute mode=display.
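The output described here could look roughly like what the following sketch produces. It follows the attribute layout written in this thread; note that MathJax 2's own convention puts the mode inside the type value, as in type="math/tex; mode=display". The function name is hypothetical.

```javascript
// Hypothetical generator for the proposed HTML: inline math gets a plain
// script tag, display math gets the extra mode attribute from the thread.
function mathScriptTag(tex, display = false) {
  // A literal "</script" inside the TeX would end the element early.
  if (/<\/script/i.test(tex)) {
    throw new Error("TeX must not contain </script");
  }
  const mode = display ? ' mode="display"' : "";
  return `<script type="math/tex"${mode}>${tex}</script>`;
}

console.log(mathScriptTag("x<y"));
console.log(mathScriptTag("\\frac{x}{2}+y=z", true));
```

The first call shows the bonus mentioned earlier in the thread: a bare < passes through without confusing the browser.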

davidfarmer commented 2 years ago

How about PreTeXt never outputs backslash square bracket, and instead uses \begin{equation*} or whatever the equivalent is?


rbeezer commented 2 years ago

never outputs backslash square bracket, instead \begin{equation*}

I was about to say that I think that is the way it is now!

@Alex-Jordan: If MJ is configured to only look for the script element, will it still "see" \begin{equation}, \begin{align}, etc?

Alex-Jordan commented 2 years ago

With MathJax, it will process \begin{xxx}...\end{xxx} whether you are in math mode or not. For example, it will process \begin{equation*}...\end{equation*} whether you are inside math mode or not. It will process \begin{matrix}...\end{matrix} whether you are inside math mode or not.

For this, you would put the script tag with mode="display" around the \begin{equation*}...\end{equation*}.

davidfarmer commented 2 years ago

For inline math, this is actually a good change which helps the CAT.

As much as possible, I prefer data over code. What I mean is that data describes an object, and general-purpose code converts the data into the chosen representation of that object. Need a new object? Just describe its data, with no need to change the code.

Almost everything is xml, so its opening and closing tags are described by a tagName, attributes, and attribute values. But not math. Inline math is not xml, so now there is no tagName, but you need two new fields instead: openingTag and closingTag. The code has to check if there is a tagName, and if not, then use the opening and closing tags.

Switching to <script type="math/tex"> addresses that special case for inline math. That is good, and I will switch to that as the inline HTML math wrapper.

But what about display math? If I still need to supply \begin{equation*}...\end{equation*}, or align for multiline, then I can't remove the code which handles the special case of no xml tagName.
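The special case described in this comment can be made concrete with a small sketch. The descriptor shapes and the function `wrap` are invented for illustration and are not taken from the CAT's code.

```javascript
// Hypothetical general-purpose code: wrap content according to a descriptor.
function wrap(descriptor, content) {
  if (descriptor.tagName) {
    // Ordinary case: the opening and closing tags follow from the data.
    const attrs = Object.entries(descriptor.attributes || {})
      .map(([name, value]) => ` ${name}="${value}"`)
      .join("");
    return `<${descriptor.tagName}${attrs}>${content}</${descriptor.tagName}>`;
  }
  // Special case: no tagName, so two extra fields are needed.
  return descriptor.openingTag + content + descriptor.closingTag;
}

// Inline math today needs the special case...
const inlineMathWithDelimiters = { openingTag: "\\(", closingTag: "\\)" };
// ...while a script-tag wrapper is described like everything else.
const inlineMathWithScript = { tagName: "script", attributes: { type: "math/tex" } };

console.log(wrap(inlineMathWithDelimiters, "x+y"));
console.log(wrap(inlineMathWithScript, "x+y"));
```

If every object could be described by the second kind of descriptor, the fallback branch could be deleted, which is the point being made about display math.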

Alex-Jordan commented 2 years ago

If you have

<script type="math/tex" mode="display">\begin{equation*}\frac{x}{2}+y=z\end{equation*}</script>

and you drop the script tag and have \begin{equation*}\frac{x}{2}+y=z\end{equation*} stored in some variable, you still have valid MathJax inline math. It would be invalid in regular LaTeX, but \begin{equation*}\frac{x}{2}+y=z\end{equation*} is valid in inline math mode when using MathJax. So perhaps viewing the \begin{equation*}...\end{equation*} as a display math delimiter is not how to look at it. Instead there is a thing that MathJax does where it looks for these delimiters, and if there are no other containing math delimiters then it infers you want display mode.

rbeezer commented 2 years ago

Right. Except PTX HTML output had a div.displaymath wrapping it.

Not sure I'm tracking the CAT scenario, but div.displaymath could certainly contain more information via additional attributes - like the originating PTX element (md, mdn, me, men) or the resulting LaTeX environment (equation, align*, gather). Would that help?

davidfarmer commented 2 years ago

The user of the CAT only types the contents of the display math, not the begin and end tags.

Wrapping the content in div.displaymath and/or a script tag is the expected behavior.
It is the LaTeX-style begin and end tags I would like to avoid. But if those have to be there, then it is not actually that much of a hassle for me to leave the code as-is. And I would not be surprised if down the road other things need separate beginning and ending tags.

rbeezer commented 2 years ago

Maybe way off-base, but is the following a solution? Maybe not a good solution, but a demonstration that I am understanding.

<div class="displaymath" latex-env="equation">
x^2+y^2=25
</div>

and then PTX JS gets this before MathJax does and injects the script tag for MJ, and the \begin{equation} to make the right LaTeX, then MJ gets what it needs/wants?
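A rough sketch of the string manipulation this idea implies follows. The function name is hypothetical, the DOM lookup and replacement are left out, and the script-tag layout follows what was written earlier in this thread.

```javascript
// Hypothetical: given the latex-env attribute and the text of a
// div.displaymath, build what would be handed to MathJax.
function buildDisplayMath(latexEnv, content) {
  const tex = `\\begin{${latexEnv}}${content.trim()}\\end{${latexEnv}}`;
  return `<script type="math/tex" mode="display">${tex}</script>`;
}

console.log(buildDisplayMath("equation", "\nx^2+y^2=25\n"));
```

The page source would then carry only the data (environment name and content), with the LaTeX wrapper added at load time.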

Alex-Jordan commented 2 years ago

Will the CAT be delving into mrow level markup? In other words, will CAT users write \\, or will they use the CAT to move to a new row?

davidfarmer commented 2 years ago

Will the CAT be delving into mrow level markup? In other words, will CAT users write \\, or will they use the CAT to move to a new row?

I am still thinking about multiline, and there are two specific ideas I am currently considering:

a) leave a blank line to separate the mrows

b) Something like what I wrote for Space Math:

https://aimath.org/~farmer/spacemath/#aligned

In either case, the alignment is automatic unless the author inserts an ampersand.

Alex-Jordan commented 2 years ago

Here are the MathJax instructions for using script tags in MJ3: http://docs.mathjax.org/en/latest/upgrading/v2.html?highlight=findScript#changes-in-the-mathjax-api#math-script-example

As I read the configuration code, it seems that it is not important to literally use mode='display'. That attribute in the script tag could be mode='gather', mode='align', or mode='alignat' to keep track of the type, and in that configuration the variable display just needs to look for any of them. Or mode='display' if it is not multiline. Maybe starred variants to track numbered or not.

So a rough outline is: based on the value of @mode in the script tag, the CAT should know to expect \begin{gather}...\end{gather}, \begin{align}...\end{align}, \begin{equation}...\end{equation}, etc., and trim away exactly what it expects to find there, while keeping track of the flavor based on the explicit @mode value.

In the other direction, it could take what an author writes and infer:

based on what the user typed in. The only thing I can't think of an automated inference for is alignat versus align. But if the CAT infers (for example) multiline gather, not numbered, it could build the script tag with @mode set accordingly. It could write PTX output that only had the real row by row content that the user typed. And it could insert the \begin{gather*}...\end{gather*} inside the script tag for HTML.

If I understood right, the redundancy of setting @mode and also writing \begin{gather*}...\end{gather*} is less than ideal.
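The first direction outlined in this comment, trimming away exactly the wrapper that @mode predicts, might be sketched like this. It assumes the mode value is literally the environment name (for example "gather*"), which is only one possible reading of the proposal, and the function name is invented.

```javascript
// Hypothetical: remove the \begin{...}...\end{...} wrapper named by @mode and
// return the row-by-row content. An argument such as the {2} of alignat is
// not handled and would be left at the start of the result.
function stripEnvironment(mode, tex) {
  const open = `\\begin{${mode}}`;
  const close = `\\end{${mode}}`;
  const body = tex.trim();
  if (!body.startsWith(open) || !body.endsWith(close)) {
    throw new Error(`expected ${open}...${close}`);
  }
  return body.slice(open.length, body.length - close.length).trim();
}

console.log(stripEnvironment("gather*", "\\begin{gather*}a=b\\\\c=d\\end{gather*}"));
```

Going the other way is the same operation in reverse: once the flavor has been inferred, wrap the author's rows in the matching environment and set @mode to the same value.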

davidfarmer commented 2 years ago

I looked at how MJ3 can be configured to find script tags in place of the standard math delimiters. It is not as simple as in MJ2.

If we decide to make the switch, it should be done when we have time to focus on it.

rbeezer commented 2 years ago

Not imminent. ;-)

rbeezer commented 2 years ago

MathJax 3 allows for marking elements it will ignore, and marking elements it will process. There still need to be LaTeX delimiters, but since we know where the math is, we can isolate the interpretation of these delimiters. So a different solution to the proposal here.

Guts of the change at d5ef68ffa5ab5262a4a4f07258596ccb9f2f9790
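As a rough illustration of the mechanism mentioned in this last comment, MathJax 3 has `ignoreHtmlClass` and `processHtmlClass` options. The sketch below is not a copy of what the commit does, and the class names "ignore-math" and "process-math" are assumptions chosen for the example.

```javascript
// Sketch of a MathJax 3 configuration object. In a page it would be assigned
// to window.MathJax before the MathJax script loads.
const mathJaxConfig = {
  tex: {
    // Only \( ... \) counts as inline math.
    inlineMath: [["\\(", "\\)"]],
  },
  options: {
    // Elements with this class (and their descendants) are skipped...
    ignoreHtmlClass: "ignore-math",
    // ...unless an inner element carries this class.
    processHtmlClass: "process-math",
  },
};

console.log(JSON.stringify(mathJaxConfig.options));
```

Under these assumptions, putting class="ignore-math" on the body and class="process-math" on each math wrapper means a stray \( in ordinary text is left alone, while delimiters inside the wrappers are still interpreted.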