gerby-project / plastex

Python package to convert LaTeX markup to DOM

Tags 0176 and 018I stop the rendering process #3

Closed pbelmans closed 7 years ago

pbelmans commented 7 years ago

These tags stop the rendering process (but it shuts down nicely nevertheless). I have no idea why.

chngr commented 7 years ago

Experimenting and looking at some output, the culprit for the 0176 and 018I errors may be a misparsing of \\ followed by the character [. For instance, in 0176, this pattern appears when passing to the newline within the align environment; in 018I, this appears in the second xymatrix in the proof when passing from the second row to the third row. In both cases, removing the [ character or else replacing it by \lbrack resolves the problem.

That said, I do not think there is an error in 0176 after inserting amsmath into the preamble of the document (see #6). What this suggests is that the parser gets confused in environments it does not recognize: align was not a valid environment without amsmath, and xymatrix is not implemented in plasTeX as far as I could tell (this is probably also related to #5). It then identifies \\[ as an open display math mode token and stumbles over itself when proceeding.

I will poke around with this hypothesis a bit more.

chngr commented 7 years ago

So poking around some more, the problem is indeed caused by the pattern \\[...] within the xymatrix. The issue is that plasTeX thinks \\[0], say, is meant to indicate the vertical skip for the new line and misparses it. Incidentally, this explains some of the strange warnings when it complains about units.
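To make the misparse concrete, here is a toy model in plain Python. It is not plasTeX code; the function names and the regular expression are made up for illustration. One splitter treats \\ the generic way, swallowing a following [...] as a vertical skip, and the other treats \\ as a bare row separator, which is what \xymatrix expects.

```python
import re

# Toy model only (not plasTeX internals): a line break \\ that may be
# followed by an optional bracketed length, like the generic LaTeX \\.
LINEBREAK_WITH_OPT = re.compile(r"\\\\\s*(?:\[([^\]]*)\])?")


def split_rows_generic(body):
    """Split on \\ and swallow a following [...] as a vertical skip."""
    rows, skips, pos = [], [], 0
    for m in LINEBREAK_WITH_OPT.finditer(body):
        rows.append(body[pos:m.start()].strip())
        skips.append(m.group(1))  # None when no [...] follows the break
        pos = m.end()
    rows.append(body[pos:].strip())
    return rows, skips


def split_rows_xymatrix(body):
    """Split on \\ only; a following [...] stays part of the next row."""
    return [row.strip() for row in body.split("\\\\")]


body = r"A \ar[r] & B \\ [0] \ar[r] & C"
generic_rows, generic_skips = split_rows_generic(body)
xy_rows = split_rows_xymatrix(body)
print(generic_rows, generic_skips)  # the entry [0] is lost to the "skip"
print(xy_rows)                      # the entry [0] survives in row two
```

In the generic reading the entry [0] disappears from the second row and turns up as a "length" without units, which would be consistent with the unit warnings mentioned above.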

One fix, which is also related to #5, is to simply force plasTeX to keep everything within the \xymatrix command intact. It's not so hard just to parse until you get a matching { and } pair, but the simple way of doing so will not properly expand macros within an \xymatrix command. Perhaps that is fine for the time being...
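A minimal sketch of the "parse until the matching brace" idea, written as standalone Python rather than against the plasTeX tokenizer. The helper name read_braced_group is hypothetical. It works on raw characters, so, as noted above, nothing inside the group is macro-expanded.

```python
def read_braced_group(source, start):
    """Return (content, end) for the {...} group opening at source[start].

    content is the raw text between the outer braces and end is the index
    just past the closing brace.
    """
    if source[start] != "{":
        raise ValueError("expected '{' at position %d" % start)
    depth = 0
    i = start
    while i < len(source):
        ch = source[i]
        if ch == "\\":
            # Skip the backslash and the character it escapes, so that
            # \{, \} and \\ never affect the brace depth.
            i += 2
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return source[start + 1:i], i + 1
        i += 1
    raise ValueError("unbalanced braces")


src = r"\xymatrix{ A \ar[r] & B \\ [0] \ar@{->}[r] & C } tail"
content, end = read_braced_group(src, src.index("{"))
print(repr(content))
print(repr(src[end:]))
```

The nested group in \ar@{->} is tracked through the depth counter, so the scan stops at the brace that closes \xymatrix and not at the first } it meets.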

pbelmans commented 7 years ago

Thanks for checking this!

We might be able to draw some inspiration from this pull request which implements TikZ support in plasTeX. In particular, the NoCharSubEnvironment might be useful for us, as we will have to write our own invoke method. I'm toying with this right now.

Edit: actually, maybe NoCharSubEnvironment isn't that useful for us. It's only about some substitutions, like -- becoming an en dash.

pbelmans commented 7 years ago

The fix for #5 was easy, but this will be much harder to get right in general I'm afraid.

I suggest that we just fix it in the Stacks project repository by using \lbrack for now, and see whether it comes up in other projects. If it does, then we can spend energy on this.

chngr commented 7 years ago

Sounds good. Alternatively, you can add \relax after the newline characters \\ to stop the parser from interpreting [...] as a dimension specification.
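The \relax workaround could be applied mechanically with a small source rewrite. The following is only a sketch: the function name is hypothetical, and it assumes it is run on the body of an \xymatrix only, since elsewhere (for example in align) \\[10pt] is a legitimate vertical skip and should be left alone.

```python
import re

# Two backslashes, optional whitespace, then (as a lookahead) an opening
# bracket that a generic parser would read as an optional argument.
BREAK_THEN_BRACKET = re.compile(r"(\\\\)(\s*)(?=\[)")


def protect_line_breaks(tex):
    """Insert \\relax after every \\\\ that is directly followed by '['."""
    return BREAK_THEN_BRACKET.sub(
        lambda m: m.group(1) + "\\relax" + m.group(2), tex
    )


before = r"A \ar[r] & B \\ [0] \ar[r] & C"
after = protect_line_breaks(before)
print(after)
```

Line breaks that are not followed by a bracket are untouched, and running the rewrite a second time changes nothing, because \\ is then followed by \relax rather than by [.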

aisejohan commented 7 years ago

Guys, doesn't it seem this is definitely an error in plasTeX? For example, suppose you put a space after '\\'. Does it still bork? LaTeX treats a line ending the same as a space, right? So plasTeX should as well. Am I right?

chngr commented 7 years ago

It does still fail, or at least, not perform as expected, when you join the lines and put a space after \\. I think the difference is that when you are in the \xymatrix{...} command, LaTeX treats \\ differently than when you are simply in the displaymath environment. For instance, if you try to write 1 + 1 = 2 \\[10pt] in a vanilla displaymath environment, LaTeX will add 10pt worth of vertical space under the displayed equation. But if you try to do this in the \xymatrix command, \\[10pt] is simply a new line followed by the output [10pt] in your PDF.

So, in a sense, yes it is a fault of plasTeX. But it is in the sense that the \xymatrix command is foreign to plasTeX. So the complete solution would be to implement that package within plasTeX, but that just takes some effort to do completely generally and correctly.

aisejohan commented 7 years ago

Thanks for explaining. I have changed the LaTeX in tags 0176 and 018I and you can find it here. Hope this helps.

PatrickMassot commented 7 years ago

I'm sorry I'm late to the party. The thing you missed (and I missed for quite a long time) is how to have local macro definitions shadowing global ones. In xy.py, you need to have something like:

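from plasTeX import Command

class xymatrix(Command):
  args = 'str'

  # Nested class: this definition is only in effect inside \xymatrix
  class EndRow(Command):
    """ End of a row """
    # '\\' is the Python string for a single backslash, i.e. the macro \\ without its leading backslash
    macroName = '\\'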

Note how the EndRow class is nested inside the xymatrix class. This macro definition will be used only inside a xymatrix command. Since it takes no argument, you won't get a missing unit warning coming from misinterpreting \\ as admitting an optional vertical distance.

An unrelated subtlety is that \\ is not a valid Python class name, so we use a different class name and give the LaTeX name in the class attribute macroName.
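For readers who do not know plasTeX, the shadowing idea can be modelled in a few lines of plain Python. This is only a toy: the Command base class, the lookup function and GlobalLineBreak below are stand-ins invented for the sketch and say nothing about how plasTeX implements scoping internally.

```python
import inspect


class Command:
    """Stand-in base class for this sketch (not plasTeX's Command)."""
    macroName = None


def macro_name(cls):
    # Use the explicit TeX name if given, else the Python class name.
    return cls.macroName or cls.__name__


def local_macros(command_cls):
    """Collect Command subclasses nested in command_cls, keyed by TeX name."""
    return {
        macro_name(member): member
        for member in vars(command_cls).values()
        if inspect.isclass(member) and issubclass(member, Command)
    }


class GlobalLineBreak(Command):
    """Hypothetical global \\ that accepts an optional [skip]."""
    macroName = "\\"
    takes_optional_skip = True


class xymatrix(Command):
    class EndRow(Command):
        """Row separator: same TeX name, no optional argument."""
        macroName = "\\"
        takes_optional_skip = False


global_scope = {macro_name(c): c for c in (GlobalLineBreak, xymatrix)}


def lookup(name, enclosing=None):
    """Resolve a macro name, letting nested definitions shadow globals."""
    scope = dict(global_scope)
    if enclosing is not None:
        scope.update(local_macros(enclosing))
    return scope[name]


print(lookup("\\").__name__)                      # global definition
print(lookup("\\", enclosing=xymatrix).__name__)  # shadowed inside xymatrix
```

Outside xymatrix the name \\ resolves to the global definition; inside it, the nested class with the same macroName wins, and since that one takes no optional argument a following [0] is left alone.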

pbelmans commented 7 years ago

This is a really good solution, thanks. And it could also be useful in fixing some other xy weirdness we could end up seeing.