gjtorikian / mtex2MML

A Bison grammar to convert TeX math into MathML.
23 stars 14 forks source link

Get rid of some crap #16

Closed gjtorikian closed 9 years ago

gjtorikian commented 9 years ago

This gets rid of some junk. It also properly frees some previously unallocated items.

gjtorikian commented 9 years ago

@mdiep If you're able to / still interested, I'd love to take a review :eyes: from you now. There are some missing implementations I want to get to, but I think they'll be simple enough to address after some of the C is looked at.

The only file to really look at is _parseextras.c. I'll try to explain why. mtex2MML uses a parser to determine how to transform a math string into MathML. This works great for most use cases, except one: arrays.

Consider the following array:

\begin{array}{c:c}
  \text{min} & 0 & 1 & 2 & 3\\
   \hline
   0 & 0 & 0 & 0 & 0 \\
   3 & 0 & 1 & 2 & 3
\end{array}

As the parser walks this array, here's what it understands:

Here's the problem: as the parser walks down the array definition, it needs to maintain a sense of how many rows and columns there are, and at the end, it needs to be able to go back and provide definitions for these in a MathML way. It's not so easy. Consider the column alignment, c:c. c means center, and : means dashed list, so this is saying, "The first column is centered, there is a dashed line, and then the second or more columns are also centered." The or more is key, because as we parse the array, we find five columns defined (\text{min} & 0 & 1 & 2 & 3--the & denotes a column break). In MathML, this means columnalign="center center" columnlines="dashed none". Two different attributes, one chunk of information. Likewise, we need to remember that each row is essentially borderless, except for the ones followed by \hline. That translates to rowlines="solid none". No other hline appears after the first one, so the second value of none means "none after the second row, and any other rows."

The original itex would never parse an array as defined above. In fact, itex used its own array notation, which explicitly defines array options like column alignment and row separators. But such an array is very basic for LaTex and MathJax to comprehend.

To fix this, I've written a set of functions in _parseextras.c. It essentially preserves the state of an environment as it walks down an array definition. So when a begin is detected, it starts to pay attention and pushing rows on a stack. When an end is detected, it starts popping items off the stack and remembering things like which row needs a border and how many columns are defined. This "row spacing" and "column alignment" information is then preserved in a different stack, so that when the parser hits an array, it pops off the row/column data and applies it as part of the MathML definition. For example, for a nested array like this one, you can find the construction of the data here. rowdata plucks information off the stack and turns it into MathML attributes.

Any feedback on any of this is much appreciated. In terms of code aesthetics, I'd like some guidance if possible. In terms of implementation, I'm pretty happy with it, as it works on the MathJax suite and doesn't appear to leak memory through Valgrind.

/cc @bkeepers @arfon in case you are interested too.

mdiep commented 9 years ago

I'm planning to take a look at this tomorrow. :+1:

gjtorikian commented 9 years ago

:sparkling_heart: :heart_eyes_cat: Feel free to ping me or even :jeans: if something doesn't make sense. I tried to leave sensible comments but I am sure naming conventions or other oddities have gone in as I'm not super versed in other people's C code. As well, if it helps, the code formatting convention is one I just took from Nokogiri.

arfon commented 9 years ago

This is so :sparkles: @gjtorikian.

mdiep commented 9 years ago

This PR looks good. I'm going to open a new PR with all your additions so I can leave line comments.