brucemiller / LaTeXML

LaTeXML: a TeX and LaTeX to XML/HTML/ePub/MathML translator.
http://dlmf.nist.gov/LaTeXML/

respect \@sanitize guard for \index #2249

Closed dginev closed 11 months ago

dginev commented 1 year ago

Fixes ar5iv#393.

Minimal motivating example:

```tex
\documentclass{article}
\usepackage{makeidx}
\makeindex
\begin{document}
\def\w{w}
\begin{itemize}
\item A\index{Choice Principle!({{\sf AC$^\{<\w}_\w$}})}
\item B\index{Choice Principle!({{\sf BC$^{<\w}_\w$}})}
\end{itemize}
\printindex
\end{document}
```

In a way, this is an edge case: it only affects LaTeX documents that have an actual error in the argument of the \index command. The arXiv example is a particularly egregious case, where a large portion of a 70-page manuscript goes haywire due to a single malformed index argument.

pdflatex is largely immune to such problems, because the argument to \index is neutralized via \@sanitize and written to the auxiliary .idx file for second-stage processing. The makeindex binary can then (relatively quietly) veto malformed arguments, avoiding any errors in the main pdflatex workflow.
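To make the mechanism concrete, here is a minimal sketch, assuming the kernel's standard \@sanitize (which makes the backslash, `$`, `^`, `_` and a few other characters category "other", but leaves braces alone). `\showsanitized` and `\@showsanitized` are hypothetical helpers invented for this illustration; they are not part of LaTeX, makeidx, or this PR. They read an argument the way \index does, but echo it to the log instead of writing to the .idx file.

```tex
\documentclass{article}
\makeatletter
% Hypothetical helper (illustration only): switch to the \@sanitize
% catcodes inside a group, then grab the brace-delimited argument.
\newcommand\showsanitized{\begingroup\@sanitize\@showsanitized}
% The argument now consists of plain characters, so nothing in it is
% executed; we just print it to the terminal/log and restore catcodes.
\def\@showsanitized#1{\typeout{sanitized: #1}\endgroup}
\makeatother
\begin{document}
\def\w{w}
% Same malformed entry as item A in the example above.
A\showsanitized{Choice Principle!({{\sf AC$^\{<\w}_\w$}})}
\end{document}
```

Under these catcodes the `\{` is read as a literal backslash followed by an ordinary opening brace, so the argument is brace-balanced from TeX's point of view and should pass through untouched. Read with normal catcodes, `\{` is a control symbol that does not count as a brace, which leaves a stray `}` behind.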

To this end, this PR adds the sanitization guard via a new parameter type, which then also retokenizes the argument back to Plain catcodes. I then add the usual balanced-argument check, which emits a warning when the \index argument is ill-formed and discards the entry. This matches pdflatex+makeindex "in spirit".

One tricky detail revealed by the tests is that if we have DefMacro bindings that expand into \index, some care is needed so that the tokens do not get mangled. There appear to be some subtle details around re-tokenizing spaces that I am not too certain about (they have to do with space skipping after a command sequence is completed). I wonder if I can implement the parameter type in a way that is more compatible with binding reuse.
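The TeX-level analogue of this pitfall can be sketched as follows. This only illustrates stock LaTeX behaviour; that the bindings run into the same effect is my assumption, not something the tests spell out. `\term` is a hypothetical wrapper macro made up for the example.

```tex
\documentclass{article}
\usepackage{makeidx}
\makeindex
% Hypothetical wrapper (illustration only): typeset a term and index it.
% The argument is tokenized when \term grabs it, i.e. before \index
% gets a chance to apply \@sanitize.
\newcommand\term[1]{\emph{#1}\index{#1}}
\begin{document}
% Via the wrapper: \alpha is already a control-sequence token, so it is
% written to the .idx file by name, followed by a space.
\term{$\alpha$-conversion}
% Directly: the argument is read character by character under \@sanitize.
\index{$\alpha$-conversion}
\printindex
\end{document}
```

Comparing the two `\indexentry` lines in the resulting .idx file, the wrapped call typically carries an extra space after `\alpha`, which is the kind of space handling after a command sequence that the re-tokenization has to reproduce.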

Feedback welcome.

dginev commented 1 year ago

Alright, I believe I have now caught all the subtleties I got wrong.

Double-checking arXiv:2006.01613, I am again seeing a healthy index, and all tests pass. Ready for review.