This is something of an edge case: LaTeX documents that have an actual error in the argument of the \index command. The arXiv example is particularly egregious, as a large portion of a 70-page manuscript goes haywire due to a single malformed index argument.
pdflatex is largely immune to such problems, since the argument to \index is neutralized via \@sanitize and written to the auxiliary .idx file for second-stage processing. The makeindex binary can then (relatively quietly) reject malformed arguments, avoiding any errors in the main pdflatex workflow.
To this end, this PR adds the sanitization guard via a new parameter type, which then also retokenizes the argument back to Plain catcodes. I then add the usual balanced-argument check, which emits a warning and discards the entry when the \index argument is ill-formed. This matches pdflatex+makeindex "in spirit".
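The warn-and-discard behavior of the balanced-argument check can be sketched as follows. This is a minimal Python illustration and not the Perl code in this PR; the names `is_balanced` and `filter_index_entries` are hypothetical, and the sketch assumes that every `{` and `}` in the already-sanitized argument counts as a grouping character.

```python
import warnings


def is_balanced(text: str) -> bool:
    """Return True if every '{' has a matching '}' and no '}' closes early."""
    depth = 0
    for ch in text:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                # A closing brace appeared before its opening brace.
                return False
    return depth == 0


def filter_index_entries(entries):
    """Keep well-formed entries; warn about and drop ill-formed ones."""
    kept = []
    for entry in entries:
        if is_balanced(entry):
            kept.append(entry)
        else:
            # Assumed policy: a warning rather than an error, then discard.
            warnings.warn(f"discarding malformed index entry: {entry!r}")
    return kept
```

The point of the design is that a malformed entry costs one index item and one warning, rather than derailing the rest of the document.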
One tricky detail revealed by the tests: when DefMacro bindings expand into \index, some care is needed so that the tokens do not get mangled. There appear to be subtle details around re-tokenizing spaces that I am not too certain about (they have to do with space skipping after a command sequence is completed). I wonder if I can implement the parameter type in a way that is more compatible with binding reuse.
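For reference, the general TeX behavior behind the space-skipping concern can be illustrated with a toy tokenizer. This is a Python sketch and not LaTeXML code; `tokenize`, `is_control_word` and `untokenize` are hypothetical names, catcodes are assumed to be the standard ones, and the tokenizer is deliberately simplified (for example, it does not collapse runs of spaces). Whether this is the exact cause of the mangling seen in the tests is not established here.

```python
def is_letter(ch: str) -> bool:
    """Letters under the assumed standard catcodes: ASCII a-z and A-Z."""
    return ch.isascii() and ch.isalpha()


def tokenize(text: str) -> list:
    """Split text into control sequences and single-character tokens."""
    tokens = []
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            j = i + 1
            if is_letter(text[j]):
                # Control word: backslash followed by a run of letters.
                while j < len(text) and is_letter(text[j]):
                    j += 1
                tokens.append(text[i:j])
                # TeX skips spaces that follow a control word.
                while j < len(text) and text[j] == " ":
                    j += 1
            else:
                # Control symbol: backslash plus one non-letter.
                j += 1
                tokens.append(text[i:j])
            i = j
        else:
            tokens.append(text[i])
            i += 1
    return tokens


def is_control_word(tok: str) -> bool:
    return len(tok) > 1 and tok[0] == "\\" and all(is_letter(c) for c in tok[1:])


def untokenize(tokens: list) -> str:
    """Serialize tokens, adding a space after each control word."""
    out = []
    for tok in tokens:
        out.append(tok)
        if is_control_word(tok):
            out.append(" ")
    return "".join(out)


tokens = tokenize(r"\alpha x")
naive = "".join(tokens)          # control word fuses with the next letter
safe = untokenize(tokens)        # separator keeps the tokens apart
```

In this sketch, retokenizing `naive` yields a single control word, while retokenizing `safe` recovers the original tokens. The flip side is that a genuine space token placed directly after a control word would be swallowed on the way back in, which is one way spaces can get lost in a serialize-and-retokenize round trip.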
Fixes ar5iv#393.
Minimal motivating example:
Feedback welcome.