Open dginev opened 3 months ago
This idea may warrant some extra discussion... If we want math mode constructs to still expand in this argument, such as:
\bibitem[Ex$\ddot{a}$mple(1899)]{...}
then Semiverbatim is a bit misplaced: it deactivates the $, but will still expand the \ddot in the natbib Expand($label) call.
Maybe I should invent a new parameter type, which only deactivates the underscore? Thoughts welcome. That would look something along the lines of:
DefParameterType('NatbibSemiVerbatim', sub {
    # deactivated underscore
    my $arg = $_[0]->readArg;
    my @inactive = map { Equals($_, T_SUB) ? T_OTHER("_") : $_ } $arg->unlist;
    return Tokens(@inactive); });
Edit: a slightly more direct version of a new parameter, which only deactivates underscore. A bit patchy possibly, but it is a little unclear which behavior natbib is aiming for exactly.
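To make the effect of that token map concrete, here is a toy model in Python rather than LaTeXML's Perl API. It assumes tokens can be represented as (character, category code) pairs using TeX's standard category code numbers; the helper names `tokenize` and `deactivate_underscore` are hypothetical and exist only for this sketch.

```python
# Toy model only: tokens are (character, category code) pairs, as in TeX.
# This is not LaTeXML's API; all names below are hypothetical.
CC_MATH, CC_SUB, CC_LETTER, CC_OTHER = 3, 8, 11, 12

def tokenize(text):
    """Assign TeX's default category codes to the few characters of interest."""
    special = {"$": CC_MATH, "_": CC_SUB}
    return [(ch, special.get(ch, CC_LETTER if ch.isalpha() else CC_OTHER))
            for ch in text]

def deactivate_underscore(tokens):
    """Replace each subscript token by an inert '_' of category 'other'."""
    return [("_", CC_OTHER) if cc == CC_SUB else (ch, cc) for ch, cc in tokens]

label = deactivate_underscore(tokenize("a_b$x$"))
```

In this model the `$` tokens keep their math-shift category, so math mode would still work, while the underscore can no longer act as a subscript. Note that this also neutralizes an underscore that sits inside `$...$`.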
The general observation is that when a bare label is used in natbib's \bibitem[label], but its entry isn't cited, pdflatex won't emit an error. I believe this has to do with writing that data out via \NAT@wrout, which won't trigger expansion. Only after the written data is read back in (usually on the next pdflatex run) could issues with underscore activation come up, and only if \cite used that entry.
So, for now, I have decided not to change the parameter types, but instead to guard LaTeXML's emulation, which uses an explicit Expand() call. Deactivating the underscores beforehand is sufficient.
I also added a test for this kind of tortured use case.
Your last observation almost gets it, I think. This label argument is getting expanded before writing to the aux file, but it is not digested until later, and only if the bibitem is cited. So that would mean that undefined macros or # will cause immediate problems during latex's expansion of the label, but tokens that only affect digestion will pass through until they're cited, if ever! So, not just _, but ^, & or even a single $ (or really any sequence that can't be digested) would be ignored by latex if not cited, but (currently) cause problems for LaTeXML. Moreover, _
itself isn't the problem; it's fine inside math, as in, say, \bibitem[foo$a_b$(1999)]{underscore}.
Arguably these documents are in "Error", even if they don't cause errors, so I wonder how deep we should go. But if we were to try to fix it, I think we need to track where the label gets digested and use some kind of error-free digestion(?)
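The two-phase behavior described in this comment (expand when writing the aux data, digest only if the entry is cited) can be sketched as a toy model. This is Python, not natbib or LaTeXML internals; the function names, the token representation (a list of strings) and the set of "known" macros are all assumptions made for illustration.

```python
# Toy model only; names are hypothetical, not natbib or LaTeXML internals.
KNOWN_MACROS = {"\\ddot"}

def expand_for_aux(tokens):
    """Write-time step: only undefined macros fail here."""
    for tok in tokens:
        if tok.startswith("\\") and tok not in KNOWN_MACROS:
            raise ValueError(f"undefined control sequence {tok}")
    return list(tokens)

def digest(tokens):
    """Typesetting step: '_' or '^' outside math fails, as does an unbalanced '$'."""
    in_math = False
    for tok in tokens:
        if tok == "$":
            in_math = not in_math
        elif tok in ("_", "^") and not in_math:
            raise ValueError(f"{tok!r} outside math mode")
    if in_math:
        raise ValueError("unbalanced '$'")
    return "".join(tokens)

def process_bibitem(label, cited):
    stored = expand_for_aux(label)             # always happens
    return digest(stored) if cited else None   # happens only when cited
```

Under these assumptions, `process_bibitem(list("foo_bar"), cited=False)` passes silently, the same label fails once `cited=True`, and `list("foo$a_b$")` digests fine either way, which is the asymmetry an "error-free digestion" step would have to reproduce.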
Good point, we should be approaching this even more generally. Having a dedicated parameter type that "postpones" the errors of certain Digest steps could be tricky... But maybe there is something there.
We have a natural place to anchor such a new parameter, at the DefConstructor for \NAT@@wrout. It may be worth playing around a bit with the example I had concocted. I'll investigate.
This is a minor change avoiding a needless error in natbib's \bibitem.
A minimal motivating example (that I could turn into a test) is:
Note the underscores in the \bibitem use, especially the one in the optional label [] argument. These survive well under pdflatex and, as far as I can observe, are largely ignored, at least in the specific document I am studying that uses this. With the current latexml master, this example produces two unfortunate errors of the kind:
The PR simply switches the offending argument to Semiverbatim in the natbib parser, deactivating the underscore's math behavior.