brucemiller / LaTeXML

LaTeXML: a TeX and LaTeX to XML/HTML/ePub/MathML translator.
http://dlmf.nist.gov/LaTeXML/

protect listings.sty Semiverbatim arguments for dataname and toccaption #2189

Closed dginev closed 1 year ago

dginev commented 1 year ago

Fixes #2178.

I am not sure if this covers all relevant cases for listings.sty, but the changes here fix the _ problems in the reported example.

It comes down to ensuring that Semiverbatim arguments are propagated through Semiverbatim contexts all the way up to the responsible absorbing Constructor.

brucemiller commented 1 year ago

Something's fishy; once the filename is tokenized, it shouldn't need all this "propagation"... but it does!?!?

xworld21 commented 1 year ago

This is a font encoding problem apparently: I can work around the issue by using \usepackage[T1]{fontenc}.

brucemiller commented 1 year ago

Ah, of course; it's the digestion of the arg that is also affected by being marked verbatim. But you shouldn't need the extra font encoding, nor should the toctitle always be verbatim (e.g., if it comes from the name= keyval).

I think what's needed is a bit of swapping around: in \lstinputlisting, it is $name, not $file, that should be passed to lstProcessDisplay and assigned to LST@toctitle. However, if the name keyword is not given, $name defaults to $file and should be wrapped in a \texttt.

xworld21 commented 1 year ago

it is $name, not $file, that should be passed to lstProcessDisplay and assigned to LST@toctitle

Ahah, underscores in $name (as in \lstinputlisting[name=a_b]{...}) are also broken, but in a harmless way: latexml complains that _ can only appear in math mode. However, with $name, the underscore remains an underscore regardless of the font encoding.

Digging deeper, if $name contains a dollar sign, latexml will output XMath, while listings does no such thing – a dollar remains a dollar.
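
A minimal test document for the cases discussed so far might look like the sketch below. The file name `demo_file.txt` and the listing names are made up for illustration; only `\lstinputlisting[name=a_b]{...}` and the `\usepackage[T1]{fontenc}` workaround come from the thread.

```latex
% Hypothetical test case; the file name and listing names are illustrative only.
\begin{filecontents*}{demo_file.txt}
some listing content
\end{filecontents*}
\documentclass{article}
% \usepackage[T1]{fontenc} % the workaround mentioned earlier in the thread
\usepackage{listings}
\begin{document}
\lstlistoflistings
% No name key: the name defaults to the file name (underscore included).
\lstinputlisting{demo_file.txt}
% Explicit name containing an underscore.
\lstinputlisting[name=a_b]{demo_file.txt}
% Explicit name containing a dollar sign; per the thread, listings keeps it literal.
\lstinputlisting[name=a$b]{demo_file.txt}
\end{document}
```

Running the same file through pdflatex and latexml should make it easy to compare how each treats `_` and `$` in the list-of-listings entries.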

brucemiller commented 1 year ago

That's bizarre; it doesn't format name as normal TeX? Why even bother with having the option? I guess that's a permutation of my previous suggestion, but also that the DefKeyVal('LST','name', ...) should be Semiverbatim. Weird...

xworld21 commented 1 year ago

That's bizarre; it doesn't format name as normal TeX?

Oh, indeed the name has special handling in listings.sty. From listings.dtx:

% \begin{lstkey}{name}
% \begin{macro}{\lstname}
% \begin{macro}{\lst@name}
% \begin{macro}{\lst@intname}
% Each pretty-printing command values |\lst@intname| before setting any keys.
%    \begin{macrocode}
\lst@Key{name}\relax{\def\lst@intname{#1}}
\lst@AddToHookExe{PreSet}{\global\let\lst@intname\@empty}
\lst@AddToHook{PreInit}{%
    \let\lst@arg\lst@intname \lst@ReplaceIn\lst@arg\lst@filenamerpl
    \global\let\lst@name\lst@arg \global\let\lstname\lst@name}
%    \end{macrocode}
% Use of |\lst@ReplaceIn| removes a bug first reported by
% \lsthelper{Magne~Rudshaug}{1998/01/09}{_ and list of listings}.
% Here is the replacement list.
%    \begin{macrocode}
\def\lst@filenamerpl{_\textunderscore $\textdollar -\textendash}
%    \end{macrocode} ^^A $
% \end{macro}
% \end{macro}
% \end{macro}
% \end{lstkey}

So it is actively replacing underscores, dollars and dashes specifically in names, although I don't understand at what stage (I see a PreInit hook?).
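
To see the effect of that replacement list in isolation, one could apply the same internal macros to a throwaway macro. This is only a sketch: it assumes `\lst@ReplaceIn` and `\lst@filenamerpl` are available right after loading listings and are called exactly as in the excerpt above; `\demo@name` and `\demoname` are hypothetical names.

```latex
\documentclass{article}
\usepackage{listings}
\makeatletter
% Hypothetical name containing all three characters from the replacement list.
\def\demo@name{my_file-v1$.txt}
% Same call pattern as in the PreInit hook: rewrite the macro in place
% using the (character, replacement) pairs in \lst@filenamerpl.
\lst@ReplaceIn\demo@name\lst@filenamerpl
\let\demoname\demo@name
\makeatother
\begin{document}
% Show the rewritten token list, then typeset it.
\texttt{\meaning\demoname}

\demoname
\end{document}
```

If the assumption holds, the `\meaning` line should show `\textunderscore`, `\textdollar` and `\textendash` in place of the raw characters.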

dginev commented 1 year ago

So it is actively replacing underscores, dollars and dashes specifically in names, although I don't understand at what stage (I see a PreInit hook?).

I see that \lst@MakeCaption uses that to neutralize the name:

\let\lst@arg\lst@intname \lst@ReplaceIn\lst@arg\lst@filenamerpl

Thanks for the suggested changes, I've filed an update.

But @brucemiller even with the reorganized name/file defaulting, unless I make the DefConstructor argument to be Semiverbatim, the regular _ digestion takes place and we see the odd characters. So something is still fishy? I removed a macro just to be sure there was no added indirection, and also double checked they're T_OTHER[_] - that is all fine. Feel free to take another look.

brucemiller commented 1 year ago

From a LaTeXML pov, it's more of a font-encoding question than a font question. But apparently the way listings does it is to specifically redefine _ and $. So, we'll be better off mimicking the weird approach. You might do that with a special ParameterType, which makes those two active and \let's them to the text equivalents. You'd only need to use that parameter type in the 2 places you read the name (\lstinputlisting and the name keyword); once they're tokenized they'll be fine.
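
For readers unfamiliar with the "make active and \let" idiom, here is a plain-LaTeX analogue of that suggestion. It is not the LaTeXML ParameterType itself, just a sketch of the same idea at the TeX level; all `\demo...` macro names are hypothetical.

```latex
\documentclass{article}
\makeatletter
% Define the helper while _ and $ are active, so the \let lines below
% really target the active characters.
\begingroup
\catcode`\_=\active
\catcode`\$=\active
\gdef\demo@activatenamechars{%
  \catcode`\_=\active \let_\textunderscore
  \catcode`\$=\active \let$\textdollar}
\endgroup
% Switch catcodes first, then read the argument, so that _ and $ are
% tokenized as active characters; the group keeps the change local.
\newcommand*\demoname{\begingroup\demo@activatenamechars\demo@readname}
\newcommand*\demo@readname[1]{\texttt{#1}\endgroup}
\makeatother
\begin{document}
\demoname{a_b$c}
\end{document}
```

The key design point is the same as in the comment above: the special treatment is needed only at the moment the name is read, because the characters keep their meaning once tokenized within that scope.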

dginev commented 1 year ago

Might as well start fresh.