brucemiller / LaTeXML

LaTeXML: a TeX and LaTeX to XML/HTML/ePub/MathML translator.
http://dlmf.nist.gov/LaTeXML/

Test fails w/ latest CTAN snapshot (babel/greek) #2175

hpreusse closed this issue 1 year ago

hpreusse commented 1 year ago

I checked out TeX Live again at the end of June and noticed that LaTeXML once more fails a test with that TL snapshot.

t/81_babel.t ..............
1..7
ok 1 - use LaTeXML::Core;
ok 2 - t/babel/csquotes
ok 3 - t/babel/french
ok 4 - t/babel/german
not ok 5 - t/babel/greek

#   Failed test 't/babel/greek'
#   at /home/hille/latexml-0.8.8-pre/blib/lib/LaTeXML/Util/Test.pm line 120.
Wide character in print at /usr/share/perl/5.36/Test2/Formatter/TAP.pm line 125.
# Difference at line 41 for t/babel/greek
#       got : '<p>Ηερε´ς'
#  expected : '<p>Here’s'

Test Summary Report
-------------------
t/81_babel.t            (Wstat: 256 (exited 1) Tests: 7 Failed: 1)
  Failed test:  5
  Non-zero exit status: 1
Files=34, Tests=443, 524 wallclock secs ( 0.24 usr  0.12 sys + 492.41 cusr 29.91 csys = 522.68 CPU)
Result: FAIL
Failed 1/34 test programs. 1/443 subtests failed.
make: *** [Makefile:1848: test_dynamic] Error 255

This is reproducible with a git clone of LaTeXML I made this morning. The full build log can be seen here.

Sorry for the bad news! Thanks for your help.

dginev commented 1 year ago

Thanks for the report!

I'd ask @brucemiller to take a look here, as he has a nice setup for the most recent texlive snapshots (which I admit is just a convenient excuse on my part, since I could make arrangements for that myself...)

dginev commented 1 year ago

To avoid opening a second babel-related issue, let me also report something I am seeing on texlive 2022 with today's latest LaTeXML.

A minimal load of english babel, as in:

\documentclass{article}
\usepackage[english]{babel}
\begin{document}
\end{document}

leads to ten repeated errors of the kind:

Warning:uninitialized:$ch Use of uninitialized value $ch  in hash element at at /home/deyan/perl5/lib/perl5/LaTeXML/Core/Mouth.pm line 163, <$IN> line 9
Warning:uninitialized:$_[1] Use of uninitialized value $_[1]  in hash element at at /home/deyan/perl5/lib/perl5/LaTeXML/Core/Mouth.pm line 279, <$IN> line 9
Warning:uninitialized:$_[1] Use of uninitialized value $_[1]  in hash element at at /home/deyan/perl5/lib/perl5/LaTeXML/Core/Mouth.pm line 279, <$IN> line 9
Warning:uninitialized:value Use of uninitialized value value in string eq at at /home/deyan/perl5/lib/perl5/LaTeXML/Core/Token.pm line 323, <$IN> line 9
Warning:uninitialized:value Use of uninitialized value value in string eq at at /home/deyan/perl5/lib/perl5/LaTeXML/Core/Token.pm line 323, <$IN> line 9
Warning:uninitialized:value Use of uninitialized value value in string eq at at /home/deyan/perl5/lib/perl5/LaTeXML/Core/Token.pm line 323, <$IN> line 9
Error:expected:Until:= Missing argument Until:= for Core::Definition::Expandable[\bbl@inistore@min Until:=Until:\@@] at babel-english.tex; line 13 col 0

with the overall summary:

Conversion complete: 51 warnings; 10 errors

teepeemm commented 1 year ago

The 51 warnings and 10 errors come from commit acaa51d9bdf4ae582c57f527fe34580290efac34.

But even with that commit rolled back, I don't get the correct output from:

\documentclass{article}
\usepackage[polutonikogreek,english]{babel}
\begin{document}
english

\selectlanguage{polutonikogreek}
greek

\selectlanguage{english}
english
\end{document}

Prior to the commit in question, I get no errors, but the final "english" is typeset in Greek.

dginev commented 1 year ago

I now have texlive 2023 installed in parallel on my machine, and I double-checked whether my claim that #2215 resolves the regression on texlive 2022 also holds for the current release.

Sadly, it doesn't. The simple english babel load from my previous comment now succeeds, but the greek.tex test still fails with the message reported in the original issue description.

There is a separate regression in t/structure/glossary:

# Difference at line 92 for t/structure/glossary
#       got : '            <p>Or, more loudly: Chop the <glossaryref inlist="main" key="cabbage">cabbage</glossaryref>, <glossaryref inlist="main" key="potato">potatoes</glossaryref> and <glossaryref inlist="main" key="carrot">carrots</glossaryref>.</p>'
#  expected : '            <p>Or, more loudly: Chop the <glossaryref inlist="main" key="cabbage">CABBAGE</glossaryref>, <glossaryref inlist="main" key="potato">POTATOES</glossaryref> and <glossaryref inlist="main" key="carrot">CARROTS</glossaryref>.</p>'

I should also note, as a caution, that make test took 21 minutes on my high-end desktop machine. Maybe we want to caution latexml users to stick to texlive 2022 for a little bit longer?

I will reopen here and check whether I can find a patch for the two failing tests.

xworld21 commented 1 year ago

Maybe we want to caution latexml users to stick to texlive 2022 for a little bit longer?

That might not be enough: master apparently fails on my TeX Live 2022 (nixpkgs) as well. I also get:

# Difference at line 41 for t/babel/greek
#       got : '<p>Ηερε´ς'
#  expected : '<p>Here’s'

dginev commented 1 year ago

It took far too long to troubleshoot the exact cause here, apologies. It appears to come down to a single difference between how latexml implements \cf@encoding and how latex.ltx does.

Namely, in texlive 2023, \bbl@switch triggers a comparison. When switching from Greek back to English, it executes \ifx \cf@encoding \BabelGreekPreviousFontEncoding. There is exactly one place in the expansion flow of our test where the pdflatex run has \cf@encoding defined as macro:->LGR while \BabelGreekPreviousFontEncoding is defined as macro:->OT1. There pdflatex takes the \else branch of the conditional, which sets the language back to English.

Meanwhile, latexml implements \cf@encoding as a Perl sub{} that checks the current font directly. As a result, \let\foo\cf@encoding always binds \foo to the same definition (pointing to that sub{}), whether or not the font has changed since. The \ifx comparison therefore always succeeds. We never take the \else branch, so we never switch back, and our test fails.
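To make the \let/\ifx behavior above concrete, here is a minimal plain-TeX illustration of what real TeX does. \let copies the macro's replacement text as it is at that moment, so a later redefinition of the original does not affect the copy, and \ifx then compares the two replacement texts. The names \curenc and \prevenc are hypothetical stand-ins for \cf@encoding and \BabelGreekPreviousFontEncoding, chosen to avoid needing @ catcode changes. This is a sketch of TeX semantics only, not of babel's or latex.ltx's actual code. Per the analysis above, in latexml the copy would instead remain the same sub{}-backed definition as the original, so the comparison would come out "same".

```tex
% Hypothetical stand-ins: \curenc plays the role of \cf@encoding,
% \prevenc plays the role of \BabelGreekPreviousFontEncoding.
\def\curenc{OT1}
\let\prevenc\curenc   % snapshot: \prevenc is now macro:->OT1
\def\curenc{LGR}      % switching to Greek redefines only the original
% \ifx compares the two macros' replacement texts (LGR vs. OT1):
\message{\ifx\curenc\prevenc same\else different\fi}
% In TeX this prints "different", i.e. the \else branch is taken.
\bye
```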


So one approach to a solution appears to be implementing \cf@encoding and its related internal latex.ltx macros more faithfully (or waiting until we can load all of latex.ltx natively).

Another approach is to implement our own version of \bbl@switch or, more narrowly, of \BabelGreekRestoreFontEncoding.

I'm recording my current notes here and can follow up with a PR once we choose a strategy.

P.S. Since the definition is dynamically constructed as babel executes, let me include it in the comment (logged via \tracingmacros=1):

\BabelGreekRestoreFontEncoding ->\ifx \cf@encoding \BabelGreekPreviousFontEncoding \else
 \let \encodingdefault \BabelGreekPreviousFontEncoding \fontencoding {\encodingdefault}
 \selectfont \fi