brucemiller / LaTeXML

LaTeXML: a TeX and LaTeX to XML/HTML/ePub/MathML translator.
http://dlmf.nist.gov/LaTeXML/

Maximum number of warnings? #2308

Open dginev opened 9 months ago

dginev commented 9 months ago

The current record-holder for largest asset in ar5iv is arXiv:1501.02683, whose ZIP bundle is 647 MB.

To add insult to injury, the bundle contains a single cortex.log file holding the latexml conversion log, which totals a whopping 7.7 million lines.

As it turns out, those lines are almost entirely the same warning message, emitted in a busy loop until the 45-minute timeout is reached. It reads:

Warning:expected:<number> Missing number, treated as zero
        at 1501.02683.tex; line 2441 col 0 - line 2441 col 12
        while processing \count@
        Next token is T_CS[\relax] ( == Core::Definition::Primitive[\relax]) more: \ifnum\count@<\value{proofcount}
\advance\count@\@ne\relax\expandafter\iterate\fi\let\iterate\relax
        In Core::Gullet[@0x559bfe9cba78] /dev/shm/hN2Sba2fTg/1501.02683.tex; 
 from line 2441 col 0 to line 2441 col 12
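For triage, a log like this can be collapsed by tallying the warning header lines. The snippet below is only a rough sketch, not part of LaTeXML or ar5iv tooling: `tally_warnings` is a hypothetical helper, and it assumes every entry starts with a line beginning with `Warning:`, as in the excerpt above.

```python
from collections import Counter


def tally_warnings(lines):
    """Count how often each distinct 'Warning:' header line occurs.

    Assumption: each log entry begins with a line starting with
    'Warning:'; the indented detail lines that follow are ignored.
    """
    counts = Counter()
    for line in lines:
        if line.startswith("Warning:"):
            counts[line.strip()] += 1
    return counts


# Small in-memory sample standing in for the real cortex.log.
sample = [
    "Warning:expected:<number> Missing number, treated as zero",
    "        at 1501.02683.tex; line 2441 col 0 - line 2441 col 12",
    "Warning:expected:<number> Missing number, treated as zero",
    "        at 1501.02683.tex; line 2441 col 0 - line 2441 col 12",
]
tally = tally_warnings(sample)
for header, n in tally.most_common():
    print(n, header)
```

For a real file, passing the open file handle works the same way, since the function reads one line at a time and never holds the whole log in memory.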

It is very clearly a pathological case of expansion gone wrong. But it suggests an idea for a new guard, similar to the "100 Error" guard that terminates execution. Maybe 10,000 Warnings?
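To make the proposal concrete, here is a minimal sketch of such a guard. It is written in Python purely for illustration and is not LaTeXML's code; `WarningGuard`, `TooManyWarnings`, `convert`, and the 10,000 default are all hypothetical names and values chosen for this example.

```python
import itertools


class TooManyWarnings(RuntimeError):
    """Raised when the (hypothetical) warning budget is exhausted."""


class WarningGuard:
    def __init__(self, max_warnings=10_000):
        self.max_warnings = max_warnings
        self.count = 0

    def warn(self, message):
        # Refuse to record anything past the budget, so a busy loop
        # cannot grow the log without bound.
        if self.count >= self.max_warnings:
            raise TooManyWarnings(f"aborting after {self.count} warnings")
        self.count += 1


def convert(warning_stream, max_warnings=10_000):
    """Stand-in for a conversion run: consume warnings until done or aborted.

    Returns (warnings_recorded, aborted).
    """
    guard = WarningGuard(max_warnings)
    try:
        for message in warning_stream:
            guard.warn(message)
    except TooManyWarnings:
        return guard.count, True
    return guard.count, False


# An endless stream imitates the busy loop from the log above.
endless = itertools.repeat("Missing number, treated as zero")
recorded, aborted = convert(endless)
print(recorded, aborted)
```

With a cap like this, the run would end as soon as the budget is spent instead of logging until the timeout. The right threshold is still the open question.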

Curious if @brucemiller finds this a good idea, and how many warnings could be considered a "healthy maximum".

dginev commented 9 months ago

Aside: The reason I stumbled on this issue is that ar5iv had some uptime challenges due to the web service running out of RAM while the site was being fully crawled. The example paper likely contributed to that.