brucemiller / LaTeXML

LaTeXML: a TeX and LaTeX to XML/HTML/ePub/MathML translator.
http://dlmf.nist.gov/LaTeXML/

detect raw PDF inclusion documents and mark them as invalid inputs #2191

Closed: dginev closed this 1 year ago

dginev commented 1 year ago

This PR marks another class of arXiv inputs as invalid, following a user report from today at ar5iv#376.

There is a range of special-case documents erroneously included in arXiv source bundles: they precompile a local PDF and re-include it via \includepdf to satisfy the arXiv submission requirement.

I think "invalid" is an honest marker for a document whose only content is a single use of that macro, since such a document isn't meant for latexml conversion. Suggestions are welcome if there is a better test than the one I tried: when \includepdf has been used, I simply check at the end of document construction whether ltx:document has more than one content node.
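A minimal Python sketch of the heuristic described above, for illustration only. It runs on an already-built XML tree rather than inside LaTeXML's Perl document construction, and it is not the PR's implementation. The names `is_raw_pdf_wrapper` and `make_doc` are hypothetical, as is the assumption that every element child of ltx:document counts as a "content node" (the real check may skip metadata or count differently).

```python
import xml.etree.ElementTree as ET

# Namespace LaTeXML uses for ltx:* elements.
LTX_NS = "http://dlmf.nist.gov/LaTeXML"


def is_raw_pdf_wrapper(document: ET.Element, used_includepdf: bool) -> bool:
    """Hypothetical check: flag a document that only re-includes a PDF.

    Assumption: every element child of ltx:document counts as a content node.
    """
    if not used_includepdf:
        return False
    # Iterating an Element yields only child elements, so whitespace-only
    # text between them is not counted.
    content_nodes = list(document)
    # Invalid unless there is more than one content node.
    return len(content_nodes) <= 1


def make_doc(n_paras: int) -> ET.Element:
    """Build a toy ltx:document with n_paras empty ltx:para children."""
    doc = ET.Element(f"{{{LTX_NS}}}document")
    for _ in range(n_paras):
        ET.SubElement(doc, f"{{{LTX_NS}}}para")
    return doc


if __name__ == "__main__":
    print(is_raw_pdf_wrapper(make_doc(1), used_includepdf=True))   # True
    print(is_raw_pdf_wrapper(make_doc(3), used_includepdf=True))   # False
    print(is_raw_pdf_wrapper(make_doc(1), used_includepdf=False))  # False
```

Keying on both conditions keeps ordinary documents that happen to use \includepdf alongside real content from being flagged.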

brucemiller commented 1 year ago

Was this intended to be a PR for the arXiv setup? It seems quite reasonable to consider such files invalid for coretex, and fairly pointless to convert with latexml in general, but throwing a fatal error seems kinda gratuitous.

dginev commented 1 year ago

The PR was filed in the intended repository.