brucemiller / LaTeXML

LaTeXML: a TeX and LaTeX to XML/HTML/ePub/MathML translator.
http://dlmf.nist.gov/LaTeXML/

latexmlpost ends abruptly when gif or pdf images are used #2077

Open goska opened 1 year ago

goska commented 1 year ago

Do you know what may cause latexmlpost conversion to HTML to fail abruptly when gif or pdf images are used as source images in a figure environment? The process ends with the message: "Warning:perl:warn Exception 410: Unrecognized attribute (option) at /usr/local/share/perl5/LaTeXML/Util/Image.pm line 483"

Conversion works with jpg and png images.

I am testing LaTeXML 0.8.7 on test files that converted fine with previous versions, including 0.8.6. This LaTeXML was installed for me by server admins on a new server. Converting the same files with LaTeXML 0.8.2 on the old server (to be decommissioned) works. Conversion with LaTeXML 0.8.5 and 0.8.6 on my personal MacBook worked too. I am using a two-step conversion, LaTeX -> XML -> HTML, with the latexmlpost option --format=html5.
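For readers who want to reproduce the two-step conversion described above, here is a minimal sketch. It assumes `latexml` and `latexmlpost` are on the PATH; the file name `paper.tex` and the helper names `build_commands` and `convert` are hypothetical, chosen only for illustration.

```python
import subprocess
from pathlib import Path


def build_commands(tex_file):
    """Return the two command lines for a LaTeX -> XML -> HTML5 conversion."""
    tex = Path(tex_file)
    xml = tex.with_suffix(".xml")
    html = tex.with_suffix(".html")
    return [
        # Step 1: LaTeX -> LaTeXML XML
        ["latexml", f"--destination={xml}", str(tex)],
        # Step 2: XML -> HTML5; image handling happens in this step,
        # which is where the reported failure occurs
        ["latexmlpost", "--format=html5", f"--destination={html}", str(xml)],
    ]


def convert(tex_file):
    """Run both steps, stopping at the first command that exits non-zero."""
    for cmd in build_commands(tex_file):
        subprocess.run(cmd, check=True)
```

Calling `convert("paper.tex")` would run both steps in order. Printing the result of `build_commands("paper.tex")` first shows exactly what will be executed.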

dginev commented 1 year ago

For simple png and jpg images latexml has the option to just pass them along untouched in the final HTML (as with gif and svg), but for PDF there needs to be a dedicated conversion step to a web image format, which typically happens via imagemagick.

I quickly experimented with:

\documentclass{article}
\usepackage{graphicx}
\begin{document}

\includegraphics[width=0.5\textwidth]{test.gif}

\includegraphics[width=0.5\textwidth]{test.pdf}

\end{document}

I ran it on a gif and a pdf image that I grabbed from an image search, and things look operational with latexml 0.8.7, using the imagemagick dependencies described on the "get latexml" page.

Can you still reproduce the problem via such test file @goska ? Would you be open to sharing the full log of post-processing?

In addition, here is a report of the underlying imagemagick installation on my machine using convert --version:

$ convert --version
Version: ImageMagick 6.9.11-60 Q16 x86_64 2021-01-25 https://imagemagick.org
Copyright: (C) 1999-2021 ImageMagick Studio LLC
License: https://imagemagick.org/script/license.php
Features: Cipher DPC Modules OpenMP(4.5) 
Delegates (built-in): bzlib djvu fftw fontconfig freetype heic jbig jng jp2 jpeg lcms lqr ltdl lzma openexr pangocairo png tiff webp wmf x xml zlib

Does that look comparable to your setup?

dginev commented 1 year ago

"installed for me by server admins on a new server"

I spotted this part of your report too late. There is a very relevant imagemagick security issue on older servers, which led them to completely disable PDF support.

See the full details here: #1216

I even have the workaround embedded in the recent Dockerfile I made for the ar5iv-style conversion. You can reuse it from: https://github.com/dginev/ar5ivist/blob/main/Dockerfile#L74-L82

Let me know if that happened to be the issue on that particular machine.

goska commented 1 year ago

Thank you, @dginev. My conversion works on the old server, but it fails on the new server, which runs an up-to-date OS (I think it's Red Hat, I'll check). It also works on a Mac with a recent OS under LaTeXML 0.8.6. I'll test conversion with the simple file you suggested, and check what response I get from the convert command.

goska commented 1 year ago

I have tested conversion of the simple document, based on your test document above with LaTeXML 0.8.7 (the new server), and it failed with the same error: Warning:perl:warn Exception 410: Unrecognized attribute (option) at /usr/local/share/perl5/LaTeXML/Util/Image.pm line 483

dginev commented 1 year ago

@goska sounds like checking for the security policy is the next obvious thing to do. On an Ubuntu machine that is:

$ cat /etc/ImageMagick-6/policy.xml |grep "PDF"

You may see lines such as:

<policy domain="module" rights="none" pattern="{PS,PDF,XPS}" />
<policy domain="coder" rights="none" pattern="PDF" />

which then need to be disabled so that PDF processing is allowed, either by removing those lines or by commenting them out:

<!-- <policy domain="module" rights="none" pattern="{PS,PDF,XPS}" /> -->
<!-- <policy domain="coder" rights="none" pattern="PDF" /> -->

Naturally, you will need root access to modify that file. If that solves the problem, then this issue will turn out to be a duplicate of #1216.
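A plain `grep` also matches lines that are already commented out, so a slightly more careful check may help. The following is a minimal sketch, assuming the policy file is well-formed XML with `<policy>` elements like the ones shown above. The function name `blocking_policies` is hypothetical, and the pattern handling is deliberately simplified: it only understands `PDF` and `{PS,PDF,XPS}` style patterns, not wildcards.

```python
import xml.etree.ElementTree as ET


def blocking_policies(policy_xml, fmt="PDF"):
    """Return (domain, pattern) pairs of active policies that deny `fmt`.

    Commented-out policies are skipped automatically, because the XML
    parser does not report comments as elements.
    """
    root = ET.fromstring(policy_xml)
    blocked = []
    for policy in root.iter("policy"):
        if policy.get("rights", "").lower() != "none":
            continue
        if policy.get("domain") not in ("coder", "module"):
            continue
        # Simplified: split "{PS,PDF,XPS}" or "PDF" into individual names
        names = policy.get("pattern", "").strip("{}").split(",")
        if fmt.upper() in (name.strip().upper() for name in names):
            blocked.append((policy.get("domain"), policy.get("pattern")))
    return blocked
```

To use it, read the policy file into a string (for example `/etc/ImageMagick-6/policy.xml` on Ubuntu, as above; the location may differ on other systems) and pass it in. An empty list means no active deny rule was found for that format. The same call with `fmt="GIF"` checks whether gif is restricted as well.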

goska commented 1 year ago

I'll ask people managing the server if changing security policy is acceptable.

xworld21 commented 1 year ago

Partially off-topic: I'd recommend converting PDF outside of latexmlpost, using e.g. dvisvgm. latexmlpost should be smart enough to detect files with the same name but different extension, and prefer SVG over PDF. Or at least, I believe it works like that at the moment. This is assuming you want to convert PDF to SVG, which is normally the best way to go unless you have specific compatibility issues.

On top of the security holes, ImageMagick does a terrible job at converting vector graphics to vector graphics: it rasterises the image, then traces it, with very poor results. It should be absolutely avoided for any PDF/EPS -> SVG conversion.
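As an illustration of the pre-conversion suggested above, here is a minimal sketch that converts each PDF lacking an SVG sibling, so that a same-named `.svg` sits next to every `.pdf`. It assumes `dvisvgm` is installed with a working PDF backend (such as Ghostscript). The helper names `pdfs_needing_svg` and `convert_pdf_to_svg` are hypothetical.

```python
import subprocess
from pathlib import Path


def pdfs_needing_svg(directory):
    """List PDF files in `directory` that have no same-named SVG next to them."""
    return sorted(
        pdf for pdf in Path(directory).glob("*.pdf")
        if not pdf.with_suffix(".svg").exists()
    )


def convert_pdf_to_svg(pdf):
    """Convert one PDF to SVG with dvisvgm and return the SVG path."""
    svg = pdf.with_suffix(".svg")
    # --pdf tells dvisvgm that the input is a PDF file rather than a DVI file
    subprocess.run(["dvisvgm", "--pdf", f"--output={svg}", str(pdf)], check=True)
    return svg
```

Running `convert_pdf_to_svg` over the result of `pdfs_needing_svg("figures")` before calling latexmlpost would cover a hypothetical `figures` directory. Whether latexmlpost then picks the SVG over the PDF depends on the behaviour described above, so it is worth checking the generated HTML.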

goska commented 1 year ago

Thanks for the advice, @xworld21. @dginev, the list of "delegates" installed on my new server is the same as yours, so the issues with conversion of pdf images are most likely due to the security policy settings. I am not sure what may be causing the problems with conversion of gif images.
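One way to narrow down the gif case is to run ImageMagick directly on each test image, outside LaTeXML, and compare the error output per file. The sketch below assumes the `convert` command is on the PATH. The function name `probe_images` and its `run` parameter are hypothetical. A caveat: LaTeXML drives ImageMagick through its Perl binding rather than the command line, so a clean result here does not rule out a problem in that binding.

```python
import subprocess
import tempfile
from pathlib import Path


def probe_images(files, run=subprocess.run):
    """Try converting each file to PNG; map file name to 'ok' or the error text.

    `run` is injectable so the logic can be exercised without ImageMagick.
    With the default runner, a missing `convert` raises FileNotFoundError.
    """
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for name in files:
            target = Path(tmp) / (Path(name).stem + ".png")
            proc = run(["convert", str(name), str(target)],
                       capture_output=True, text=True)
            results[name] = "ok" if proc.returncode == 0 else proc.stderr.strip()
    return results
```

For example, `probe_images(["test.gif", "test.pdf"])` on the new server would show whether both formats fail at the ImageMagick level or only one of them.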