dginev / ar5iv

A web service offering HTML5 articles from arXiv.org as converted with latexml
https://ar5iv.org
MIT License

Improve article 2310.01693. PGFPlots produces catastrophic failure #402

Open manueldeprada opened 10 months ago

manueldeprada commented 10 months ago

This file: https://ar5iv.labs.arxiv.org/html/2310.01693 fails to render completely. I have seen some other issues with pgfplots, most of them related to an infinite loop. That does not seem to be the case here; it looks like a totally different error.

Exact location of issue: fig/test.tex

\pgfplotsset{style a/.style={error bars/.cd, y dir=both, y explicit}}
\pgfplotsset{
    /pgfplots/bar cycle list/.style={
        /pgfplots/cycle list={
            {blue,fill=blue!30!white,mark=none},
            {red,fill=red!30!white,mark=none},
            {brown!60!black,fill=brown!30!white,mark=none},
            {black,fill=black!30!white,mark=none},
            {cyan,fill=cyan!30!white,mark=none},
            {orange,fill=orange!30!white,mark=none},
        },
    },
}

\begin{tikzpicture}
    \begin{axis} [
            ybar,
            symbolic x coords={Small, Medium, Large, XL},
            xtick=data,
            legend pos=south east,
            legend cell align=left,
            enlarge x limits=0.25,
            bar width=4pt,
            xlabel=Model size,
            ylabel=MAUVE,
            height=4cm,
            width=0.6\textwidth,
            legend pos=outer north east,
        ]
        % \addplot +[style a] table [col sep=comma, x=Size, y=Mean, y error=SEM] {data/test/Bottleneck-aware.csv};
        \addplot +[style a] table [col sep=comma, x=Size, y=Mean, y error=SEM] {data/test/BA_Eta.csv};
        \addplot +[style a] table [col sep=comma, x=Size, y=Mean, y error=SEM] {data/test/Eta.csv};
        \addplot +[style a] table [col sep=comma, x=Size, y=Mean, y error=SEM] {data/test/Nucleus.csv};
        \addplot +[style a] table [col sep=comma, x=Size, y=Mean, y error=SEM] {data/test/BA_Epsilon.csv};
        \addplot +[style a] table [col sep=comma, x=Size, y=Mean, y error=SEM] {data/test/Epsilon.csv};
        \legend{BA-$\eta$, $\eta$, Nucleus, BA-$\epsilon$, $\epsilon$}
    \end{axis}
\end{tikzpicture}
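
For anyone trying to narrow this down, a self-contained variant of the figure may be handy. The sketch below assumes inline placeholder data, since the contents of the data/test/*.csv files are not part of this report. The table values and the \placeholderdata macro name are hypothetical, and it is not confirmed that this reduced file triggers the same latexml error.

```latex
\documentclass{article}
\usepackage{pgfplots}
\usepackage{pgfplotstable}
\pgfplotsset{compat=1.16}

% Hypothetical stand-in for data/test/*.csv; the real values are not in the report.
\pgfplotstableread[col sep=comma]{
Size,Mean,SEM
Small,0.80,0.02
Medium,0.85,0.02
Large,0.90,0.01
XL,0.92,0.01
}\placeholderdata

% Same error-bar style as in fig/test.tex.
\pgfplotsset{style a/.style={error bars/.cd, y dir=both, y explicit}}

\begin{document}
\begin{tikzpicture}
  \begin{axis}[
      ybar,
      symbolic x coords={Small, Medium, Large, XL},
      xtick=data,
      enlarge x limits=0.25,
      bar width=4pt,
      xlabel=Model size,
      ylabel=MAUVE,
      height=4cm,
      width=0.6\textwidth,
      legend pos=outer north east,
    ]
    % One plot is enough to exercise symbolic x coords together with y error bars.
    \addplot +[style a] table [x=Size, y=Mean, y error=SEM] {\placeholderdata};
    \legend{placeholder}
  \end{axis}
\end{tikzpicture}
\end{document}
```

If this reduced file converts cleanly, adding the remaining \addplot lines back one at a time should show which part of the original figure is involved.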

Problem details

The log outputs:

(Processing content /dev/shm/KRChnbu7gQ/fig/test.tex...
Package pgfplots info on input line 29: Using 'lua backend=false' for axis: x coord trafo unsupported.
PGFPlots: reading {"data/test/BA_Eta.csv"}
PGFPlots: reading {"data/test/Eta.csv"}
PGFPlots: reading {"data/test/Nucleus.csv"}
PGFPlots: reading {"data/test/BA_Epsilon.csv"}
PGFPlots: reading {"data/test/Epsilon.csv"}
Warning:perl:warn Argument "1Y9.374214214840683e1]" isn't numeric in numeric eq (==)
 at test.tex; line 37 col 0 - line 37 col 11
 at /usr/local/share/perl/5.34.0/LaTeXML/Package/pgfmath.code.tex.ltxml line 348, <$IN> line 37
 In Core::Definition::Expandable[\lx@pgfm... /usr/local/share/perl/5.34.0/LaTeXML/Package/pgfmath.code.tex.ltxml; line 364
Error:expected:Until: cs: Missing argument Until: cs: for Core::Definition::Expandable[\tikz@parse@coordinatesystem Until:(Until: cs:Until:)]
 at test.tex; line 39 col 0 - line 39 col 0
 Ended at test.tex; line 39 col 0 - line 39 col 0
 Next token is ...
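
A note on the Perl warning: the string 1Y9.374214214840683e1] looks like PGF's low-level floating point format, <flags>Y<mantissa>e<exponent>], where flag 1 marks a positive number, so it would stand for about 93.74. A minimal sketch for inspecting that format under a regular LaTeX run, assuming the PGF fpu library is available (the \parsedfloat, \floatflags, \floatmantissa and \floatexponent macro names are made up for illustration):

```latex
\documentclass{article}
\usepackage{pgf}
\usepgflibrary{fpu}
\begin{document}
% Parse a plain decimal into PGF's low-level float representation.
\pgfmathfloatparsenumber{93.74214214840683}
% Keep a copy, since later math calls overwrite \pgfmathresult.
\let\parsedfloat\pgfmathresult

% Split the representation into flags, mantissa and exponent.
\pgfmathfloattomacro{\parsedfloat}{\floatflags}{\floatmantissa}{\floatexponent}
Flags: \floatflags, mantissa: \floatmantissa, exponent: \floatexponent.

% Convert the float representation back to a fixed-point number.
\pgfmathfloattofixed{\parsedfloat}
Fixed point: \pgfmathresult
\end{document}
```

The warning suggests that the Perl side received this internal representation where it expected a plain number, though that reading is a guess from the log alone.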

@dginev , you seem to be tackling some pgfplots problems. Hope this helps! :)

dginev commented 10 months ago

Thanks for the report! As you say, this could be a useful test to keep in mind when working on latexml enhancements for pgfplots.

Two additions to the issue: forest.sty and linguex.sty are reported as missing packages in the log. Those are possibly separate enhancements from the pgfplots issue, but I am noting them here for completeness.

dginev commented 10 months ago

Quick update here: if we disable the customizations in pgfutil-common.tex.ltxml for \pgfutil@in@ and \ifpgfutil@in@, the cause of the Fatal error is addressed, and the article converts to a healthy Warning status.

There are some other issues to address (a curious --0.5 offender in pgfmath, and a vertical overflow in height for Figure 1), and the missing files I mentioned in the previous comment.

But it looks like we can resolve the Fatal with very little pain.

manueldeprada commented 10 months ago

Thanks for the hard work!! Is there any "nightly ar5iv" URL where we can see the result of the latest commits?

dginev commented 10 months ago

"nightly ar5iv" is not really the timescale we are working on - I am afraid we are not yet in a position to rerender articles on demand at all, let alone nightly. Also our team is small, so most nights there are no meaningful changes for the vast majority of articles.

The commits I mentioned are not yet in latexml proper; the PR there is a draft, which will likely get merged by the end of this month. We have a scheduled v0.8.8 release of latexml soon after, which we have been wrapping up since summer.

Once we have v0.8.8 I will reconvert the entire corpus and sync up with the live ar5iv site, then walk issues which have been resolved and close them. So you won't see the improvement until at least early 2024, as a full rerun on our current hardware takes about 3 weeks of runtime.

What can be done overnight - naturally - is to install a local latexml at a newly changed commit, then reconvert a specific article locally. That is one of the reasons I made ar5ivist, so that people who don't want to wait on ar5iv updates have an easy alternative course of action.

manueldeprada commented 10 months ago

Oh thanks! I will test my LaTeX locally :). What a pity that a nice project like this does not have more resources. This should be part of the future of arXiv!! Keep up the good work :)

dginev commented 10 months ago

This should be part of the future of arXiv!!

Well, that's the good news - it likely is! Keep your eyes open for an announcement soon, as ar5iv may be sunset when arXiv itself starts providing HTML in-house. Currently tracked at:

https://info.arxiv.org/about/accessible_HTML.html

On our end, we will keep pushing for 100% conversion coverage, since we are at 75% successful conversions today.

dginev commented 6 months ago

This article has healthy HTML with the latest ar5iv update to latexml v0.8.8.

One remaining issue is the sizing of the resulting SVG figure in the Introduction, which has a pathologically large height of 22,000+.

The rest of the article is still available, but one has to scroll for quite some time to reach it.