dginev / ar5iv

A web service offering HTML5 articles from arXiv.org as converted with latexml
https://ar5iv.org
MIT License

Improve article 1703.04200 #376

yberreby closed this issue 1 year ago

yberreby commented 1 year ago

Exact location of issue

The HTML5 version of the document is empty except for the text "See pages 1-last of merged.pdf". The original is https://arxiv.org/abs/1703.04200.

Problem details

The entire article is missing.

Desktop: irrelevant.

dginev commented 1 year ago

Thank you for the report!

Unfortunately, this article does not appear to contain a genuine TeX source. Instead, the version we have in ar5iv consists only of this wrapper:

```latex
\documentclass[a4paper]{article}
\pdfoutput=1
\usepackage{hyperref}
\hypersetup{
  pdfinfo={
    Title={Continual Learning Through Synaptic Intelligence},
    Author={Friedemann Zenke, Ben Poole, Surya Ganguli},
    Subject={continual learning},
    Keywords={continual learning, catastrophic forgetting, consolidation, online learning},
  }
}
\usepackage{pdfpages}

\begin{document}
\includepdf[fitpaper,pages=1-last]{merged.pdf}
\end{document}
```

Since latexml converts TeX source to HTML, it has no way to handle a merged.pdf asset: the article's actual content lives inside that PDF, not in any TeX. We would need the original TeX sources to improve the conversion.

In fact, it is tempting to mark articles whose only content is a single \includepdf directive as invalid in our build system. I will leave the issue open while we consider that.
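A filter like the one proposed above could be sketched as a simple source check. The following is a minimal, hypothetical heuristic, not part of ar5iv or latexml: the function name `is_pdf_wrapper` and its regular expressions are assumptions for illustration. It strips TeX comments, extracts the `document` environment body, and reports whether that body contains nothing except `\includepdf` commands (one or more, which generalizes the single-directive case).

```python
import re

# Hypothetical pattern: \includepdf with an optional [...] argument and a {...} file argument.
INCLUDEPDF = re.compile(r"\\includepdf\s*(\[[^\]]*\])?\s*\{[^}]*\}")
# A '%' not preceded by a backslash starts a comment that runs to end of line.
COMMENT = re.compile(r"(?<!\\)%.*")
# Non-greedy capture of everything between \begin{document} and \end{document}.
DOCUMENT_BODY = re.compile(r"\\begin\{document\}(.*?)\\end\{document\}", re.DOTALL)


def is_pdf_wrapper(tex_source: str) -> bool:
    """Return True if the document body consists only of \\includepdf commands."""
    # Drop comments so commented-out text does not count as content.
    text = COMMENT.sub("", tex_source)
    match = DOCUMENT_BODY.search(text)
    if match is None:
        return False  # no document body found; leave the decision to other checks
    body = match.group(1)
    if not INCLUDEPDF.search(body):
        return False  # an ordinary article with no embedded PDF
    # Remove every \includepdf command; any non-whitespace left over is real content.
    remainder = INCLUDEPDF.sub("", body)
    return remainder.strip() == ""
```

Applied to the wrapper source quoted above, this returns `True`, while an article that mixes real text with an embedded PDF page returns `False`. A regex check like this is deliberately conservative; a production filter would likely also need to follow `\input`/`\include` files and multi-file source bundles.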

yberreby commented 1 year ago

I see, makes sense. Thanks for the feedback.

dginev commented 1 year ago

It looks like we'll keep rendering such articles as we do now for a while longer, until we agree on a filtering strategy.

Perhaps these can be filtered out of the arXiv source bundles, or handled some other way.