Open matthewfeickert opened 2 years ago
Ah, I see I took too long to open up a report, as Issue #3 already exists. I'll let @dginev decide if the added info here is worth keeping, or if this can just get closed as a duplicate and link the relevant info to Issue #3. :+1:
Thank you @matthewfeickert, love the enthusiasm of reporting this article so quickly that we even get a duplicate report in the same day!
I'll take a look
@matthewfeickert 's report is much more detailed than mine :-)
Based off this Tweet,
The conversion is directly from the TeX/LaTeX sources of the ~ 1.8 million arXiv articles that provided them.
I think that we (the authors) might be part of the problem, as if you download the source files currently on arXiv
$ curl -sL https://arxiv.org/e-print/2109.04981 -o arXiv-2109-04981v1.tar.gz
$ mkdir -p arXiv-2109-04981v1 && tar -xzvf arXiv-2109-04981v1.tar.gz -C arXiv-2109-04981v1
$ cd arXiv-2109-04981v1
$ sed -i.bak 's/FILENAME = draft/FILENAME = ms/g' Makefile # Correct first error in uploaded source
$ make # get build errors
but if you download the source files I've attached here (submit_to_arXiv.tar.gz) (which I might delete in the future and edit this post)
$ curl -sLO https://github.com/dginev/ar5iv/files/7975741/submit_to_arXiv.tar.gz
$ tar -xzvf submit_to_arXiv.tar.gz
$ cd submit_to_arXiv
$ make
you'll get a ms.pdf.
ar5iv is written in Rust (nice), but if one builds the project is there a way to run it locally against a local tarball to check?
Or does "directly from the TeX/LaTeX sources" really mean just the source files, and the build system and additional artifacts uploaded (like ms.bbl) aren't taken into account at all?
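As a side note for anyone comparing the two archives above: a minimal sketch for checking whether a gzipped source tarball ships a prebuilt bibliography such as ms.bbl. The helper name `has_bbl` is hypothetical, and this only inspects the archive listing; it says nothing about which files the ar5iv conversion actually reads.

```shell
# Hypothetical helper (not part of ar5iv or latexml):
# succeeds if the gzipped tarball given as $1 lists at least one .bbl file.
has_bbl() {
  tar -tzf "$1" | grep -q '\.bbl$'
}
```

For example, `has_bbl submit_to_arXiv.tar.gz && echo "bbl present"` prints the message only when a .bbl file is in the archive.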
@matthewfeickert sorry, today has been a hectic day over here - I'll offer some details and hopefully onboarding tomorrow.
There is a very specific invocation to use with the latest latexml on the source ZIP file I have prepared for each article. I've already done the preprocessing once and for all - so that you don't have to haggle with the tarballs.
For testing what is usually good enough is
$ wget https://ar5iv.org/source/2109.04981.zip
$ latexmlc 2109.04981.zip --dest=test.html \
--preload=[ids,nobibtex,localrawstyles]latexml.sty \
--path=/path/to/arxmliv-bindings/bindings/ \
--nodefaultresources --css=https://cdn.jsdelivr.net/gh/dginev/ar5iv-css@0.7.2/css/ar5iv.min.css
where most of the development falls in improving the latexml processing of the document, and the extended bindings available in the arxmliv-bindings repository.
What happens when latexml is improved to process a new category of documents (usually identified by a missing package name, or missing macro name) is that I then update my build system workers and mark a rerun. It's a solo process right now.
Then, we get into a land of calm sanity - ar5iv is indeed a simple Rust web service, with just the minimal amount of touchup and care for a web deployment. But the sources are detached, and are not regenerated live -- there be dragons of all kinds.
today has been a hectic day over here - I'll offer some details and hopefully onboarding tomorrow.
Totally understandable and no rush at all! I was just brain dumping. :)
Thanks for already giving some advice here — I'll poke at this again later in the week. Congrats on an exciting day as well!
I quickly debugged here and it appears the spacing is specifically a problem in our latexml binding for tcolorbox.sty.
@matthewfeickert let me quickly discourage you from spending any time on finding fixes for this article -- you've leaned into quite a lot of packages, so unless you feel "intermediate" in working with latexml, the execution cascade will feel overwhelming. I've already made good partial progress, so I might as well recover the remainder to a "baseline" readable state on my own.
It won't have e.g. the minted highlights, and some other minor details, but that can follow on a second pass.
Edit: even more specifically, we may have a regression with expl3.sty / latex 3 interpretation, which would propagate into the article via tcolorbox.
Thanks for all the info @dginev. This is really great to see how much work you've done here and I truly appreciate the efforts and the high level of communication! :bow:
let me quickly discourage you from spending any time on finding fixes for this article -- you've leaned into quite a lot of packages, so unless you feel "intermediate" in working with latexml, the execution cascade will feel overwhelming.
I am at the level of "complete novice", so this is duly noted and appreciated! :)
One month is definitely not on the fast side, but we now have a version of 2109.04981 with all spaces back (i.e. with the latexml regression patched).
There is still some bad markup left, we need to further our support for tcolorbox.sty. So I'll leave the issue open until that gets settled as well.
Let me know of any other glitches you encounter - authors are always the best eyes to inspect fidelity.
One month is definitely not on the fast side, but we now have a version of 2109.04981 with all spaces back (i.e. with the latexml regression patched).
Thanks so much @dginev! For the scope of the project you're undertaking I think 1 month is plenty fast. :rocket:
There is still some bad markup left, we need to further our support for tcolorbox.sty. So I'll leave the issue open until that gets settled as well. Let me know of any other glitches you encounter - authors are always the best eyes to inspect fidelity.
This already looks way better, so many thanks! You've already mentioned the tcolorbox.sty issues with the glossary of terms, and I think the only thing that could be cleaned up a bit is the typesetting of the code examples in the minted environment.
I know that minted is a real PITA to try to work with, so I appreciate that might be out of scope for quite some time.
Other than that I think it looks great!
Exact location of issue
Hi. :wave: Thanks for making this project — really nice idea and work! The render for basically all of arXiv:2109.04981 (Publishing statistical models: Getting the most out of particle physics experiments) is broken. (I'm one of the authors and have the full LaTeX source so if you need me to give specifics on any part let me know — unfortunately, one of my colleagues uploaded a broken version of the source files, but I can send you it if needed (not sure if this is part of the problem).)
Problem details
The render fails throughout the document. It starts by printing some information from the source files and then fails to typeset the document properly for the remainder. See https://ar5iv.org/html/2109.04981#S1 for an example.
(Optional) Expected behavior
For the document to render fully without errors.