dginev / ar5iv

A web service offering HTML5 articles from arXiv.org as converted with latexml
https://ar5iv.org
MIT License

Improve article 2308.01939 #383

Open dginev opened 1 year ago

dginev commented 1 year ago

Exact location of issue

Overall Size

Problem details

The article occupies 375 MB on disk. Loading it takes 239 separate requests to ar5iv, totaling 395 MB of downloaded assets.

This mostly comes from a large appendix of images.

We should check whether latexml's image handling can be improved (including the magnification options exposed through latexml.sty), so as to minimize the memory needed for each bitmap.
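To confirm where the bytes go before changing any options, a per-extension tally of the unpacked article can help. This is only a minimal sketch: the directory path is a placeholder, and it assumes the converted article has been unpacked into a plain directory tree. The function names are illustrative and not part of ar5iv or latexml.

```python
from collections import Counter
from pathlib import Path


def size_by_extension(article_dir):
    """Return total bytes per lowercase file extension under article_dir."""
    totals = Counter()
    for path in Path(article_dir).rglob("*"):
        if path.is_file():
            # Files without an extension are grouped under "(none)".
            totals[path.suffix.lower() or "(none)"] += path.stat().st_size
    return totals


def report(article_dir):
    """Print extensions from largest to smallest total size, in MB."""
    for ext, size in size_by_extension(article_dir).most_common():
        print(f"{size / 1e6:10.1f} MB  {ext}")
```

Calling `report("path/to/unpacked/2308.01939")` (hypothetical path) would list which asset types dominate the total.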

dginev commented 1 year ago

Also attaching a size-sorted table of the largest article ZIPs in ar5iv from the 2308 update. Some of them could serve as useful related test cases.

size ar5iv id
375M 2308.01939
220M 2308.06147
209M 2308.12968
167M 2308.04610
161M 2308.07157
151M 2308.00500
151M 2308.09711
132M 2308.11929
128M 2308.10554
118M 2308.00628
118M 2308.11917
112M 2308.01300
117M 2308.14761
107M 2308.00906
110M 2308.07314
110M 2308.01648
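
A listing like the one above could be regenerated after each update with a short script. The sketch below assumes the ZIPs sit in one flat directory and are named `<ar5iv id>.zip`. That layout is an assumption made for illustration, not a description of how ar5iv actually stores its data.

```python
from pathlib import Path


def largest_zips(zip_dir, limit=16):
    """Return (size_in_bytes, article_id) pairs for the largest ZIPs, biggest first."""
    sizes = [(p.stat().st_size, p.stem) for p in Path(zip_dir).glob("*.zip")]
    return sorted(sizes, reverse=True)[:limit]


def format_row(size_bytes, article_id):
    """Format one row as whole mebibytes followed by the article id."""
    return f"{round(size_bytes / 2**20)}M {article_id}"
```

Printing `format_row(*row)` for each entry of `largest_zips(...)` would give rows in the same `size id` shape as the table.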

dginev commented 7 months ago

The reported article is now down from 375 MB to 100 MB in size, mostly due to using the magnify=1.8 and zoomout=1.8 options, both reduced from 2.

However, this change has a rather broad and unpredictable impact: it has also led to a grainy loss of quality in other articles. Also, some images appear to be missing near the end of 2308.01939, in its appendix table.
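
As a rough way to reason about the quality trade-off, the sketch below estimates how the pixel count of a rasterized image changes with the magnification factor. It assumes the bitmap's width and height both scale linearly with `magnify`, which is an assumption about the conversion and not something verified here. It also says nothing about byte sizes, which depend on image format and compression as well.

```python
def pixel_ratio(old_magnify, new_magnify):
    """Ratio of pixel counts, assuming width and height both scale linearly with magnify."""
    return (new_magnify / old_magnify) ** 2


ratio = pixel_ratio(2.0, 1.8)
print(f"pixels kept: {ratio:.0%}")  # prints "pixels kept: 81%"
```

Under that assumption, every affected image loses about a fifth of its pixels, which is consistent with the change being visible across many articles and not only the largest ones.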