gentzkow / template_archive

Discuss image file format #75

Closed jc-cisneros closed 1 year ago

jc-cisneros commented 1 year ago

We were considering changing the default image format from ".pdf" to another file format. The main reason is that, although PDF is simple to work with, PDF files are not diffable in GitHub. @gentzkow mentioned that there are two main dimensions we care about: (i) diffability and (ii) vector rather than bitmap (which means plots always render at full resolution). We first considered shifting to .png, but that is a raster format, so it does not fulfill (ii). It seems that .svg is a good candidate that fits both criteria. @meyer-carl further added that exporting SVG files should not be a problem for any of the programs that we use, and using them in LyX/LaTeX should just require a proper setup.
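
As a rough illustration of criterion (i), here is a minimal sketch of a line-by-line diff of a text-based vector file. The two SVG strings are made-up stand-ins for two versions of a plot, and `svg_diff` is a hypothetical helper, not part of any existing lab tooling.

```python
import difflib
from typing import List

# Hypothetical "before" version of a tiny bar chart stored as SVG text.
OLD_SVG = """<svg xmlns="http://www.w3.org/2000/svg" width="200" height="100">
  <rect x="10" y="40" width="50" height="60" fill="steelblue"/>
  <rect x="70" y="20" width="50" height="80" fill="steelblue"/>
</svg>
"""

# Hypothetical "after" version: only the second bar gets taller.
NEW_SVG = OLD_SVG.replace(
    'y="20" width="50" height="80"',
    'y="10" width="50" height="90"',
)


def svg_diff(old: str, new: str) -> List[str]:
    """Return a unified diff of two SVG documents, one string per diff line."""
    return list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile="old.svg",
            tofile="new.svg",
            lineterm="",
        )
    )


if __name__ == "__main__":
    print("\n".join(svg_diff(OLD_SVG, NEW_SVG)))
```

Because the file is plain text, the diff isolates the single changed `<rect>` line. A binary format such as PDF or PNG generally cannot be compared this way.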

@jmshapir @snairdesai @rcalvo12 @ew487 we would appreciate your thoughts on this.

jmshapir commented 1 year ago

@jc-cisneros thanks!

The approach we use is sketched here.

I think we considered SVG-only at some point. I don't remember all the details but I think one reason we didn't adopt is that this didn't play nicely with LyX.

Hope that's helpful!

jc-cisneros commented 1 year ago

Thanks for sharing that approach @jmshapir! I understand that the current preferred formats (as you recommend saving each image in two formats) are EPS and PNG. Is the logic to have one file more suitable for printing (EPS) and another format more versatile/lighter for web display? I think as of now both LyX and Overleaf can handle all of the file formats that have been considered (i.e., EPS, PNG, SVG, and PDF). Given that we rarely use photographs or detailed images, EPS might be too bulky to be the default format. If we adopt a two-copy approach, I think PDF + SVG would be more suitable.

jmshapir commented 1 year ago

Thanks @jc-cisneros! Glad that was helpful.

I understand that the current preferred formats (as you recommend saving each image in two formats) are EPS and PNG. Is the logic to have one file more suitable for printing (EPS) and another format more versatile/lighter for web display?

Yep. We use EPS because it is vector and is diffable in ASCII. We use PNG because it is light for clipping etc. and can be visually diffed in github.

gentzkow commented 1 year ago

Thanks all.

@jmshapir If I'm understanding right, the only downside we've seen for SVG is that it didn't use to play well with LyX, but that may no longer be true.

If so, it seems to me that SVG would then dominate -- it's also vector, it's also visually diffable in Github, and it's lighter / more web friendly. Can you see any reason not to switch to SVG as a standard?

gentzkow commented 1 year ago

(I'm not sure I see why we'd want SVG + PDF; @jc-cisneros let me know if I've missed something there.)

gentzkow commented 1 year ago

(@jc-cisneros If you haven't done so already, I'd suggest you do some extra testing to make sure we're right that SVG files work well in Lyx.)

jc-cisneros commented 1 year ago

@gentzkow @jmshapir let me share the results of testing svg files on both Overleaf and Lyx:

Overleaf

It worked as expected and only required adding \usepackage{svg} in the preamble and loading the image using \includesvg{chips_sold.svg}.

LyX

According to the LyX wiki post on the topic, it should work without any additional installation:

Note that as of version 2.0.0 LyX is capable of displaying SVGs without configuring Inkscape as converter (i.e. without doing the procedure above) but in some cases (e.g. SVG containing text, LyX running on Windows), the rendering will be buggy

Without changing anything in the .lyx code except for the new .svg filename, the image renders correctly in the preview:

[Screenshot: the SVG image as rendered in the LyX preview]

Nevertheless, the image is a bit too small when it gets rendered with pdflatex (but as the post mentioned, it does get rendered):

[Screenshot: the SVG image rendered too small in the pdflatex output]

Installing Inkscape and setting it up on LyX generates the same result. The solution is to explicitly specify the size of the image, which can be done by clicking on the properly rendered preview.

jc-cisneros commented 1 year ago

@gentzkow @jmshapir some further clarification on what is going on behind the scenes in the process described in https://github.com/gentzkow/template/issues/75#issuecomment-1412555294:

Note that our final output is usually a paper or a slide deck in .pdf format. This means that, regardless of our format of choice, the images always get "translated" to PDF (that is precisely what the package mentioned in the previous post is doing). The LyX interface displays PNG files, and that is why the SVG image was rendered correctly in the preview. Displaying it in a PDF file implies using pdflatex, which apparently requires explicit dimensions to be provided.

These are my thoughts on all the options considered:

gentzkow commented 1 year ago

Thanks @jc-cisneros. Can you post an example I can walk through to see the exact steps required to add the Inkscape package and set the sizes?

Is the issue about rendering being buggy in Windows mentioned here just about the rendering inside Lyx? Or do we think it will be buggy when converting to PDF?

If this whole procedure seems too complicated or unreliable, it sounds like we may want to go back to @jmshapir's (PDF + PNG) solution.

jc-cisneros commented 1 year ago

@gentzkow I agree that @jmshapir's solution should be the baseline we are comparing any proposal to. Its major downside is that we are essentially duplicating the images to get the desired properties. Although this might be unimportant for the total file size of a repository, having a large number of image files might make the repository hard to navigate, and we have encountered situations in which the number of files is a relevant consideration. I think an alternative solution is preferable if it achieves the virtues of the baseline solution using only one file type:

1) Images are both diffable and vector based.
2) The process implied by a new solution is not significantly more complex than the baseline solution.
3) The new solution is reliable in the sense that the implied process works across OS or document processor of choice (i.e., LyX/Overleaf) and produces output of consistent quality.
4) (Long-term consideration) The file type is common and widely used. If a file type achieves all of the above and works for GSLab projects in the present, but there is a dominating general trend towards using other image formats, then points (2) and (3) might eventually fail.

If we evaluate the SVG proposal on these dimensions, my thoughts are the following:

1) This holds for SVG.
2) As mentioned in https://github.com/gentzkow/template/issues/75#issuecomment-1412555294, the process is simple in Overleaf, but might not be as simple in LyX. @gentzkow I will walk you through the more robust Inkscape process in a follow-up post.
3) @gentzkow your question on the "buggy" rendering is relevant here. My understanding is that if you use LyX out of the box (i.e., without installing Inkscape), then on some occasions the final rendering might be of lower quality (e.g., the position of the letters might be slightly off compared to the original SVG).
4) On my end, I have not seen ".svg" being widely used in research projects. If the number of researchers using git increases, we could expect a higher demand for diffable and vector-based image formats.

@snairdesai and I also discussed a possible alternative solution. During the development process, we use a diffable file type consistently. Whenever we need a release (submission/slides/website), we run a script that saves all the image files as pdf (or any other vector-based format).
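
To make this alternative concrete, here is a minimal sketch of what such a release script could look like, assuming the diffable development format is SVG and that Inkscape 1.x is installed and on the PATH. The function names, directory layout, and the choice of Inkscape as converter are assumptions for illustration, not something we have set up.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_convert_command(svg_path: Path, out_dir: Path) -> List[str]:
    """Build an Inkscape 1.x command that exports one SVG file as a PDF."""
    pdf_path = out_dir / (svg_path.stem + ".pdf")
    return [
        "inkscape",
        str(svg_path),
        "--export-type=pdf",
        f"--export-filename={pdf_path}",
    ]


def export_release_figures(
    fig_dir: Path, out_dir: Path, dry_run: bool = True
) -> List[List[str]]:
    """Convert every .svg in fig_dir to a .pdf in out_dir.

    With dry_run=True (the default) the commands are only built and
    returned, so the plan can be inspected without Inkscape installed.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    commands = [
        build_convert_command(svg, out_dir)
        for svg in sorted(fig_dir.glob("*.svg"))
    ]
    if not dry_run:
        if shutil.which("inkscape") is None:
            raise RuntimeError("Inkscape was not found on the PATH")
        for cmd in commands:
            # check=True stops the release if any conversion fails.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical directories; adjust to the repository layout.
    for cmd in export_release_figures(Path("output/figures"), Path("release/figures")):
        print(" ".join(cmd))
```

The dry run only prints the planned conversions. Calling `export_release_figures(..., dry_run=False)` would actually invoke Inkscape once per figure.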

gentzkow commented 1 year ago

Thanks @jc-cisneros. At this point, I'd vote that we stick to PNG+PDF per @jmshapir's original suggestion. The cost of extra storage is low, and the combination of extra frictions in Lyx/Latex and the possibility of bugs cropping up in the future seems to me like something we want to avoid. I also very much agree about (4) as a consideration. I'd vote we lock in this decision and update the lab manual accordingly.
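
If we go with PNG+PDF, one practical risk is a figure saved in only one of the two formats. Below is a minimal sketch of a check for that, assuming both copies of a figure share a file stem and live in the same directory. The function name and the directory path are hypothetical.

```python
from pathlib import Path
from typing import Dict, List, Set

# The two copies every figure is expected to have under the PNG+PDF convention.
REQUIRED_EXTENSIONS = (".png", ".pdf")


def find_unpaired_figures(fig_dir: Path) -> Dict[str, List[str]]:
    """Map each figure stem to the required extensions it is missing.

    Figures that have both a .png and a .pdf copy are left out of the result.
    """
    found: Dict[str, Set[str]] = {}
    for path in fig_dir.iterdir():
        ext = path.suffix.lower()
        if path.is_file() and ext in REQUIRED_EXTENSIONS:
            found.setdefault(path.stem, set()).add(ext)
    return {
        stem: [ext for ext in REQUIRED_EXTENSIONS if ext not in exts]
        for stem, exts in sorted(found.items())
        if len(exts) < len(REQUIRED_EXTENSIONS)
    }


if __name__ == "__main__":
    # Hypothetical figure directory; adjust to the repository layout.
    for stem, missing in find_unpaired_figures(Path("output/figures")).items():
        print(f"{stem}: missing {', '.join(missing)}")
```

A check like this could run as part of a build so that the PNG used for visual diffs and the PDF used in the paper do not drift apart.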

jc-cisneros commented 1 year ago

Summary

In this issue we evaluated several image formats that could serve as the default file type in research projects. The decisions were the following:

cc. @gentzkow