Add recommended software LaTeX markup functionality to AASTeX

augustfly commented 8 years ago

This is a discussion issue.

We are looking to improve (and formalize) software markup in AASTeX articles. There are at least two packages currently in use by AAS Journals authors to mark up software inline in their LaTeX/AASTeX manuscripts:

minted : https://github.com/gpoore/minted
listings : https://en.wikibooks.org/wiki/LaTeX/Source_Code_Listings
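For reference, here is a minimal sketch of how each package is typically invoked in a manuscript. The document class (aastex631), the language, and the two code lines are illustrative assumptions, and the combination has not been tested against AAS production. The main practical difference is that minted hands the highlighting to the external Pygments tool. It therefore needs Pygments installed and, in the widely used 2.x releases, compilation with -shell-escape. listings works entirely inside TeX.

```latex
\documentclass{aastex631}
\usepackage{listings} % highlighting done entirely by TeX
\usepackage{minted}   % highlighting done by Pygments (external program)

\begin{document}

% listings: the language is an optional key
\begin{lstlisting}[language=Python]
import numpy as np
x = np.linspace(0.0, 1.0, 11)
\end{lstlisting}

% minted: the language is a mandatory argument
\begin{minted}{python}
import numpy as np
x = np.linspace(0.0, 1.0, 11)
\end{minted}

\end{document}
```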

An example of minted + AASTeX can be found in the published version of Bovy's "galpy" article and in the ms.in file in the galpy arXiv source:

arXiv source: http://arxiv.org/format/1412.3451v1
ADS page: https://ui.adsabs.harvard.edu/#abs/2015ApJS..216...29B/abstract
Final production article: http://dx.doi.org/10.1088/0067-0049/216/2/29

An example of listings + AASTeX can be found in the published version of VanderPlas & Ivezic's "Periodograms for Multiband Astronomical Time Series" and its arXiv source:

arXiv source: http://arxiv.org/format/1502.01344v1
ADS page: https://ui.adsabs.harvard.edu/#abs/2015ApJ...812...18V/abstract
Final production article: http://dx.doi.org/10.1088/0004-637X/812/1/18

Comments and suggestions are welcome.

philrosenfield commented 8 years ago

+1 for listings and -1 for minted.

That said, this is based on an uninformed 5-minute look (4m + 1m to read the easter egg in the VanderPlas+ source). With listings, text is apparently rendered as text. Minted seems to be rendered as an image (perhaps there is an option to render as text that I didn't see).

In other words, minted seems to be author oriented (simple typesetting instead of, e.g., a screen capture), whereas listings seems to be reader oriented, with the idea that readers may want to use the code: copy a snippet or, with some work, transfer it to a Jupyter notebook and try it out. I didn't say author oriented because the style setting does not look like something I'd want to add to manuscript prep. If AAS Journals sets the style template for each language, then it's reader and author oriented in my mind.

michaelaye commented 8 years ago

Maybe the Pygments settings could be piped/parsed into an lstset config block, so that the Pygments prettiness automatically creates text blocks and not images? That would possibly be a third approach on top of minted and listings, I guess.

augustfly commented 8 years ago

Note that JATS 1.1d1 and later have code markup with reasonably rich attributes attached:

http://jats.nlm.nih.gov/archiving/tag-library/1.1d1/n-ty80.html %code

http://jats.nlm.nih.gov/archiving/tag-library/1.1d1/n-3j42.html %code-atts


michaelaye commented 8 years ago

whoosh (the sound it makes when things pass over my head at very high velocity) ;) I might be a tech-oriented scientist, but not that much. ;)

augustfly commented 8 years ago

@michaelaye when you publish an article, the manuscript gets converted from LaTeX to XML for preservation and for conversion to PDF/HTML. JATS is one of the XML document (DTD) models we could use (we don't use it yet; we use NLM 2.3, I think).

So it would be most beneficial for actual code markup inserted by authors in their manuscripts to be preserved in the XML markup as actual richly attributed code instead of just being formatted to look pretty. We could use JATS version 1.1 draft 1 or later to do that.

you asked! :)

augustfly commented 8 years ago

To be clear, there are times and places for Journals to assert style, but doing so on a per-language basis is not something I think we should be doing.

At a higher level there are a bunch of questions we could ask:

Should inline code be colorized for readability? Colorizing code constitutes a major style change in how a journal treats "text": the policy has been to avoid any color-based stylizing of text, equations, etc., lest authors do so without any instruction or bound. FWIW, the minted route works around this by rendering colorized text as an image.
Should inline code be treated as a float like a figure or a table? Floats currently have captions and alternate versions (high resolution, extendable online only tables, etc). Are there benefits to inline code having any of these features? For instance we could use a minted approach to render pretty colorized code as images, but require authors to submit the actual code snippet as downloadable text.

michaelaye commented 8 years ago

I'm not sure about your definition of inline code. I always thought inline code meant only something like this, as opposed to code in an environment:

like = this

I don't think that inline code (in the above definition) needs coloring so much, but for code listings shown in an environment, I would just go with the standard in the field, and that's definitely code coloring. I find the fast recognition of what's going on that colors provide indispensable. (I despise it every time a vim editor on a remote machine hasn't been set up to show me colored code.) And using a package like minted or listings would spare the journal from making the decision per language, wouldn't it?

I would strongly advise against presenting code only as floats. Code snippets in particular are often discussed very close to the text itself, and I feel the flow would be too disrupted if a snippet were placed somewhere else, which is unavoidable if it became a LaTeX float. A full listing that is shown for completeness and not discussed in detail, though, might as well be a float, similar to a group of figures given as an overview, for example.

augustfly commented 8 years ago

Erin Ryan (@erinleeryan) suggested that:

snippets be linked to code repos instead of hosting code in the journal
code repo links come with some basic documentation to get users started.

astrofrog commented 8 years ago

I would suggest including the code as floats (not inline). Linking to an external repo for every single snippet seems overkill though.

augustfly commented 7 years ago

An update on a live example. Singer et al. (2016) used listings:

https://arxiv.org/abs/1605.04242 https://doi.org/10.3847/0067-0049/226/1/10

and it made it through our typesetting (sort of):

the arXiv LaTeX PDF contains little callouts that can be cut and pasted into ipython without a problem.
unlike the VanderPlas & Ivezic paper, they came through as little inline images (not figures!) instead of as tables. Thus they are not floats and do not count as floats in the article;
after going through our typesetting system it produced the strange outcome that the typeset PDF includes selectable text that can be cut and pasted into ipython, but we've replaced some characters with strange non-ASCII replacements that mess up iPython.
worse, the HTML copy inserts this material as little inline images that cannot be selected at all. At least when code is inserted in tables they have selectable text in all renditions.

When I get to the bottom of all this I'll keep adding notes here.

michaelaye commented 7 years ago

The doig link fails to open for me?

gregschwarz commented 7 years ago

Can you provide a specific article where the DOI does not resolve?

michaelaye commented 7 years ago

I'm referring to the document linked above by @augustfly.

augustfly commented 7 years ago

fixed @michaelaye -- the DOI is fine. the resolver should have been doi.org not doig.org! :-)

michaelaye commented 7 years ago

ah! i thought it must be a new supplementary server! ;)

michaelaye commented 7 years ago

*yuk* the look of those >>> prompts in the journal pdf, bleargh...

augustfly commented 7 years ago

Another listings example, this time with "table"-like typesetting instead of the inline "images" in the last one.

https://arxiv.org/abs/1705.06184 https://doi.org/10.3847/1538-3881/aa73d7

The arXiv PDF has colorized "tabular" floats. The final article PDF looks fine but lacks colorization (no text color is a current AAS style enforcement; imagine the abuse of this in any other part of the text).
The tables do not count as index (numbered) tables. But they do appear in the Table list in the HTML.
The code snippets are kind of extremely ugly in the HTML. But in the HTML article the ASCII versions of the tables are mostly useful. They are not properly indented though, so this is a fail.

So are tables the wrong way to do this? We seem capable of formatting code snippets any which way in the PDF, but the HTML is really not fine. This has to be fixed.

augustfly commented 6 years ago

@astrofrog re your comment from 2 years ago. ;-) When you said "floats and not inline", did you mean as images with colorized text or as tables with plain text? Or were you thinking of how a "float" can be set off from the text like either a figure or a table?

michaelaye commented 6 years ago

I don't understand how text color could be abused if authors have no means of adding color in the first place. The author would just provide the source code, and the system would apply the previously agreed source code coloring mechanism; nothing else needs to be added for this. Not coloring source code for that reason seems archaic, and kinda like an excuse. I like that the source code comes exactly where you need it in the text (so, not being a float, I guess?). But if it's a science paper, a float as a reference to an example would be fine as well, as one rarely discusses code specifics in that much detail. If, on the other hand, it's a technical paper on the exact methodologies used to achieve certain things and how the results differ using, say, a different sequence of operations, then it becomes important to have the source code directly connected to the text where it is being discussed.

augustfly commented 6 years ago

@michaelaye you misunderstand. We could achieve colorized text in two ways: allow author text colors to pass through as is (which I think you understand as being a super bad idea), or have a system like the one you describe. For the latter we would need to do some development on our Journals "system". This thread is a (very slow) road toward that: collecting author input toward that aim while coupling it to an understanding of the actual tools authors use to mark up their LaTeX.

augustfly commented 6 years ago

So to summarize we are at:

[X] preferred latex package -- listings
[ ] final code neutral listing colorization scheme (1)
[ ] code numberings and intertext links (like equations)?
[ ] descriptive text for accessibility (2)
[ ] any other?

(1): Singular scheme, more than likely. Think about the complexities of standardization. You may want language-specific color schemes, but what happens when you add a language? It becomes very difficult to maintain such a system.
(2): Whether the code is an image or a table of text, it requires descriptive text for readers with low vision or who are blind.
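On the numbering and intertext-links item: listings can already number captioned listings and let the text refer to them, much as equations are referenced. Here is a sketch that assumes a hypothetical label lst:fit, a placeholder caption, and a made-up function call. Whether AAS production would preserve this numbering is exactly the open question in the list.

```latex
% In the preamble:
\usepackage{listings}
\lstset{captionpos=b}  % put the "Listing N:" caption below the code

% In the document body:
\begin{lstlisting}[language=Python,
                   caption={Fitting the model to the light curve.},
                   label={lst:fit}]
params = fit(model, times, fluxes)
\end{lstlisting}

\lstlistingname~\ref{lst:fit} shows the call that performs the fit.
```

Captioned listings get their own counter. \lstlistoflistings prints a list of them, analogous to a list of figures.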

michaelaye commented 6 years ago

Regarding (1): It might be difficult to find or define a color scheme that fits all languages. OTOH, I'm not sure it's that overwhelming to maintain style files per language. It would basically be a folder full of these: https://github.com/stuhlmueller/scheme-listings/blob/master/lstlang0.sty (as a Scheme example). Alternatively, you could constrain yourself to Python, Fortran and C++ and use defaults for others? In terms of consistency, the best way (although I don't know if that's feasible) might be to hook into what GitHub is using. Most people are now driven by their research sponsors to use GitHub anyway, so whatever source code color scheme is used there is VERY familiar to everyone, and it would be best to use the same, right?

Regarding (2): I thought listings is ALWAYS text, not an image? But yes, descriptive text is useful. What is used for that for figures? Simply the caption text?
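To make the "singular scheme" option concrete: for every language it supports, listings sorts tokens into keywords, comments, and strings, so a single style can color all of them consistently. Here is a sketch that assumes a hypothetical style name, aasneutral, and arbitrary example colors. It is not an AAS-approved palette.

```latex
% In the preamble:
\usepackage{listings}
\usepackage{xcolor}

% Arbitrary example colors, chosen only for illustration.
\definecolor{codekeyword}{RGB}{0,0,160}
\definecolor{codecomment}{RGB}{96,96,96}
\definecolor{codestring}{RGB}{160,0,0}

% One language-neutral style (hypothetical name).
\lstdefinestyle{aasneutral}{
  basicstyle=\ttfamily\small,
  keywordstyle=\color{codekeyword},
  commentstyle=\color{codecomment}\itshape,
  stringstyle=\color{codestring},
  showstringspaces=false
}

% In the document body: the same style applied to two languages.
\begin{lstlisting}[style=aasneutral, language=Python]
# print a greeting
print("hello")
\end{lstlisting}

\begin{lstlisting}[style=aasneutral, language=C++]
// print a greeting
std::cout << "hello" << std::endl;
\end{lstlisting}
```

With this design, adding a language only requires that listings, or a contributed definition like the Scheme file mentioned above, knows that language's keywords. The colors stay the same.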

michaelaye commented 6 years ago

So, GH uses Rouge as the Jekyll syntax highlighter, which is compatible with Pygments, which, I had the feeling, has kinda developed into the standard for how source code is colored.

bmorris3 commented 6 years ago

I'm adding my voice here to beg for copy/paste-able code examples within PDFs.

augustfly commented 6 years ago

@bmorris3 valuable point. the text in the VIP paper's PDF is selectable like the HTML, but the line numbers seem problematic to me.

When we standardize around listings, we could enforce a "no line number" default to help the text be more cut-and-paste-able.
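Here is a sketch of what a copy-and-paste-friendly default might look like. The keys are standard listings options, but this particular combination is an assumption, not an agreed AAS setting. Whether the keys survive AAS/IOP production unchanged is untested.

```latex
\usepackage{textcomp}  % provides the straight quote glyphs used by upquote
\usepackage{listings}
\lstset{
  basicstyle=\ttfamily,
  numbers=none,           % no line numbers that get copied along with the code
  columns=fullflexible,   % natural character widths instead of a fixed grid
  keepspaces=true,        % keep runs of spaces (indentation) with fullflexible
  upquote=true,           % straight ' and ` instead of typographic quotes
  showstringspaces=false  % no visible-space symbol inside strings
}
```

columns=fullflexible tends to give cleaner copied text because characters are not set in individually padded boxes. keepspaces=true is needed alongside it, since otherwise runs of spaces are collapsed and Python indentation is lost. upquote=true keeps curly quotes out of the author's PDF. It cannot, by itself, prevent character substitutions made later in production, like the ones that broke pasting into IPython in the Singer et al. example.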

augustfly commented 6 years ago

Adding the outcome of a recently published paper (by @bmorris3) -- http://iopscience.iop.org/article/10.3847/1538-3881/aaa47e/meta

The manuscript was marked up with listings and during production we discussed two possible options for the code markup in this article: inline, selectable but black and white text or colorized figures for the code. We ended up retypesetting it as the latter.

Those are the two extant options for authors, but we are in agreement that we should develop a project to enable colorized, selectable code. This will need to be a development project with IOP, as we will need to iron out how the typesetting/production team recognizes and standardizes "code" in articles.

augustfly commented 4 years ago

Whatever the chosen outcome for authors, we also need to settle the issue with the publisher. Examples related to recent typesetting / IOP production snafus:

How to treat scripts as literal text and prevent typesetting changes such as the automatic (destructive) rewriting of spacing/indents and the incorrect character encoding of quotes;
How to mark up software in the XML version of the article.

cc @chrislintott

mpound commented 2 years ago

One issue to bear in mind is that the twocolumn style imposes strong restrictions on how the author can show a code example and makes it less readable. Appendices are nice because they are in one-column format. In the manuscript I am working on, I have some listings that are 30-50 lines long (including comments and blank lines). Having to wrap those in two-column format looks terrible and makes them even longer. The journal might want a rule of thumb that code examples longer than ~10 lines go to the appendix.
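Following that rule of thumb, here is a sketch that keeps only a pointer in the two-column text and places the full listing in the one-column appendix. The file path code/full_example.py and the label app:code are hypothetical. Reading the listing from a file with \lstinputlisting has a side benefit: the very same file could be archived alongside the article.

```latex
% Two-column main text: point to the listing rather than printing it here.
The complete script is listed in Appendix~\ref{app:code}.

\appendix
\section{Complete code example}\label{app:code}

% Read the listing from a (hypothetical) file; breaklines wraps any
% line that is too long instead of letting it run into the margin.
\lstinputlisting[language=Python, breaklines=true]{code/full_example.py}
```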

mpound commented 2 years ago

Also, if the author provides a DOI for all listings, then making code selectable in the journal becomes less important, i.e., a nice feature but not imperative. To reduce the burden on authors, this could be one DOI that points to a tarball of all listings or to a single file containing them.

zingale commented 2 years ago

I think it would be useful for software papers to have a way to show code and output side by side in a figure. For instance, I'd like to do something like this:

[Image: a figure pairing a code listing with the plot it produces, side by side]

which I am doing in AASTeX via:

\begin{figure}[t]
\begin{minipage}[b]{0.5\linewidth}
\begin{lstlisting}
comp = pyna.Composition(pynet.get_nuclei())
comp.set_solar_like()
density = 150
temperature = 2.e7
state = (density, temperature, comp)
srates = pynet.find_unimportant_rates([state],
                                      1.e-20)
pynet.remove_rates(srates)
\end{lstlisting}
\end{minipage}
\includegraphics[width=0.48\linewidth]{cno_filtered}
\caption{\label{fig:cno_filtered} Filtering the \ratecollection\ from Figure~\ref{fig:cno_first} by removing unimportant rates.}
\end{figure}

A recommended / official way of accomplishing this through an AASTeX environment would be very useful for software papers.
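Here is one possible shape for such a feature, written as an ordinary LaTeX macro. \codeandoutput is a hypothetical name, and nothing like it exists in AASTeX today. The sketch assumes listings and graphicx are loaded, and it reproduces the 0.5/0.48 width split from the example above.

```latex
% Hypothetical macro, not part of AASTeX.
% #1 = source file, #2 = image file, #3 = caption text, #4 = label
\newcommand{\codeandoutput}[4]{%
  \begin{figure}[t]
    \begin{minipage}[b]{0.5\linewidth}
      \lstinputlisting[language=Python]{#1}%
    \end{minipage}\hfill
    \begin{minipage}[b]{0.48\linewidth}
      \includegraphics[width=\linewidth]{#2}%
    \end{minipage}
    \caption{#3}\label{#4}
  \end{figure}%
}

% Usage (file names are placeholders):
\codeandoutput{code/cno_filter.py}{cno_filtered}
  {Filtering the rate collection by removing unimportant rates.}
  {fig:cno_filtered}
```

The code is read from a file because the body of an lstlisting environment is read verbatim and cannot be passed as a macro argument. An environment-based version with the code typed inline would have to be built differently.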

michaelaye commented 2 years ago

I think LaTeX in general is being overtaken by Jupyterbook and Quarto in this area, where one can easily hide or show the code belonging to figures. Plus, one can even export the respective Markdown (or "super-Markdown" MyST) to LaTeX, if required.

zingale commented 2 years ago

yes, I use Jupyterbook a lot. But that is a different purpose than this. If I am writing a scientific paper describing a library, I want to describe the API and for that to be in print in ApJ, so it is nice to show the code side by side with the output so we can discuss design decisions in the paper.

augustfly commented 1 year ago

Jumping back in here bc, well, @zingale knows. I've had another conversation with the showyourwork folks, who asked about adding "software behind the figure" in parallel to our "data behind the figure" model. Is your idea, @zingale, that we formalize these minipages as a feature of AASTeX? There are broader questions about colorization (see the email I'm about to send to Smith Clark about the pynucastro code in the accepted article). What I like about your example is that the code snippet is carried along with the journal, while the "deep link" model of showyourwork keeps getting lost [I can define the different ways it can get lost if you want].

zingale commented 1 year ago

I would like to have the minipages formalized as part of AASTeX. Because I think that reading the code directly in the article is sometimes important, when discussing design decisions. If the code is hidden behind a link, then things get lost (I was also curious about how our manuscript will be changed in post-production ;)

augustfly commented 1 year ago

Tracking details re listings

Given that we haven't settled on a solution for colorized code, I want to put down a heads-up for users of listings:

While appendices can be set in single column (see @mpound's comments above), the full text is set in two columns in the PDF. If you want colorized code, then you will need to compile listings documents in two-column mode to get correctly sized color boxes:

\documentclass[twocolumn]{aastex631}

By default the appendices are left as one-column. The next important step is to check that listings code blocks don't break across columns or pages. Please don't do this:

\begin{lstlisting}[float=t]

because your code will float away from the text that mentions it. The best solution I found is to wrap the code block in a minipage, which keeps it from floating or breaking:

\begin{minipage}{\linewidth}
\begin{lstlisting}
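Putting these pieces together, here is a minimal sketch of a two-column document with a colored, non-floating code block. The background tint and the two lines of Python are arbitrary choices for illustration, and the sketch has not been run through AAS production. \noindent keeps the paragraph indent from pushing the full-width minipage past the column edge.

```latex
\documentclass[twocolumn]{aastex631}
\usepackage{listings}
\usepackage{xcolor}
\lstset{basicstyle=\ttfamily\small,
        backgroundcolor=\color{black!5}} % light gray box behind the code

\begin{document}

Text that introduces the code block goes here.

% The minipage keeps the listing in place: it neither floats
% nor breaks across columns or pages.
\noindent
\begin{minipage}{\linewidth}
\begin{lstlisting}[language=Python]
import numpy as np
x = np.linspace(0.0, 1.0, 11)
\end{lstlisting}
\end{minipage}

\end{document}
```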

augustfly commented 1 year ago

@zingale yeah, I've slowed the article's production down a little bit trying to figure out the underlying problem with the code boxes. I'll lay out the options in the email I send. Thanks for replying quickly that the minipage side-by-side solution is cool!

zingale commented 1 year ago

thanks. We did take care that our code snippets in lstlisting were formatted to work in twocolumn mode.

AASJournals / AASTeX60

Add recommended software LaTeX markup functionality to AASTeX #19