IACR / latex

LaTeX classes for IACR publications. We will start with the new journal proposal.

Documentation for textabstract #232

Closed: kmccurley closed this issue 5 months ago

kmccurley commented 5 months ago

Dan Boneh @dabo has mentioned that the textabstract construction is clumsy, and he's right. There are several problems:

  1. Since it does not appear in the PDF, authors may forget to update it. The display on publish.iacr.org should mitigate this, since the abstract is shown front and center when an author uploads their paper.
  2. It is not clear what LaTeX is allowed inside. For example, the documentation says not to use macros, but some things like inline mathematics and \"u are allowed.
  3. It would be impossible to enforce any rules in iacrcc.cls itself. OK, I get that LaTeX is Turing-complete, but it's also frustration-complete as a programming environment.

We use abstracts for two purposes:

  1. to show them on the web as HTML,
  2. to encode them in XML when reporting to Crossref (note: some commercial publishers withhold them).

I ran a parser over a few hundred IACR submissions to see which macros authors use in abstracts, and it found 981 distinct constructs beginning with \. Some are obvious and common, like \begin{itemize} or \"u, but others are obscure or do not belong in an abstract at all (e.g., \newcommand, \cite, \hspace, \qquad, \and, \iffalse, \paragraph). We need to set boundaries on what is allowed in textabstract, but these cannot be enforced in iacrcc.cls and should instead be handled downstream in Python. This issue is intended to discuss how we describe the capabilities of that downstream system in the iacrdoc documentation.
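
A survey like the one described above can be approximated with a short Python sketch. It assumes the abstracts are available as plain strings and treats a "construct" as a \begin{...}/\end{...} pair, a control word (such as \cite), or a control symbol (such as \"); the parser that produced the 981 figure may have tokenized differently. SUSPICIOUS is a hypothetical denylist built from the examples in this issue, not an agreed policy, and the function names are made up for illustration.

```python
import re
from collections import Counter

# A "construct" here is \begin{...}/\end{...}, a control word such as \cite,
# or a control symbol such as \" (backslash plus one non-letter character).
CONTROL_SEQ = re.compile(r"\\(?:begin|end)\{[^}]*\}|\\[A-Za-z]+|\\.", re.DOTALL)

# Hypothetical denylist built from the examples in this issue.
SUSPICIOUS = {r"\newcommand", r"\cite", r"\hspace", r"\qquad",
              r"\and", r"\iffalse", r"\paragraph"}


def count_constructs(abstracts):
    """Count how often each construct appears across a list of abstracts."""
    counts = Counter()
    for text in abstracts:
        counts.update(CONTROL_SEQ.findall(text))
    return counts


def flag_suspicious(abstract):
    """Return the denylisted constructs used in one abstract, sorted."""
    return sorted(set(CONTROL_SEQ.findall(abstract)) & SUSPICIOUS)
```

Here len(count_constructs(abstracts)) plays the role of the 981 figure, and flag_suspicious gives a per-paper list that could be shown to an author.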

There are several ways to render inline and display mathematics in HTML, including MathML, MathJax, and KaTeX. I am skeptical about MathML (Chrome only restored support for it in 2023), although there is a Python converter that seems to do pretty well. MathJax seems to be the most widely used, and it renders LaTeX math source directly. For this reason I propose that we leave anything in math mode untouched when we store the textabstract in compilation.abstract. If we encounter bugs later on, we can fix them in whichever downstream conversion we use (MathML or MathJax).
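
Leaving math mode untouched means separating it from the surrounding text first. The following is a minimal regex-based sketch, not a full LaTeX parser: it assumes math is delimited only by $...$, $$...$$, \(...\), or \[...\] (environments such as equation are not handled), and split_math and convert_text_only are hypothetical names.

```python
import re

# Math delimiters handled by this sketch; $$...$$ is tried before $...$,
# and the (?<!\\) lookbehinds make the $...$ branch skip escaped \$ signs.
MATH = re.compile(
    r"\$\$.+?\$\$"
    r"|\\\[.+?\\\]"
    r"|\\\(.+?\\\)"
    r"|(?<!\\)\$.+?(?<!\\)\$",
    re.DOTALL,
)


def split_math(abstract):
    """Split an abstract into ("text", s) and ("math", s) segments, in order."""
    segments = []
    pos = 0
    for m in MATH.finditer(abstract):
        if m.start() > pos:
            segments.append(("text", abstract[pos:m.start()]))
        segments.append(("math", m.group(0)))
        pos = m.end()
    if pos < len(abstract):
        segments.append(("text", abstract[pos:]))
    return segments


def convert_text_only(abstract, convert):
    """Apply `convert` to text segments and keep math segments verbatim."""
    return "".join(s if kind == "math" else convert(s)
                   for kind, s in split_math(abstract))
```

For example, convert_text_only(r"Let $\mathcal{A}$ be an adversary.", str.upper) returns "LET $\mathcal{A}$ BE AN ADVERSARY.": only the text segments are transformed, so the math reaches MathJax exactly as the author wrote it.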

Paragraph (text) mode is another matter, since authors commonly use all sorts of constructs like \begin{itemize}, \begin{enumerate}, \textcopyright, \"u, \"{u}, and {\"u}. Some of these are problematic to convert to XML, but they may have HTML representations like <ul>. The best converter I've found for fragments of LaTeX is the pylatexenc package, which produces text approximations of paragraph mode and can easily be modified to support things like \begin{itemize} that authors wish to use in an abstract. Perhaps most importantly, pylatexenc handles all of the accents and symbols that have Unicode equivalents, and can be instructed to leave anything in math mode intact.
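
To make the paragraph-mode problem concrete, here is a toy converter for a few of the constructs listed above: the three spellings of an umlaut, \textcopyright, and \begin{itemize}. It is a standard-library stand-in, not pylatexenc's API, and covers only a tiny fraction of what pylatexenc handles. It assumes its input is paragraph-mode text with math already split off; text_to_html is a hypothetical name.

```python
import html
import re
import unicodedata

# Combining characters for a few LaTeX accent macros (illustrative subset only).
COMBINING = {'"': "\u0308", "'": "\u0301", "`": "\u0300", "^": "\u0302", "~": "\u0303"}

# The three spellings mentioned above: {\"u}, \"{u} and \"u.
ACCENT = re.compile(
    r"\{\\(?P<a1>[\"'`^~])(?P<c1>[A-Za-z])\}"
    r"|\\(?P<a2>[\"'`^~])\{(?P<c2>[A-Za-z])\}"
    r"|\\(?P<a3>[\"'`^~])(?P<c3>[A-Za-z])"
)
ITEMIZE = re.compile(r"\\begin\{itemize\}(.*?)\\end\{itemize\}", re.DOTALL)


def _accent(m):
    accent = m.group("a1") or m.group("a2") or m.group("a3")
    letter = m.group("c1") or m.group("c2") or m.group("c3")
    # Compose letter + combining mark into a single code point where one exists.
    return unicodedata.normalize("NFC", letter + COMBINING[accent])


def _itemize(m):
    items = [item.strip() for item in re.split(r"\\item\b", m.group(1))]
    return "<ul>" + "".join(f"<li>{item}</li>" for item in items if item) + "</ul>"


def text_to_html(text):
    """Convert a small subset of paragraph-mode LaTeX to HTML (toy example)."""
    text = html.escape(text, quote=False)  # escape &, <, > but keep " for \"
    text = ACCENT.sub(_accent, text)
    text = re.sub(r"\\textcopyright(?:\{\})?", "\u00a9", text)
    return ITEMIZE.sub(_itemize, text)
```

Even this toy shows why a real converter is needed: every accent, symbol, and list environment needs its own rule, which is exactly the table pylatexenc already maintains.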

Unfortunately, the list of supported and unsupported constructs is far too large to describe in the documentation (or in this issue). The current documentation says:

For final versions of papers, an additional text-only abstract is required. This abstract is contained in the textabstract environment, and should be just plain text (i.e., it should not contain macros). It will be used for indexing and production of HTML pages to describe the paper. As such, it is just as important as the classical abstract of a paper because it contains a textual summary that readers will use to decide if the paper is worth reading. The only difference is that the contents of the textabstract is constrained on what it may contain. The contents of this environment will be written to a file that ends with .abstract when you compile your LaTeX, but will not be displayed in the final PDF except as metadata. Note that \begin{textabstract} must appear on a line by itself.
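
As a hypothetical illustration of the "line by itself" rule, a downstream script that reads the .tex source could locate the environment as follows. This is an assumption for illustration only: the actual pipeline works from the .abstract file written during compilation, and extract_textabstract is a made-up name.

```python
import re

# \begin{textabstract} must be alone on its line (surrounding spaces tolerated).
# A commented-out "%\begin{textabstract}" line does not match.
BEGIN = re.compile(r"^[ \t]*\\begin\{textabstract\}[ \t]*$", re.MULTILINE)
END = re.compile(r"\\end\{textabstract\}")


def extract_textabstract(tex_source):
    """Return the body of the textabstract environment, or None if absent."""
    begin = BEGIN.search(tex_source)
    if begin is None:
        return None
    end = END.search(tex_source, begin.end())
    if end is None:
        raise ValueError("textabstract environment is never closed")
    return tex_source[begin.end():end.start()].strip()
```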

This is not really accurate, because some macros are allowed in math mode. It also omits any discussion of inline and display mathematics, which are allowed (both can be represented in XML and HTML). Moreover, some constructs like \begin{itemize} that are widely used in abstracts have an HTML representation, yet they are not mentioned. I suggest that we do the following:

  1. mention that user-defined macros are not allowed.
  2. mention that inline and display mathematics are allowed.
  3. mention that some constructs like \begin{itemize} are allowed, but that the complete definition of allowed constructions is too large to include in the documentation. Authors will be shown whether their constructions are allowed when they submit their final versions.
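
Item 1 could be backed by an automated check at submission time. The sketch below, with hypothetical names, collects macro names defined in the preamble via \newcommand, \renewcommand, \DeclareMathOperator, or \def, and reports any that the abstract uses. It is regex-based and will miss macros defined in other ways (e.g. \let or commands provided by packages).

```python
import re

# Names defined via \newcommand{\foo}, \newcommand\foo, \renewcommand,
# \DeclareMathOperator (starred forms included), or \def\foo.
DEFINITION = re.compile(
    r"\\(?:(?:re)?newcommand\*?|DeclareMathOperator\*?)\s*\{?\s*(\\[A-Za-z]+)"
    r"|\\def\s*(\\[A-Za-z]+)"
)
USE = re.compile(r"\\[A-Za-z]+")


def user_macros_in_abstract(preamble, abstract):
    """Return the user-defined macros that appear in the abstract, sorted."""
    # findall yields (group1, group2) tuples; exactly one group is non-empty.
    defined = {a or b for a, b in DEFINITION.findall(preamble)}
    used = set(USE.findall(abstract))
    return sorted(defined & used)
```

An empty result means no user-defined macros leaked into the abstract; otherwise the list is what an author would be shown.
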
jwbos commented 5 months ago

Please see 92179af7e2cc7d784463f86589ac7cd12456a361 for a first attempt.

jwbos commented 5 months ago

After review, fixed in a549ef44f08a9a047d5c5cab386431d4fde5f37c.