brucemiller / LaTeXML

LaTeXML: a TeX and LaTeX to XML/HTML/ePub/MathML translator.
http://dlmf.nist.gov/LaTeXML/

Theorem title in Jats-XML #1503

Closed: rgieseke closed this issue 3 years ago.

rgieseke commented 3 years ago

When I convert a theorem to JATS XML, the theorem title does not appear in the output.

For a theorem environment defined like this (test.tex):

\documentclass{article}
\newtheorem{theorem}{Theorem}[section]

\begin{document}
\begin{theorem}
Let f be a function.
\end{theorem}
\end{document}

Running latexml test.tex, I get:

<?xml version="1.0" encoding="UTF-8"?>
<?latexml searchpaths="/home/robert/Work/ems/tex-json"?>
<?latexml class="article"?>
<?latexml RelaxNGSchema="LaTeXML"?>
<document xmlns="http://dlmf.nist.gov/LaTeXML">
  <resource src="LaTeXML.css" type="text/css"/>
  <resource src="ltx-article.css" type="text/css"/>
  <theorem class="ltx_theorem_theorem" inlist="thm theorem:theorem" xml:id="S0.Thmtheorem1">
    <tags>
      <tag>Theorem 0.1</tag>
      <tag role="refnum">0.1</tag>
      <tag role="typerefnum">Theorem 0.1</tag>
    </tags>
    <title class="ltx_runin"><tag><text font="bold">Theorem 0.1</text></tag></title>
    <para xml:id="S0.Thmtheorem1.p1">
      <p><text font="italic">Let f be a function.</text></p>
    </para>
  </theorem>
</document>

When I convert to JATS XML with latexmlc test.tex --dest=test.jats.xml --pmml --stylesheet=LaTeXML-jats.xsl, I get:

<?xml version="1.0"?>
<article>
  <front>
    <article-meta>
      <contrib-group/>
      <!-- The element theorem with attributes
    class=ltx_theorem_theoreminlist=thm theorem:theoremxml:id=S0.Thmtheorem1fragid=S0.Thmtheorem1 
      is currently not supported for the front matter.
    -->
    </article-meta>
  </front>
  <body>
    <statement id="S0.Thmtheorem1">
      <title/>
      <p id="S0.Thmtheorem1.p1">
        <italic>Let f be a function.</italic>
      </p>
    </statement>
  </body>
  <back>
    <!-- The element theorem with attributes
    class=ltx_theorem_theoreminlist=thm theorem:theoremxml:id=S0.Thmtheorem1fragid=S0.Thmtheorem1 
      is currently not supported for the back matter
    -->
    <app-group/>
  </back>
</article>

The conversion seems to be meant to happen in this template, but it does not pick up the title:

  <xsl:template match="ltx:theorem/ltx:title">
    <title>
      <xsl:apply-templates select="@*|node()"/>
    </title>
  </xsl:template>
rgieseke commented 3 years ago

The following seems to work; the second template unwraps the ltx:tag inside a run-in title so that its content is kept:

  <xsl:template match="ltx:theorem/ltx:title">
    <title>
      <xsl:apply-templates select="@*|node()"/>
    </title>
  </xsl:template>

  <xsl:template match="ltx:title[@class='ltx_runin']/ltx:tag">
    <xsl:apply-templates select="@*|node()"/>
  </xsl:template>

This gives the following (the bold element is kept):

    <statement id="Thmtheoremx1">
      <title><bold>My theorem</bold>.</title>
brucemiller commented 3 years ago

Is there anything in JATS specifically to distinguish the reference number of a section or theorem (ltx:tag) from the title proper? For example, in "1. Introduction", should the "1" be wrapped or marked up separately? The current code throws away the reference number. Your "My theorem", for better or worse, ends up treated as a reference number, and so it also gets omitted.

If not, I have a patch that includes the ltx:tag in the title. I don't think you really want to single out ltx_runin; that class is really just formatting/styling. The patch now also includes the number in section headings, which I suspect is what you want anyway.
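A minimal sketch of the idea of keeping ltx:tag in the title without singling out ltx_runin, written as a thin customization layer over the stock stylesheet. This is not the patch from #1516. The file name my-jats.xsl and the import path are placeholders, and the sketch assumes that the stock LaTeXML-jats.xsl drops ltx:tag through one of its own templates, which the higher import precedence of this layer overrides. Since it matches every ltx:title, it would also put the number into section headings.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- my-jats.xsl (hypothetical name): a thin layer over the stock JATS stylesheet. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:ltx="http://dlmf.nist.gov/LaTeXML"
    exclude-result-prefixes="ltx">

  <!-- Placeholder path: point this at the LaTeXML-jats.xsl of your installation. -->
  <xsl:import href="path/to/LaTeXML-jats.xsl"/>

  <!-- Templates in an importing stylesheet take precedence over imported ones,
       so this replaces whatever the stock stylesheet does with a title's tag.
       It unwraps the tag for any title (run-in or not), so the reference
       number, e.g. "Theorem 0.1", ends up inside the JATS <title>. -->
  <xsl:template match="ltx:title/ltx:tag">
    <xsl:apply-templates select="node()"/>
  </xsl:template>

</xsl:stylesheet>
```

It would then be passed in place of the stock stylesheet, e.g. latexmlc test.tex --dest=test.jats.xml --pmml --stylesheet=my-jats.xsl.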

rgieseke commented 3 years ago

It doesn't seem so. The example in the JATS tag library looks like this:

<p>Industrial buyers categorise foreign countries
according to their level of technological achievement
and subsequently differentiate their perceptions of
these countries accordingly. ... The following
hypothesis is posited:
<statement><label>Hypothesis 1</label>
<p>Buyer preferences for companies are influenced 
by factors extrinsic to the firm attributable to, and
determined by, country-of-origin effects.</p>
</statement>
</p>

https://jats.nlm.nih.gov/publishing/tag-library/1.2/element/statement.html

So it seems that label and title are not clearly distinguished.
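Following the tag library example, one possible alternative is to map ltx:tag to a JATS <label> and put only the remaining title content into <title>; the statement element allows an optional label followed by an optional title. The sketch below is hypothetical: the import path is a placeholder, and it assumes that overriding the stock ltx:theorem/ltx:title template from an importing stylesheet is enough. As noted earlier in the thread, for some theorems the tag holds a name such as "My theorem" rather than a number, and this sketch would turn that into a label as well.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:ltx="http://dlmf.nist.gov/LaTeXML"
    exclude-result-prefixes="ltx">

  <!-- Placeholder path: point this at the LaTeXML-jats.xsl of your installation. -->
  <xsl:import href="path/to/LaTeXML-jats.xsl"/>

  <!-- Hypothetical override of the theorem title template. -->
  <xsl:template match="ltx:theorem/ltx:title">
    <!-- Everything in the title except the tag, i.e. the title proper. -->
    <xsl:variable name="rest" select="node()[not(self::ltx:tag)]"/>
    <!-- The tag's plain text, e.g. "Theorem 0.1", becomes the JATS label,
         mirroring <label>Hypothesis 1</label> in the tag library example. -->
    <xsl:if test="ltx:tag">
      <label><xsl:value-of select="normalize-space(ltx:tag)"/></label>
    </xsl:if>
    <!-- Emit a <title> only if an element or non-blank text is left. -->
    <xsl:if test="$rest[self::* or normalize-space(.) != '']">
      <title><xsl:apply-templates select="$rest"/></title>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

For the test.tex above, this should replace the empty <title/> with <label>Theorem 0.1</label>, since that theorem's title contains nothing but the tag.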

The id (Document Internal Identifier) attribute could probably carry information about the numerical value. I'm not sure whether there is a good way to wrap the number somehow, as suggested by @dginev (https://github.com/brucemiller/LaTeXML/pull/1516#pullrequestreview-638231948).

Re "I don't think you really want to single out ltx_runin": yeah, that was just the first workaround I could get to work. I've tested your patch in #1516, and it works well!