brucemiller / LaTeXML

LaTeXML: a TeX and LaTeX to XML/HTML/ePub/MathML translator.
http://dlmf.nist.gov/LaTeXML/

Recent citation regressions #959

Closed bfirsh closed 6 years ago

bfirsh commented 6 years ago

Somewhere in the commit range e14e9f07d5fca8a124953ddb04a9abaeb65a618e..d18cdade09c34d4394623de3fa51695f749b4227 we started seeing regressions for \cite.

The files are missing \documentclass{article}, but I'm seeing the same problem with that added at the top.

(Ref https://github.com/arxiv-vanity/engrafo/pull/292)

dginev commented 6 years ago

~Thanks for the report! Since this could well be executable-specific, could you let me know the exact commands you ran to reproduce?~

Can reproduce, taking a look

dginev commented 6 years ago

@brucemiller looks like switching from #refnum to #frefnum broke conversions that have a .bbl file?

change at:

https://github.com/brucemiller/LaTeXML/commit/c76866150fa8db87504566246e031ffb51c78d5c#diff-73d02df95f8342da0c0bc9a260d230d7R2906

The bbl content in question for this example is:

\begin{thebibliography}{28}
\bibitem[{Bloggs and Jones(2014)}]{bloggs2014}
Joe Bloggs and Phil Jones. 2014.
\newblock Compositional morphology for word representations and language modelling.
\newblock In \emph{Proceedings of ICML}.

\bibitem[{Cotterell and Sch{\"u}tze(2015)}]{cotterell2015morphological}
Ryan Cotterell and Hinrich Sch{\"u}tze. 2015.
\newblock Morphological word-embeddings.
\newblock In \emph{Proceedings of HLT-NAACL}.

\bibitem[{Foo and Bar(2020)}]{foobar2020}
Foo and Bar. 2020.
\newblock Just a title, not a source.
\end{thebibliography}

The generated intermediate XML has empty elements for refnum, here are the relevant snippets:

<cite class="ltx_citemacro_cite">[<bibref bibrefs="bloggs2014" separator="," show="Refnum" yyseparator=","/>]</cite>

and the bibitem:

      <bibitem key="bloggs2014" xml:id="bib.bibx1">
        <bibtag role="refnum"/>
        ...
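
To check other documents for the same symptom, something along these lines might help. It is only a minimal sketch: it assumes the intermediate XML is available as a string, matches elements by local name so it does not depend on the namespace prefix, and uses a made-up two-entry sample (the `biblist` wrapper, the `foobar2020` entry's number and `empty_refnum_keys` itself are illustrative, not part of LaTeXML).

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def empty_refnum_keys(xml_text):
    """Return keys of bibitems whose refnum bibtag has no text content."""
    root = ET.fromstring(xml_text)
    keys = []
    for item in root.iter():
        if local(item.tag) != "bibitem":
            continue
        for child in item:
            is_refnum = local(child.tag) == "bibtag" and child.get("role") == "refnum"
            # An empty <bibtag role="refnum"/> is the regression described above.
            if is_refnum and not "".join(child.itertext()).strip():
                keys.append(item.get("key"))
    return keys


# Hypothetical sample: the first entry mirrors the broken snippet above,
# the second shows what a populated refnum could look like.
SAMPLE = """<biblist>
  <bibitem key="bloggs2014" xml:id="bib.bibx1">
    <bibtag role="refnum"/>
  </bibitem>
  <bibitem key="foobar2020" xml:id="bib.bibx3">
    <bibtag role="refnum">3</bibtag>
  </bibitem>
</biblist>"""

print(empty_refnum_keys(SAMPLE))
```

Any key it reports would correspond to a citation that renders as an empty `[]` when `show="Refnum"` is used.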

brucemiller commented 6 years ago

Hmm... surprised any of this stuff got out already, but big set of patches coming soon which should fix this again.

dginev commented 6 years ago

"surprised any of this stuff got out already"

This is what happens when people use latexml in production :> Maybe we should up our test coverage?

bfirsh commented 6 years ago

Awesome, thanks!

One of my current projects is producing a set of tex files that cover as much functionality as I can find. It’ll then compare the HTML output with known good HTML as an integration test. Happy to work together on that, if you’d like.
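
Roughly, the comparison step could look like the sketch below. This is an illustration only, not the actual test harness: `normalize` and `html_diff` are hypothetical helpers, and the whitespace normalization is an assumption about what counts as an insignificant difference.

```python
import difflib


def normalize(html):
    """Drop blank lines and surrounding whitespace so formatting noise is ignored."""
    return [line.strip() for line in html.splitlines() if line.strip()]


def html_diff(expected, actual):
    """Return unified-diff lines between known-good and freshly generated HTML."""
    return list(
        difflib.unified_diff(
            normalize(expected),
            normalize(actual),
            fromfile="expected",
            tofile="actual",
            lineterm="",
        )
    )


# Hypothetical outputs: the second has lost its citation number.
good = "<cite>[<a href='#bib.bibx1'>1</a>]</cite>\n"
bad = "<cite>[<a href='#bib.bibx1'></a>]</cite>\n"

for line in html_diff(good, bad):
    print(line)
```

An empty diff would mean the test passes; a non-empty one shows exactly which lines changed, which makes a regression like the missing refnum easy to spot.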

brucemiller commented 6 years ago

The 2 links above gave 404's, so maybe the arrangement of test cases has changed. I have checked the bbl that @dginev posted and this should be fixed now. Thanks for the report & test cases!

bfirsh commented 6 years ago

Oops yes sorry should have sent absolute github URLs. Thanks for the fix!