brucemiller / LaTeXML

LaTeXML: a TeX and LaTeX to XML/HTML/ePub/MathML translator.
http://dlmf.nist.gov/LaTeXML/

JATS: journal metadata not empty #2346

Open: castedo opened this issue 2 months ago

castedo commented 2 months ago

When generating JATS XML, latexmlc 0.8.8 fills several metadata elements with the placeholder value not-yet-known:

<journal-id>not-yet-known</journal-id>
<issn>not-yet-known</issn>
...
<article-id>not-yet-known</article-id>
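Until the output changes, a downstream pipeline could blank these placeholders itself. The following is a minimal sketch of such a post-processing step, not a LaTeXML feature: the function name `blank_placeholders` and the tag list are hypothetical, and the list covers only the three elements quoted above (the "..." means others may be affected too). It works on the raw text with regular expressions so that the DOCTYPE and namespace prefixes in the file are left untouched.

```python
import re

# Hypothetical tag list: only the elements quoted in this issue.
PLACEHOLDER_TAGS = ("journal-id", "issn", "article-id")


def blank_placeholders(jats: str, placeholder: str = "not-yet-known") -> str:
    """Empty out listed elements whose entire text content is the placeholder."""
    for tag in PLACEHOLDER_TAGS:
        # Match <tag ...>placeholder</tag>, allowing attributes on the open tag
        # and whitespace around the placeholder; keep both tags, drop the text.
        pattern = re.compile(
            rf"(<{tag}\b[^>]*>)\s*{re.escape(placeholder)}\s*(</{tag}>)"
        )
        jats = pattern.sub(r"\1\2", jats)
    return jats


sample = (
    "<journal-meta><journal-id>not-yet-known</journal-id>"
    "<issn>not-yet-known</issn></journal-meta>"
)
print(blank_placeholders(sample))
```

Elements that carry a real value are left alone, since the pattern only matches when the placeholder is the element's whole content.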

I see several downsides to this behavior:

  1. I am not aware of any precedent for automated pipelines understanding not-yet-known. To the extent that LaTeXML is meant to be used in automated pipelines that minimize human intervention, this value will probably show up somewhere as a journal or article record literally identified as not-yet-known.
  2. Leaving these values blank is very likely to have the desired semantics for any reasonably robust JATS reader: if an element is empty, the value is not known. That interpretation is much more likely than downstream code that has not-yet-known hard-coded.
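To illustrate the second point, here is a minimal sketch of a generic JATS consumer; the function `read_journal_id` is hypothetical and stands in for whatever a real pipeline does. A reader like this naturally maps a missing or empty `<journal-id>` to "unknown", but it has no way to tell that not-yet-known is a placeholder, so it passes it along as a real identifier.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def read_journal_id(jats: str) -> Optional[str]:
    """Return the first journal-id text, or None if it is missing or empty."""
    root = ET.fromstring(jats)
    elem = root.find(".//journal-id")
    if elem is None:
        return None  # element absent
    text = (elem.text or "").strip()
    return text or None  # empty element is treated as unknown


empty = "<article><front><journal-meta><journal-id/></journal-meta></front></article>"
placeholder = (
    "<article><front><journal-meta><journal-id>not-yet-known</journal-id>"
    "</journal-meta></front></article>"
)
print(read_journal_id(empty))        # None
print(read_journal_id(placeholder))  # not-yet-known (taken as a real ID)
```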
brucemiller commented 1 month ago

There was nothing particularly deep about not-yet-known; it was probably just filler because some validation demanded something (the element itself, perhaps, or a non-empty value, or ?). I'd thought empty could mean either that the value is unknown or that there isn't one (for example, no journal-id at all). I'm fine with leaving it empty. The main point will be keeping it synchronized with however the journal-id is encoded into the LaTeXML XML (if, when, and however it ends up there).