elifesciences / sciencebeam-pipelines

A set of tools for PDF to XML conversion, utilising Apache Beam and other tools. The aim of this project is to bring multiple tools together to generate a full XML document. It is now mainly used for evaluating external tools.
MIT License

JATS conversion missing DOCTYPE #3

Closed LeonardEyer closed 2 years ago

LeonardEyer commented 4 years ago

From the XSLT, it is unclear which JATS version is used. Also, the converted output contains no DOCTYPE declaration.

Example minimal article structure (Wikipedia):

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article
  PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN"
         "JATS-journalpublishing1.dtd"
>
<article dtd-version="1.0" article-type="article" specific-use="migrated"
 xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" 
>
  <front>...</front>
  <body>...</body>
  <back>...</back>
</article>

de-code commented 4 years ago

Thank you for raising the issue. Apologies, I didn't see the notification for it.

The conversion currently uses DAR JATS as the reference (because the demo displays the output in Texture). This will probably change. The referenced repo links to JATS 1.1.

So I suppose it could be:

<!DOCTYPE article
  PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "https://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd">

(The relevant XSLT)
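
As a rough illustration of how such a DOCTYPE could be attached when the converted XML is serialized, here is a minimal Python sketch using only the standard library. It is not how this project does it: the helper name `serialize_with_doctype`, the `article-type` value, and the empty `front`/`body`/`back` elements are all assumptions made for the example. Only the public and system identifiers come from the DOCTYPE suggested above.

```python
import xml.etree.ElementTree as ET

# Identifiers copied from the JATS 1.1 DOCTYPE suggested above.
JATS_PUBLIC_ID = "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN"
JATS_SYSTEM_ID = "https://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd"


def serialize_with_doctype(root, public_id=JATS_PUBLIC_ID, system_id=JATS_SYSTEM_ID):
    """Hypothetical helper: return `root` as a string with an XML declaration
    and a DOCTYPE placed before the root element.

    ElementTree has no DOCTYPE support of its own, so the declaration is
    prepended as plain text. Assumes the root element is not namespaced.
    """
    body = ET.tostring(root, encoding="unicode")
    doctype = f'<!DOCTYPE {root.tag} PUBLIC "{public_id}" "{system_id}">'
    # The declaration says UTF-8, so encode the result as UTF-8 when writing it out.
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + doctype + "\n" + body


# Placeholder article; attribute values are assumptions for illustration only.
article = ET.Element("article", {"dtd-version": "1.1", "article-type": "research-article"})
for name in ("front", "body", "back"):
    ET.SubElement(article, name)

jats_xml = serialize_with_doctype(article)
print(jats_xml)
```

If the output is produced directly by the XSLT instead, the `doctype-public` and `doctype-system` attributes of `xsl:output` serve the same purpose and would avoid the string handling above.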