Vitaliy-1 / docxConverter

Plugin for OJS 3 that parses DOCX and converts it to JATS XML format
GNU General Public License v2.0

Inconsistent XML header and version/tag set doubts. #3

Closed marcbria closed 3 years ago

marcbria commented 5 years ago

Hi Vitaliy,

After some testing today, I found that the XML header produced by the transformation is inconsistent.

IMHO, it won't be a problem for generation and publishing (the tool works fine), but it could become one in the future for system interconnection and data recovery.

While the document is declared as JATS 1.0, it's validated against a 1.1 DTD:

<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving DTD v1.0 20120330//EN" "https://jats.nlm.nih.gov/archiving/1.1/JATS-archivearticle1.dtd">

The JATS sample output in your repo has the same problem: https://github.com/Vitaliy-1/docxToJats/blob/d347122cf84c65698a9c0489e96b6dd62658eb4c/samples/output/test_jats.xml

But if we use the 1.1 DTD, the document should also be declared as 1.1.
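To make the mismatch easy to spot, here is a minimal Python sketch that compares the version in the public identifier with the version in the DTD URL. It is only an illustration and not part of docxConverter: the function names `doctype_versions` and `has_version_mismatch` are hypothetical, and it assumes the version appears as `DTD vX.Y` in the public identifier and as a `/X.Y/` path segment in the system identifier, as in the header above.

```python
import re

# Matches a DOCTYPE declaration with a PUBLIC identifier and a system identifier.
DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+(?P<root>\S+)\s+PUBLIC\s+"(?P<public>[^"]*)"\s+"(?P<system>[^"]*)"\s*>'
)


def doctype_versions(doctype):
    """Return (public_version, system_version); either may be None if absent."""
    match = DOCTYPE_RE.search(doctype)
    if match is None:
        raise ValueError("no PUBLIC DOCTYPE declaration found")
    # Assumption: the public identifier carries the version as "DTD vX.Y".
    public_version = re.search(r"DTD v(\d+\.\d+)", match.group("public"))
    # Assumption: the system identifier carries the version as a "/X.Y/" path segment.
    system_version = re.search(r"/(\d+\.\d+)/", match.group("system"))
    return (
        public_version.group(1) if public_version else None,
        system_version.group(1) if system_version else None,
    )


def has_version_mismatch(doctype):
    """True only when both versions are present and differ."""
    public_version, system_version = doctype_versions(doctype)
    return (
        public_version is not None
        and system_version is not None
        and public_version != system_version
    )


header = (
    '<!DOCTYPE article PUBLIC '
    '"-//NLM//DTD JATS (Z39.96) Journal Archiving DTD v1.0 20120330//EN" '
    '"https://jats.nlm.nih.gov/archiving/1.1/JATS-archivearticle1.dtd">'
)
print(doctype_versions(header))      # ('1.0', '1.1')
print(has_version_mismatch(header))  # True
```

A system identifier without a version in its path (a bare file name, for instance) is reported as `None` and is not treated as a mismatch.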

Beyond that, I think we also have a problem with the tag set. It's true that Archiving is the most permissive model, but we should be creating JATS with the Publishing tag set, shouldn't we?

So, if I understand correctly, docxConverter should use the same XML header as this example in the PeerJ repo:

<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "http://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd">

Source: https://github.com/PeerJ/jats-conversion/blob/master/tests/example-jats-1-1-software.xml
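One way to keep the two identifiers in sync is to build both from the same values. The sketch below is a hypothetical helper (`jats_doctype` is not an existing function in docxConverter), and the URL pattern it uses is only inferred from the two headers quoted in this issue, so treat it as an assumption rather than an official rule.

```python
def jats_doctype(tag_set, version, release_date, dtd_file):
    """Build a JATS DOCTYPE whose public and system identifiers share one version.

    tag_set:      "Publishing" or "Archiving" (as spelled in the public identifier)
    version:      e.g. "1.1"
    release_date: e.g. "20151215"
    dtd_file:     e.g. "JATS-journalpublishing1.dtd"
    """
    public_id = (
        f"-//NLM//DTD JATS (Z39.96) Journal {tag_set} DTD "
        f"v{version} {release_date}//EN"
    )
    # Assumption: the DTD lives under /<tag set in lowercase>/<version>/.
    system_id = f"http://jats.nlm.nih.gov/{tag_set.lower()}/{version}/{dtd_file}"
    return f'<!DOCTYPE article PUBLIC "{public_id}" "{system_id}">'


# Reproduces the PeerJ header quoted above.
print(jats_doctype("Publishing", "1.1", "20151215", "JATS-journalpublishing1.dtd"))
```

Because the version is passed in once, the declared version and the DTD path cannot drift apart the way they do in the current output.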

But in January of 2019, NISO released JATS 1.2 (which is now the current standard), so instead of working on and testing 1.1, I'm asking myself whether it wouldn't be wiser to start moving all the tools to JATS 1.2.

I haven't checked yet what happens with Texture (does it keep the header untouched? same issue?).

The more I read about JATS, the more I have this feeling:

[image]

Cheers, m.

marcbria commented 5 years ago

Just for your information: I tested Texture, and Substance is generating JATS 1.0 with this header:

<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving DTD v1.0 20120330//EN" "JATS-journalarchiving.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:ali="http://www.niso.org/schemas/ali/1.0">

It's old, but it's consistent. :-)

The funny part is that while the PKP Texture plugin (updated 5 months ago) is on 1.0, the Substance repo says they are working with the JATS 1.1 "Green" tag set: https://github.com/substance/texture/

Meanwhile, their texture-plugin-jats repo says it is JATS 1.2: https://github.com/substance/texture-plugin-jats

I will ask Dulip about this because I'm lost between versions and repos. :-)

Anyway, the question for me is still the same: shouldn't we all move to the JATS 1.2 Publishing (Blue) specification?

Cheers, m.

Vitaliy-1 commented 5 years ago

Hi @Marc,

Thanks, I'll check the declaration. It's not a problem to correct this, but I also need to check whether it would be supported by Texture.

Texture uses its own subset of JATS XML and doesn't comply with the full standard specification. That makes compatibility more complicated.

I can't find much difference between 1.1 and 1.2, and 1.2 is backward compatible, so I don't see problems on the DOCX Converter side.

However, I don't think it's a good idea to support the whole JATS XML standard: it's huge, it allows mixed nested elements, and it allows parts of the XML, e.g. references, to be unstructured.