JATS4R / JATS4R-Participant-Hub

The hub for all JATS4R meeting notes, examples, draft recommendations, documents, and issues.
http://jats4r.org

Rules for doctype declarations #103

Klortho closed this issue 8 years ago.

Klortho commented 9 years ago

I'd like to propose a few recommendations for how documents self-identify which version of JATS they conform to (which is more or less independent of which version of JATS4R they conform to).

There are several ways of self-identifying: doctype declarations, processing instructions, and XSD attributes.

_Recommendation 0: Any JATS4R-conformant document must use at least one of those ways of self-identifying._
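As a rough illustration of Recommendation 0, here is a minimal Python sketch that reports which self-identification mechanisms a document uses, built on the standard-library expat parser. The function name is hypothetical, and two details are my assumptions rather than anything stated in this issue: only an `<?xml-model?>` processing instruction is counted, and only `xsi:schemaLocation` / `xsi:noNamespaceSchemaLocation` on the root element count as XSD attributes.

```python
import xml.parsers.expat

# Assumption: the conventional "xsi" prefix is used for the schema-location attributes.
XSD_ATTRIBUTES = {"xsi:schemaLocation", "xsi:noNamespaceSchemaLocation"}


def self_identification_methods(xml_text: str) -> set[str]:
    """Return the self-identification mechanisms found in a document.

    Hypothetical helper. Assumptions: a processing instruction counts only if
    its target is "xml-model", and XSD attributes count only on the root element.
    """
    found: set[str] = set()
    seen_root = False

    def on_doctype(name, system_id, public_id, has_internal_subset):
        found.add("doctype")

    def on_pi(target, data):
        if target == "xml-model":
            found.add("processing-instruction")

    def on_start(name, attrs):
        nonlocal seen_root
        if not seen_root:  # only inspect the root element's attributes
            seen_root = True
            if not XSD_ATTRIBUTES.isdisjoint(attrs):
                found.add("xsd")

    # Without namespace processing, prefixed attribute names arrive verbatim.
    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.ProcessingInstructionHandler = on_pi
    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    return found
```

Under this sketch, a document satisfies Recommendation 0 when the returned set is non-empty.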

The rest of this issue is specifically about the doctype declarations. When an article uses a doctype declaration:

_Recommendation 1: It must use the PUBLIC form, not the SYSTEM form_

So, for example, this is okay:

<!DOCTYPE article 
  PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN"
  "JATS-journalpublishing1.dtd">

But this is not:

<!DOCTYPE article SYSTEM "../JATS-journalpublishing1.dtd">
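A check for Recommendation 1 only needs the identifiers from the doctype declaration; expat reports the public identifier as `None` when the SYSTEM form is used. A minimal sketch (both function names are hypothetical):

```python
import xml.parsers.expat


def doctype_ids(xml_text: str):
    """Return (public_id, system_id) from the doctype declaration, or None if there is none."""
    result = None

    def on_doctype(name, system_id, public_id, has_internal_subset):
        nonlocal result
        result = (public_id, system_id)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_text, True)  # expat does not fetch the external DTD by default
    return result


def uses_public_form(xml_text: str) -> bool:
    """Recommendation 1: if there is a doctype declaration, it must use the PUBLIC form."""
    ids = doctype_ids(xml_text)
    return ids is None or ids[0] is not None
```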

_Recommendation 2: It must have a valid JATS public identifier_

So this is no good:

<!DOCTYPE article PUBLIC "my//funky public identifier" "funky.dtd">
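Recommendation 2 could be checked with an allow-list lookup. The set below is a hypothetical placeholder holding only the identifier quoted in this issue; a real checker would need the complete list of JATS public identifiers.

```python
import xml.parsers.expat

# Hypothetical allow-list: only the identifier quoted in this issue.
# A real checker would list every published JATS public identifier.
KNOWN_JATS_PUBLIC_IDS = {
    "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN",
}


def has_valid_jats_public_id(xml_text: str) -> bool:
    """Recommendation 2: a doctype declaration must carry a known JATS public identifier."""
    public_ids = []

    def on_doctype(name, system_id, public_id, has_internal_subset):
        public_ids.append(public_id)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_text, True)
    # No doctype declaration at all: the recommendation does not apply.
    return not public_ids or public_ids[0] in KNOWN_JATS_PUBLIC_IDS
```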

Conversely, I think we should specify that the system identifier does not matter and must be ignored by bots and any other software processing the file. This will allow users to put their DTDs wherever they want for local processing, without affecting reusability.
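To illustrate ignoring the system identifier, a processor can choose its local DTD from the public identifier alone. The mapping and the local path in it are hypothetical; this is a sketch of the idea, not an existing tool.

```python
from pathlib import Path
from typing import Optional

# Hypothetical site-specific mapping: public identifier -> local copy of the DTD.
LOCAL_DTDS = {
    "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN":
        Path("/opt/dtds/jats-1.0/JATS-journalpublishing1.dtd"),
}


def resolve_dtd(public_id: str, system_id: Optional[str]) -> Optional[Path]:
    """Pick a local DTD using only the public identifier.

    system_id is accepted but deliberately ignored, so a document resolves the
    same way no matter where its author kept the DTD. Returns None if unknown.
    """
    return LOCAL_DTDS.get(public_id)
```

Two documents that differ only in their system identifiers resolve to the same local file.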

Klortho commented 9 years ago

Regarding helping non-technical users get started with JATS, I think this is one area we should highlight. What is sorely missing from the JATS tag libraries is simple boilerplate showing what the shell of a JATS XML document should look like. We could provide a page from which users can copy and paste that boilerplate to get started.
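A sketch of the kind of cut-and-paste boilerplate such a page could offer, written as a Python function that returns the text. The element and attribute choices (`front`, `body`, `back`, `dtd-version`, the xlink namespace declaration) are my assumptions about a minimal shell, and the function name is hypothetical; the output is not valid JATS until the required metadata is filled in.

```python
# Public identifier and relative system identifier taken from the examples in this issue.
PUBLIC_ID = "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN"
SYSTEM_ID = "JATS-journalpublishing1.dtd"


def jats_shell(article_type: str = "research-article") -> str:
    """Return a bare JATS article skeleton for cut-and-paste.

    Not valid as-is: the metadata the DTD requires inside <front>
    still has to be filled in before the document will validate.
    """
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<!DOCTYPE article PUBLIC "{PUBLIC_ID}" "{SYSTEM_ID}">\n'
        f'<article article-type="{article_type}" dtd-version="1.0"\n'
        '         xmlns:xlink="http://www.w3.org/1999/xlink">\n'
        "  <front>\n"
        "    <!-- journal and article metadata go here -->\n"
        "  </front>\n"
        "  <body/>\n"
        "  <back/>\n"
        "</article>\n"
    )
```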

hubgit commented 9 years ago

For optimal re-use

<!DOCTYPE article PUBLIC 
  "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" 
  "http://jats.nlm.nih.gov/publishing/1.0/JATS-journalpublishing1.dtd">

(without the newlines) should be the recommended way to identify the document type.
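Generating the declaration from code is one way to guarantee it stays on a single line. A minimal sketch using the identifiers above (the function name is hypothetical):

```python
RECOMMENDED_PUBLIC_ID = "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN"
RECOMMENDED_SYSTEM_ID = "http://jats.nlm.nih.gov/publishing/1.0/JATS-journalpublishing1.dtd"


def doctype_declaration(public_id: str = RECOMMENDED_PUBLIC_ID,
                        system_id: str = RECOMMENDED_SYSTEM_ID) -> str:
    """Build the doctype declaration as one line, with no embedded newlines."""
    return f'<!DOCTYPE article PUBLIC "{public_id}" "{system_id}">'
```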

The most important thing is the PUBLIC identifier, but the SYSTEM URL also helps XML catalogs map URL prefixes to local directories.
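The prefix mapping works roughly like an OASIS XML catalog `rewriteSystem` entry. The Python below imitates that behaviour (the longest matching prefix wins) with a hypothetical rule pointing at a local directory; it is an illustration, not a catalog resolver.

```python
# Hypothetical rule: system-identifier prefix -> local directory prefix.
REWRITE_SYSTEM = {
    "http://jats.nlm.nih.gov/publishing/1.0/": "file:///opt/dtds/jats-publishing-1.0/",
}


def rewrite_system_id(system_id: str, rules: dict[str, str] = REWRITE_SYSTEM) -> str:
    """Map a system identifier onto a local copy, rewriteSystem-style.

    When several prefixes match, the longest one is used; if none
    matches, the identifier is returned unchanged.
    """
    matches = [prefix for prefix in rules if system_id.startswith(prefix)]
    if not matches:
        return system_id
    best = max(matches, key=len)
    return rules[best] + system_id[len(best):]
```

A relative system identifier such as `JATS-journalpublishing1.dtd` matches no prefix, so it cannot be remapped this way.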

Klortho commented 9 years ago

@hubgit, I am fine with this, and I like it, but I am worried that other folks at NLM (viz., @jeffbeckncbi) might have an issue with it. Concern has been expressed in the past about excessive traffic to our servers, and about running into the same problem the W3C had with the HTML doctype declaration, whose DTD URLs were fetched so often by software that they generated enormous traffic to the W3C's servers.

I will write it up this way, and see if anybody pushes back.

Nikos-Markantonatos commented 8 years ago

If we'd rather avoid an explicit URL pointing at NLM, the most commonly used alternative is:

<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "JATS-journalpublishing1.dtd">

hubgit commented 8 years ago

Note that the DTD file in JATS version 1.1 is also named JATS-journalpublishing1.dtd, so a relative system identifier cannot distinguish between the versions; only the full URI can.
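A quick way to see the point: reduced to file names, the 1.0 and 1.1 system identifiers are identical. The 1.1 URL below is an assumption that follows the same path pattern as the 1.0 URL quoted earlier in this thread.

```python
from posixpath import basename
from urllib.parse import urlparse

JATS_1_0 = "http://jats.nlm.nih.gov/publishing/1.0/JATS-journalpublishing1.dtd"
# Assumed URL for 1.1, following the same pattern as the 1.0 URL.
JATS_1_1 = "http://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd"


def dtd_file_name(system_id: str) -> str:
    """Return just the file name, which is all a relative system identifier carries."""
    return basename(urlparse(system_id).path)
```

Both URLs reduce to `JATS-journalpublishing1.dtd`, while the full URIs still differ.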