JATS4R / JATS4R-Participant-Hub

The hub for all JATS4R meeting notes, examples, draft recommendations, documents, and issues.
http://jats4r.org

Other ways for documents to associate schemas #112

Closed: Klortho closed this issue 8 years ago

Klortho commented 9 years ago

A doctype declaration is the most common way for a document to declare "I conform to this version/flavor of JATS", but there are others.

See the PMC Tagging Guidelines, Associating Schemas, for a good writeup.

I'd like to propose that we adopt these recommendations for JATS4R, maybe with a couple of tweaks. I'll start writing them up in the general recommendations draft and report any changes I'd like to make here, to make sure there's consensus.

Calling on you, my diligent task force, for comments and review: @jeffbeckncbi , @hubgit , @Nikos-Markantonatos , @pkra

Klortho commented 9 years ago

For now, I am taking out the part "we strongly prefer the DOCTYPE declaration for associating DTDs and the @schemaLocation or @noNamespaceSchemaLocation attributes for W3C XML Schema association", since that is explicitly about a PMC preference.

But should JATS4R make the same recommendation? Should it be a warning, perhaps, if, for example, the DTD is specified with an <?xml-model?> PI? I can see pros and cons: on the one hand, legacy XML tools probably don't recognize <?xml-model?>, so those articles would be harder to reuse. On the other hand, specifying the document type by a mechanism other than a doctype-declaration does, in my opinion, move us down the road toward freedom from DTDs, which is good.
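If JATS4R did adopt such a warning, the check itself would be small. The sketch below is hypothetical: the function names are made up for illustration, and it assumes a DTD is signalled either by `type="application/xml-dtd"` or by an `href` ending in `.dtd`, which is a heuristic, not a rule taken from any spec.

```python
import re

# Pseudo-attributes inside a PI look like attributes, but an XML parser hands
# them over as raw text, so they are pulled out with a regex.
PSEUDO_ATTR = re.compile(r"""([\w:-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')""")


def pseudo_attributes(pi_data):
    """Parse 'href="..." type="..."' (either quote style) into a dict."""
    return {name: dq or sq for name, dq, sq in PSEUDO_ATTR.findall(pi_data)}


def xml_model_warnings(pi_data):
    """Hypothetical check: warn when an <?xml-model?> PI points at a DTD."""
    attrs = pseudo_attributes(pi_data)
    href = attrs.get("href", "")
    # Heuristic (assumption): a DTD is flagged by its MIME type or file extension.
    if attrs.get("type") == "application/xml-dtd" or href.lower().endswith(".dtd"):
        return ["DTD associated via <?xml-model?>; legacy tools may expect a "
                "DOCTYPE declaration instead: " + href]
    return []


print(xml_model_warnings(
    'href="https://example.org/jats/publishing.dtd" type="application/xml-dtd"'))
print(xml_model_warnings(
    'href="https://example.org/jats/publishing.rng" '
    'schematypens="http://relaxng.org/ns/structure/1.0"'))
```

Returning a list of messages rather than raising keeps this a warning, in line with the question above, rather than a hard failure.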

Klortho commented 9 years ago

I'm changing: "the content of [the @href pseudo-] attribute must be either the filename of the schema or the complete URL of the schema", to specify only the complete URL. This is in line with what we already decided regarding the SYSTEM identifiers on doctype declarations: complete URLs, please.
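A check for this rule could look like the minimal sketch below. It assumes "complete URL" means an absolute http or https URL with a host (that reading is an assumption, not wording from the recommendation), and the same function would apply to the @href pseudo-attribute and to a DOCTYPE SYSTEM identifier. The example URL is a placeholder.

```python
from urllib.parse import urlsplit


def is_complete_url(reference):
    """True for an absolute http(s) URL; False for a bare or relative filename."""
    # Assumption: "complete URL" = http or https scheme plus a host name.
    parts = urlsplit(reference)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


for ref in (
    "JATS-journalpublishing1.dtd",
    "../dtd/JATS-journalpublishing1.dtd",
    "https://example.org/jats/JATS-journalpublishing1.dtd",
):
    print(ref, is_complete_url(ref))
```

Only the last reference passes; the bare filename and the relative path both fail, which is exactly the case the change above rules out.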

Nikos-Markantonatos commented 9 years ago

> On the other hand, specifying the document type by a mechanism other than a doctype-declaration does, in my opinion, move us down the road toward freedom from DTDs, which is good.

Wait! How can we free ourselves from DTDs when the first thing a consumer of the XML looks at is which DTD the XML in hand complies with: whether it's JATS Blue or JATS Green, whether it's 1.0 or 1.1d3 or whatever, or whether it's XML based on JATS but adapted by an organization to its own needs? I believe the DOCTYPE is the XML consumer's friend, and given the persistence of the DTD model despite the emergence of newer schema languages, I would think that the DOCTYPE is here to stay.
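As an illustration of how a consumer can read the flavor and version straight off a DOCTYPE, here is a minimal sketch that parses a JATS public identifier. The regex is an assumption about the identifier's shape, based on identifiers such as `-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN`; it will not recognize every variant published over the years, and anything it doesn't recognize (for example, an organization's customization) comes back as `None`.

```python
import re

# Assumed shape of a stock JATS DTD public identifier.
JATS_PUBLIC_ID = re.compile(
    r"-//NLM//DTD JATS \(Z39\.96\) (?P<tag_set>.+?) DTD "
    r"v(?P<version>[\w.]+) (?P<date>\d{8})//EN"
)

# Informal color names of the JATS tag sets.
COLORS = {
    "Journal Archiving and Interchange": "Green",
    "Journal Publishing": "Blue",
    "Article Authoring": "Pumpkin",
}


def describe_public_id(public_id):
    """Return (tag set, color, version), or None if not a stock JATS identifier."""
    match = JATS_PUBLIC_ID.fullmatch(public_id.strip())
    if match is None:
        return None  # Possibly a customization, or a variant this regex misses.
    tag_set = match["tag_set"]
    return tag_set, COLORS.get(tag_set), match["version"]


print(describe_public_id(
    "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN"))
print(describe_public_id(
    "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.1 20151215//EN"))
```

The version group also accepts draft labels such as `1.1d3`, since it allows letters as well as digits and dots.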

Klortho commented 9 years ago

> I would think that the DOCTYPE is here to stay.

Sadly, I agree with this statement. I remember at the very first JATS-Con (was it five years ago already?) having a discussion with a few people, suggesting that the JATS community really try to push people away from DTDs and towards RELAX NG. But it went nowhere.

Nevertheless, I really think that JATS4R should do whatever it can to facilitate this move, or, at the very least, not get in the way.

In theory, each of the different ways of associating schemas with instance documents is as good as any other: a doctype declaration, XSD attributes on the root element, or the <?xml-model?> processing instruction. What the consumer wants to know is which rules the document conforms to, so that they can process it correctly. So any of those methods could specify JATS Blue or JATS Green and the version, or, as you say, some customization.
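To make the three mechanisms concrete, here is a minimal sketch (not a JATS4R tool) that reports which of them a document uses. It relies only on Python's standard `xml.parsers.expat` module; the function name `schema_associations` is hypothetical, and the URLs in the sample documents are placeholders.

```python
import re
import xml.parsers.expat

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
# PI pseudo-attributes arrive as raw text, so they are parsed with a regex.
PSEUDO_ATTR = re.compile(r"""([\w:-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')""")


def schema_associations(xml_text):
    """Collect DOCTYPE, xsi:*SchemaLocation and <?xml-model?> associations."""
    found = {"root": None, "doctype": None, "xsd": {}, "xml_model": []}
    # With a separator, expat reports namespaced attribute names as
    # "<namespace URI> <local name>", so the prefix used for xsi doesn't matter.
    parser = xml.parsers.expat.ParserCreate(namespace_separator=" ")

    def on_doctype(name, system_id, public_id, has_internal_subset):
        found["doctype"] = {"name": name, "public": public_id, "system": system_id}

    def on_pi(target, data):
        # Only xml-model PIs in the prolog (before the root element) are kept.
        if target == "xml-model" and found["root"] is None:
            pseudo = {k: dq or sq for k, dq, sq in PSEUDO_ATTR.findall(data)}
            found["xml_model"].append(pseudo)

    def on_start(name, attrs):
        if found["root"] is not None:
            return  # Only the root element's xsi attributes are of interest.
        found["root"] = name
        for local in ("schemaLocation", "noNamespaceSchemaLocation"):
            value = attrs.get(XSI_NS + " " + local)
            if value is not None:
                found["xsd"][local] = value

    parser.StartDoctypeDeclHandler = on_doctype
    parser.ProcessingInstructionHandler = on_pi
    parser.StartElementHandler = on_start
    # expat does not load the external DTD by default, so nothing is fetched.
    parser.Parse(xml_text.encode("utf-8"), True)
    return found


DOCTYPE_DOC = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "https://example.org/jats/JATS-journalpublishing1.dtd">
<article/>"""

XSD_DOC = """<article xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:noNamespaceSchemaLocation="https://example.org/jats/JATS-journalpublishing1.xsd"/>"""

MODEL_DOC = """<?xml-model href="https://example.org/jats/JATS-journalpublishing1.rng" schematypens="http://relaxng.org/ns/structure/1.0"?>
<article/>"""

for label, doc in (("DOCTYPE", DOCTYPE_DOC), ("XSD", XSD_DOC), ("xml-model", MODEL_DOC)):
    print(label, schema_associations(doc))
```

The sketch only reads the declarations; it doesn't validate anything. A consumer that gets the same answer from any of the three mechanisms can treat them interchangeably, which is the agnostic position argued above.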

The biggest problem with DTDs is that they make customizations too hard. Certain types of customizations should be very easy, IMO - for example, including elements from other namespaces in a document. With some of these later technologies, it's not only easier to customize, but it's also possible, for example, to specify that a document conforms to multiple schemas at the same time. This has great potential for data sharing and interchange, I think -- TaxPub is my favorite example of a mashup of different vocabularies.
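For a concrete picture of "elements from other namespaces", the fragment below mixes un-namespaced JATS elements with MathML, which JATS already embeds under the `mml` prefix. The sketch just counts elements per namespace using the standard `xml.etree.ElementTree` module; the fragment and the function name are illustrative assumptions, and nothing here checks conformance to either schema.

```python
import xml.etree.ElementTree as ET
from collections import Counter

MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Illustrative fragment: JATS elements (no namespace) with embedded MathML.
DOC = """<article xmlns:mml="http://www.w3.org/1998/Math/MathML">
  <body>
    <p>Inline math:
      <inline-formula>
        <mml:math><mml:mi>x</mml:mi></mml:math>
      </inline-formula>
    </p>
  </body>
</article>"""


def elements_by_namespace(xml_text):
    """Count elements per namespace URI ('' means no namespace)."""
    counts = Counter()
    for elem in ET.fromstring(xml_text).iter():
        # ElementTree spells namespaced tags as "{uri}local".
        ns = elem.tag[1:].split("}", 1)[0] if elem.tag.startswith("{") else ""
        counts[ns] += 1
    return counts


print(elements_by_namespace(DOC))
```

Each namespace in such a count is a vocabulary whose rules could, in principle, come from a separate schema, which is the kind of mashup a single DTD makes awkward.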

But, most consumers of XML are only familiar with DTDs, I guess. I'd like to think, though, that JATS4R could stay agnostic about how a document identifies its schema, and perhaps we could do some blog posts or education pages on this topic.

Nikos-Markantonatos commented 9 years ago

> The biggest problem with DTDs is that they make customizations too hard.

But most XML users do not care about that, because they only deal with XML and do not attempt to customize DTDs. If they ever need a customization (which is hardly ever the case), they relay the work to an XML expert anyway.

vincentml commented 9 years ago

> The biggest problem with DTDs is that they make customizations too hard.

That's true of many DTDs, but I don't think it's true for the JATS DTD. The way the JATS DTD is designed it is very easy to customize. The JATS XSD is more difficult to customize. Making customizations in both the DTD and XSD simultaneously is a useful exercise.

Most people who use JATS are probably using a DTD. There was a survey about this at one of the JATS-Cons.

Klortho commented 8 years ago

We have approved the general recommendations here: https://docs.google.com/document/d/1rZkWgcIUbfYliC8Ql0f4Wr4jSUkp6Ssw2KoBD0UQezY/edit#.

A note about the discussion above: as currently written, the recommendations are agnostic about which mechanism you use to reference the schema, but DTDs are listed first.