JATS4R / JATS4R-Participant-Hub

The hub for all JATS4R meeting notes, examples, draft recommendations, documents, and issues.
http://jats4r.org

Use of dtd internal subset, and entities #104

Closed by Klortho 8 years ago

Klortho commented 9 years ago

See also issue #1, which is specifically about character entity references.

More generally, there are lots of different kinds of entities that can be used in XML. I think it would be nice if we could make it an error to use any external entity in any JATS4R document. Let me explain by example, including the character entity references (CERs) already discussed.

Character entity references

These are the things like &copy; that are defined in the JATS DTD. I'm proposing we make these a warning.

Internal entities

These are defined in the internal subset of the document type declaration. For example:

<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN"
  "JATS-journalpublishing1.dtd" 
[
  <!ENTITY my_copy "©">
]>

<article> ... &my_copy; ... </article>

I think these should be allowed, because any good XML parser will handle them without a problem. I don't know whether they work in the browsers' native XML parsers, though; we should check that.
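As a quick check of the "any good XML parser" claim, here is a minimal Python sketch using the standard library's expat-based ElementTree parser. It assumes a trimmed-down document: the PUBLIC identifier and external DTD are left out so that nothing has to be fetched, the entity value is written as a numeric character reference, and the element text is made up for illustration. It says nothing about how browsers behave.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: an internal subset declaring one general entity.
# The external DTD reference is deliberately omitted in this sketch.
DOC = """<!DOCTYPE article [
  <!ENTITY my_copy "&#xA9;">
]>
<article>&my_copy; 2015 Example Press</article>"""


def article_text(xml_source: str) -> str:
    """Parse the document and return the root element's text.

    The parser expands entities declared in the internal subset,
    so the returned text contains the replacement character itself.
    """
    root = ET.fromstring(xml_source)
    return root.text
```

Calling article_text(DOC) returns the text with the copyright sign already substituted for the entity reference, which is the behaviour a consumer would rely on.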

External entities

I think we should disallow the use of any external entities. For example:

TBD.
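Pending that example, here is a rough Python sketch of how a checker might tell internal and external entity declarations apart, using the EntityDeclHandler callback of the standard library's expat binding. The sample document, the entity name "boilerplate", and the file name "boilerplate.xml" are hypothetical and only serve as an illustration; this is not an agreed JATS4R rule or implementation.

```python
import xml.parsers.expat

# Hypothetical sample with one internal and one external general entity.
SAMPLE = """<!DOCTYPE article [
  <!ENTITY my_copy "&#xA9;">
  <!ENTITY boilerplate SYSTEM "boilerplate.xml">
]>
<article>text</article>"""


def classify_entity_declarations(xml_source: str) -> dict:
    """Return entity names from the internal subset, grouped by kind."""
    found = {"internal": [], "external": []}

    def on_entity_decl(name, is_parameter, value, base,
                       system_id, public_id, notation):
        # Internal entities carry a literal value; external ones have
        # value None and a system identifier instead.
        kind = "internal" if value is not None else "external"
        found[kind].append(name)

    parser = xml.parsers.expat.ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(xml_source, True)  # True marks the final chunk of input
    return found
```

A validator built along these lines could report an error whenever the "external" list is non-empty. The external entity is only declared here, never referenced, so nothing is fetched.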

hubgit commented 9 years ago

For optimal re-use, the recommendation should be to not use any named entities.

If internal named entities are needed to make up for missing Unicode characters then they obviously have to be there, but with the caveat that they may not be accessible in some contexts (e.g. web browsers using their native XML parser).

Klortho commented 8 years ago

We have made it an error if any named entities are used (other than &lt;, &gt;, &apos;, &quot;, and &amp;). I think that's probably best, and easiest for producers and consumers, rather than trying to deal with the technicalities of internal subsets.
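For illustration, the rule could be approximated with a few lines of Python. This is only a sketch, not the actual JATS4R validator: it scans the raw text with a regular expression, so it would also flag matches inside comments and CDATA sections, and the function name is made up.

```python
import re

# The five entities predefined by XML itself.
PREDEFINED = {"lt", "gt", "apos", "quot", "amp"}

# A named reference: "&", a name, then ";". Numeric references such as
# "&#xA9;" start with "#" and are therefore not matched.
ENTITY_REF = re.compile(r"&([A-Za-z_:][\w.:-]*);")


def disallowed_named_entities(xml_source: str) -> list:
    """List named entity references other than the five predefined ones."""
    return [
        match.group(1)
        for match in ENTITY_REF.finditer(xml_source)
        if match.group(1) not in PREDEFINED
    ]
```

An empty result means the document passes this check; any name returned would be reported as an error.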