TEIC / TEI

The Text Encoding Initiative Guidelines
https://www.tei-c.org

Usage of `<dataRef>`+`<dataSpec>` and `<macroRef>`+`<macroSpec>` #2426

Closed martindholmes closed 1 year ago

martindholmes commented 1 year ago

The ATOP team notes that:

We suggest that it would be useful to:

This would not catch all potential errors, of course, because ODD-chaining could still result in elements ending up in the content model of attributes, but the ODD processor can catch these. But the most straightforward cases would be caught by Schematron.

ebeshero commented 1 year ago

From Council discussion: Consider whether we should prevent macroRef from being used in <attDef>? That would prevent the other way of putting an element inside an attribute.

joeytakeda commented 1 year ago

Thinking about this the other way: what should a dataSpec allow as a child of content? Right now, it allows:

I would suggest that nothing other than <dataRef> makes sense; the only reason I can see a dataRef being used in an element would be to specify a <valList> for an element's content (which is handled via dataSpec/valList anyway).

sydb commented 1 year ago

@ebeshero — At first blush keeping <macroRef> out of <attDef> seems like a good idea. But I do not think it is allowed, anyway.

@joeytakeda — Good thought experiment. But I claim <alternate>, <sequence>, <textNode>, and even <empty> are useful. Imagine, e.g., that I want a datatype that gives me latitude & longitude with leading zeroes in degrees to 2 decimal places and elevation in meters:

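        <dataSpec ident="lle">
          <content>
            <sequence>
              <!-- latitude in degrees: optional sign, three digits (zero-padded), two decimal places -->
              <dataRef name="float" restriction="-?[0-9][0-9][0-9]\.[0-9][0-9]"/>
              <!-- longitude in degrees, same lexical form -->
              <dataRef name="float" restriction="-?[0-9][0-9][0-9]\.[0-9][0-9]"/>
              <!-- elevation in meters -->
              <dataRef key="teidata.count"/>
            </sequence>
          </content>
        </dataSpec>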

Here in the USA, when using the NEMSIS scheme for encoding electronic medical records, the field for a drug to which the patient is allergic may be expressed as an ICD-10 code or by using the RxNorm name of the medication. Perfect case for <alternate>. (And there are a dozen actual use cases of //dataSpec/content/alternate in p5.xml.)

And certainly <textNode> is needed to express “you can put anything in here (as long as it is escaped per the rules of XML).” Would be useful for @alt if you were writing an ODD to describe XHTML, no? (And it is used twice in P5.)

It is a bit harder (for me) to formulate an argument for <empty>. After all, maxOccurs="0" would have the same effect, wouldn’t it? (Why we allow maxOccurs="0" is another question.) My thought is that it is useful for those who use programs to write schemas to be able to make the <content> valid by putting in <empty>, whether that <content> is in an <elementSpec> or something else. I do not actually know this is the case, I just think so.
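
To make the <alternate> case concrete, here is a rough sketch of what such a datatype might look like. The ident `allergen`, the simplified ICD-10 pattern, and the choice of `teidata.text` for the RxNorm name are all assumptions made for illustration; none of them comes from NEMSIS or from P5.

```xml
<!-- Hypothetical datatype: a drug allergy given EITHER as an ICD-10 code
     OR as a free-text medication name. -->
<dataSpec xmlns="http://www.tei-c.org/ns/1.0" ident="allergen">
  <content>
    <alternate>
      <!-- Assumed, simplified ICD-10 shape: letter, digit, alphanumeric,
           then an optional dot and 1 to 4 alphanumerics (e.g. "Z88.0") -->
      <dataRef name="token" restriction="[A-Z][0-9][0-9A-Z](\.[0-9A-Z]{1,4})?"/>
      <!-- Assumed stand-in for an RxNorm name: any text -->
      <dataRef key="teidata.text"/>
    </alternate>
  </content>
</dataSpec>
```

Note that, as written, the second branch accepts any string at all, so the ICD-10 pattern documents intent more than it constrains anything; a real ODD would want a tighter second alternative.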

It is obvious (at least, to me) that <classRef> inside a <dataSpec> is about as bad an idea as <elementRef> (or <anyElement>, which I think of as a special case of <elementRef>).

That leaves <macroRef>. It is useful for expressing a constraint in RELAX NG. But, of course, you could just put the RELAX NG directly inside the <content> of the <dataSpec>, so why would you need <macroRef>? The only reason I can think of off the top of my head is that maybe you want to use the same snippet of RELAX NG in several places. Thus being able to refer to it eliminates duplication of code.
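
A minimal sketch of that re-use argument, recasting the `lle` example with a shared macro. The ident `macro.degrees` is hypothetical, and whether a given ODD processor expands a <macroRef> in this position as intended is an assumption that has not been tested here.

```xml
<specGrp xmlns="http://www.tei-c.org/ns/1.0">
  <!-- Hypothetical macro holding the pattern that would otherwise be repeated -->
  <macroSpec ident="macro.degrees">
    <content>
      <dataRef name="float" restriction="-?[0-9][0-9][0-9]\.[0-9][0-9]"/>
    </content>
  </macroSpec>

  <dataSpec ident="lle">
    <content>
      <sequence>
        <!-- latitude and longitude both point at the same definition -->
        <macroRef key="macro.degrees"/>
        <macroRef key="macro.degrees"/>
        <!-- elevation in meters -->
        <dataRef key="teidata.count"/>
      </sequence>
    </content>
  </dataSpec>
</specGrp>
```

The pattern is then maintained in one place, which is the only benefit claimed above.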

So it seems to me of the 9 possible children of <content> I would allow 6 (<alternate>, <dataRef>, <empty>, <macroRef>, <sequence>, and <textNode>) inside <dataSpec> and disallow the other 3 (<anyElement>, <classRef>, and <elementRef>).
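
For what it is worth, the allow/disallow split above could be checked with a Schematron rule along the following lines. This is only a sketch of one way to express it, written as a standalone schema run against an ODD file; it is not the constraint that was (or would be) added to the Guidelines, and the message wording is made up.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <sch:ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>
  <sch:pattern>
    <!-- Fires once for each <content> that is a direct child of a <dataSpec> -->
    <sch:rule context="tei:dataSpec/tei:content">
      <!-- Reports if any of the three disallowed elements occurs at any depth,
           e.g. nested inside <sequence> or <alternate> -->
      <sch:report test="descendant::tei:anyElement
                        | descendant::tei:classRef
                        | descendant::tei:elementRef">
        A datatype specification should not refer to elements or classes:
        anyElement, classRef, and elementRef are not expected inside
        the content of a dataSpec.
      </sch:report>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

As noted earlier in the thread, a check like this only sees what is written directly in the ODD; an element that arrives through a <macroRef> or through ODD chaining would still have to be caught by the ODD processor.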

sydb commented 1 year ago

P.S. Worth mentioning this is a perfect use case for a co-occurrence constraint or contextually-variant content model (see #1744).

ebeshero commented 1 year ago

Resolved with PR merge.