TEIC / TEI

The Text Encoding Initiative Guidelines
https://www.tei-c.org

`<content>` should have 1 and only 1 child #2381

Closed. sydb closed this issue 1 year ago.

sydb commented 1 year ago

I am raising this ticket here (at request of ATOP task force) for discussion of the issue.

The Guidelines do not actually say what a series of children of <content> means. We all know (and the Stylesheets behave as though) a series of child elements of <content> means the same thing as the same set of child elements inside a <sequence>, itself the only child of the <content>. That is,

  <content>
    <elementRef key="a"/>
    <elementRef key="b"/>
    <elementRef key="c"/>
    <alternate>
      <elementRef key="x"/>
      <elementRef key="y"/>
      <elementRef key="z"/>
    </alternate>
  </content>

means exactly the same as

  <content>
    <sequence>                  <!-- ← explicit sequence 😌 -->
      <elementRef key="a"/>
      <elementRef key="b"/>
      <elementRef key="c"/>
      <alternate>
        <elementRef key="x"/>
        <elementRef key="y"/>
        <elementRef key="z"/>
      </alternate>
    </sequence>
  </content>

The former is a bit easier to write; the latter is explicit and gives a place for explicit quantification (@minOccurs & @maxOccurs). More importantly, the former utterly fails when generating DTDs; you have to use the latter.
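
As a minimal sketch of the quantification point: with an explicit <sequence>, repetition of the whole group can be stated on the wrapper itself. The occurrence values below are chosen purely for illustration and are not taken from any actual TEI content model.

```xml
<content>
  <!-- Hypothetical quantification: the whole group may occur zero or more times. -->
  <sequence minOccurs="0" maxOccurs="unbounded">
    <elementRef key="a"/>
    <elementRef key="b"/>
    <elementRef key="c"/>
    <alternate>
      <elementRef key="x"/>
      <elementRef key="y"/>
      <elementRef key="z"/>
    </alternate>
  </sequence>
</content>
```

In DTD terms this would correspond roughly to a group such as `(a, b, c, (x | y | z))*`. With bare children of <content> there is no element on which to hang those attributes.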

The content model for <content> is currently

  <content>
    <alternate>
      <elementRef key="valList" minOccurs="1" maxOccurs="1"/>
      <classRef key="model.contentPart" minOccurs="1" maxOccurs="unbounded"/>
      <anyElement minOccurs="1" maxOccurs="unbounded" require="http://relaxng.org/ns/compatibility/annotations/1.0 http://relaxng.org/ns/structure/1.0"/>
    </alternate>
  </content>

We were thinking that perhaps it should be

  <content>
    <alternate>
      <elementRef key="valList" minOccurs="1" maxOccurs="1"/>
      <classRef key="model.contentPart" minOccurs="1" maxOccurs="1"/>
      <anyElement minOccurs="1" maxOccurs="unbounded" require="http://relaxng.org/ns/compatibility/annotations/1.0 http://relaxng.org/ns/structure/1.0"/>
    </alternate>
  </content>

instead. (The only difference is a max of "1" for model.contentPart, rather than "unbounded".) This would involve a bit of rewriting of section 22.5.1.1 “Defining Content Models: TEI”, but results in a cleaner system. It is not remotely backwards-compatible, of course, so it would require some serious deprecation time.

sydb commented 1 year ago

It occurs to me that the RELAX NG element <rng:div> does nothing more than group its children; it has no effect on validation. (Each <rng:div> element of a full RELAX NG schema is replaced by its children in the corresponding simple schema.) Thus we could quite reasonably take this a step further by requiring that there be one and only one child of a <content>: when multiple RELAX NG elements are desired, they need to be wrapped in an <rng:div>.

The ATOP task force likes this idea, because it makes the rules for <content> simple, straightforward, and symmetrical.

If we adopt this proposal, the new content model for <content> would be

  <content>
    <alternate>
      <elementRef key="valList" minOccurs="1" maxOccurs="1"/>
      <classRef key="model.contentPart" minOccurs="1" maxOccurs="1"/>
      <anyElement minOccurs="1" maxOccurs="1" require="http://relaxng.org/ns/compatibility/annotations/1.0 http://relaxng.org/ns/structure/1.0"/>
    </alternate>
  </content>

(The only difference between this and the current content model of <content> is that the max is always "1", whereas for both model.contentPart and anyRelaxNGElement it is currently "unbounded".)

sydb commented 1 year ago

The following XSLT will transform any currently valid content models (i.e., <content> elements) that do not follow the new rule suggested herein (1 and only 1 child) into ones that do follow said new rule.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns="http://www.tei-c.org/ns/1.0"
  xmlns:rng="http://relaxng.org/ns/structure/1.0"
  xmlns:tei="http://www.tei-c.org/ns/1.0"
  xpath-default-namespace="http://www.tei-c.org/ns/1.0"
  exclude-result-prefixes="#all"
  version="3.0">

  <xsl:output method="xml" indent="yes"/>
  <xsl:mode on-no-match="shallow-copy"/>

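  <!-- Match any <content> that has two or more child elements (in any namespace). -->
  <xsl:template match="content[ *[2] ]" as="element(content)">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <!-- Wrap the children in a TEI <sequence> if any child is a TEI element; otherwise in an <rng:div>. -->
      <xsl:element name="{if ( tei:* ) then 'sequence' else 'rng:div'}">
        <xsl:apply-templates select="node()"/>
      </xsl:element>
    </xsl:copy>
  </xsl:template>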

</xsl:stylesheet>

(The input need be valid, because this XSLT does no error checking.)
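
To illustrate, here is a hypothetical input fragment (not taken from any real customization) together with the result the template above should produce for it, modulo whitespace and indentation.

```xml
<!-- Input (hypothetical): two TEI children directly inside <content> -->
<content xmlns="http://www.tei-c.org/ns/1.0">
  <elementRef key="a"/>
  <elementRef key="b"/>
</content>

<!-- Result: the children are wrapped in a single <sequence> -->
<content xmlns="http://www.tei-c.org/ns/1.0">
  <sequence>
    <elementRef key="a"/>
    <elementRef key="b"/>
  </sequence>
</content>
```

A <content> whose children are all RELAX NG elements would get an <rng:div> wrapper instead, and a <content> that already has a single child is left untouched by the shallow-copy mode.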

sydb commented 1 year ago

Oh dear. Note that the discussion and sample XSLT code, above, did not take into consideration that multiple <empty> elements are permitted as the content of <content>. However, since this is also allowed as the content of <sequence>, I do not think it is a problem. That is, any content model that is currently specified as

<content>
  <empty/><empty/><empty/>
</content>

could just as well be specified with

<content>
  <sequence>
    <empty/><empty/><empty/>
  </sequence>
</content>

No harm, no foul. (Yes, that looks like an extraordinarily silly content model, and it is. Very unlikely any human would write that content model, but it may well be the result of automated generation of schemas.)

ebeshero commented 1 year ago

Council decides we should change the content model of <content> as defined here, and add warnings to spec page for <content> and in release notes. @martinascholger

lb42 commented 1 year ago

I think a sequence of elementRefs not grouped within a sequence or an alternate is an error.

ebeshero commented 1 year ago

At the Stylesheets meeting today, we decided that @sydb should draft a Schematron rule to issue a deprecation message, and then we'll plan to issue the changes in https://github.com/TEIC/TEI/commit/b5842db21928814bd20f7820fc2bc5cb17545b6f for a later release (4.7.0). We'll also update the prose in the branch.
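
A minimal sketch of what such a rule might look like, written as stand-alone ISO Schematron. This is not the rule that was actually drafted; the message wording and the `role` value are assumptions made for illustration.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <sch:ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>
  <sch:pattern>
    <sch:rule context="tei:content">
      <!-- Fires when <content> has a second child element, in any namespace. -->
      <sch:report test="*[2]" role="warning">
        A content element with more than one child is deprecated;
        wrap the children in a single grouping element such as sequence.
      </sch:report>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

In the Guidelines source a rule like this would presumably be embedded in a <constraintSpec> on the specification of <content> rather than kept as a separate schema.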

bansp commented 1 year ago

It's nice to see ODD become tighter as the individual wrinkles get smoothed out. Thanks to everyone involved!