TEIC / TEI

The Text Encoding Initiative Guidelines
https://www.tei-c.org

Should `<anyElement>` be restricted as Stylesheets imply? #2484

Open sydb opened 1 year ago

sydb commented 1 year ago

The prose of 22.5.1.1 Defining Content Models: TEI says

An <anyElement> also asserts that an element may appear at a certain point in a content model, but rather than providing the name of a particular element type that may appear, any element regardless of its name may appear (and may have any attributes).

It is the only sentence about <anyElement> in the prose of the Guidelines. The <anyElement> element is a member of model.contentPart, and thus can appear as a child of <content>, <alternate>, or <sequence> without further restriction. Thus content models like

<content>
  <elementRef key="persName"/>
  <elementRef key="date"/>
  <anyElement require="http://xspf.org/ns/0/"/>
</content>

or even

<content>
  <elementRef key="persName"/>
  <elementRef key="date"/>
  <anyElement require="http://xspf.org/ns/0/" minOccurs="1" maxOccurs="12"/>
  <elementRef key="linkGrp"/>
  <anyElement require="http://www.w3.org/2009/10/emotionml" minOccurs="0" maxOccurs="unbounded"/>
  <anyElement except="https://docbook.org/ns/docbook"/>
</content>

would seem to be quite reasonable.

However, the schema generated for the above does not come close to the right thing:

  1. The @minOccurs and @maxOccurs are ignored, see Stylesheets#627.
  2. The RELAX NG content models for the 1st and 2nd <anyElement>s do not require that elements be in the XSPF and emotionml namespaces, respectively. (See details below.)
  3. The Schematron generated to enforce the require=xspf (or tei:*) enforces it for all elements defined by this content model, not just the 1st <anyElement>; there is no Schematron to enforce the require=emotionml.
  4. The @except is summarily ignored.
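
For comparison, here is a rough sketch of how the namespace constraints in items 2 and 4 could be expressed directly in RELAX NG using name classes. This is not the output of the Stylesheets, and it is not a proposal from the Guidelines. The pattern names (anyXSPF, anyNonDocBook, anything) and the wrapper element are hypothetical. The treatment of descendants (XSPF only inside an XSPF element, anything at all inside a non-DocBook element) is an assumption that the prose of the Guidelines does not settle.

```xml
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <!-- Hypothetical wrapper so the two patterns can be tried in isolation -->
    <element name="wrapper">
      <ref name="anyXSPF"/>
      <ref name="anyNonDocBook"/>
    </element>
  </start>

  <!-- Roughly what require="http://xspf.org/ns/0/" seems to call for:
       any element whose name is in the XSPF namespace, with any attributes.
       Assumption: descendants must also be in the XSPF namespace. -->
  <define name="anyXSPF">
    <element>
      <nsName ns="http://xspf.org/ns/0/"/>
      <zeroOrMore>
        <choice>
          <attribute><anyName/></attribute>
          <text/>
          <ref name="anyXSPF"/>
        </choice>
      </zeroOrMore>
    </element>
  </define>

  <!-- Roughly what except="https://docbook.org/ns/docbook" seems to call for:
       any element whose name is NOT in the DocBook namespace.
       Assumption: the content of that element is unrestricted. -->
  <define name="anyNonDocBook">
    <element>
      <anyName>
        <except>
          <nsName ns="https://docbook.org/ns/docbook"/>
        </except>
      </anyName>
      <ref name="anything"/>
    </element>
  </define>

  <!-- Generic "any attributes, text, and elements" pattern -->
  <define name="anything">
    <zeroOrMore>
      <choice>
        <attribute><anyName/></attribute>
        <text/>
        <element>
          <anyName/>
          <ref name="anything"/>
        </element>
      </choice>
    </zeroOrMore>
  </define>
</grammar>
```

Because each wildcard gets its own named pattern, nothing in this approach depends on there being only one <anyElement> per <content>. RELAX NG has no bounded repetition, so a constraint like maxOccurs="12" would still need to be expanded or checked some other way.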

So, clearly there are problems here. But these are Stylesheets problems (and things ATOP needs to avoid), so why am I posting here? Because it is pretty clear to me (although not 100% clear) that whoever wrote the code for the Schematron (3) was presuming that there would be only 1 <anyElement> descendant of any <content>, probably either a child or a grandchild (a child of an <alternate> child). In fact, there are only 3 examples of <anyElement> in the Guidelines, and in all 3 cases there is 1 and only 1 <anyElement> descendant of <content> (either a child or a child of a child <alternate>). So it is not a crazy thought.

So this ticket is here for us to consider which is correct: the implication of the Guidelines, that <anyElement> is allowed anywhere inside a <content> that an <elementRef> is allowed; or the Stylesheets and examples, which suggest that there can be only 1 <anyElement> descendant of any given <content>.

ebeshero commented 1 year ago

So, apparently what we promise for <anyElement> in the Guidelines is vastly exaggerated, considering the limited and problematic processing we now provide. I hope we can improve this situation. Council will need to discuss.

ebeshero commented 8 months ago

Is this ticket now mostly addressed by Stylesheets PR #683?

sydb commented 4 months ago

> Is this ticket now mostly addressed by Stylesheets PR #683?

No, only ¼ of this issue — item (1) of the OP, getting @minOccurs and @maxOccurs to work — was addressed by Stylesheets PR #683.