TEIC / TEI

The Text Encoding Initiative Guidelines
https://www.tei-c.org

Need Schematron to constrain attribute co-occurrence on `<elementRef>` #2543

Open martindholmes opened 2 months ago

martindholmes commented 2 months ago

Arising from ATOP work by @sydb, @HelenaSabel and myself:

<elementRef> does two distinct jobs: it can effectively import an elementSpec from another source using something like this:

<elementRef key="p" source="tei:1.2.1"/>

or it can reference an <elementSpec> somewhere in the ODD or imported from another source, to use it in a content model.

When <elementRef> is a child of <schemaSpec> or <specGrp>, it is doing the former job, and therefore it should only be allowed to have the @source attribute and not the @minOccurs and @maxOccurs attributes.

Conversely, when it's a child of a content model element such as <content> or <sequence>, it should be allowed to have @minOccurs and @maxOccurs, but not @source.

We believe these constraints should be expressed in Schematron.
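As a rough illustration of the two jobs described above, the ODD fragment below places `<elementRef>` in both positions and marks which attribute combinations the proposed constraints would flag. It is only a sketch: the schema identifier `exampleSchema`, the customized element `note`, and the content model are assumptions made up for the example; only the `key="p" source="tei:1.2.1"` reference comes from this issue.

```xml
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0" ident="exampleSchema" start="TEI">
  <!-- Job 1: import a specification from another source. @source makes sense here. -->
  <elementRef key="p" source="tei:1.2.1"/>

  <!-- Would be flagged: occurrence attributes on a child of schemaSpec. -->
  <elementRef key="note" minOccurs="0" maxOccurs="unbounded"/>

  <!-- Hypothetical customization, used only to show a content model. -->
  <elementSpec ident="note" mode="change">
    <content>
      <sequence>
        <!-- Job 2: reference a specification in a content model. Occurrence attributes make sense here. -->
        <elementRef key="p" minOccurs="1" maxOccurs="unbounded"/>

        <!-- Would be flagged: @source inside a content model. -->
        <elementRef key="p" source="tei:1.2.1" minOccurs="0"/>
      </sequence>
    </content>
  </elementSpec>
</schemaSpec>
```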

sydb commented 2 months ago

So to be precise, we are thinking that (schemaSpec|specGrp)/elementRef should not be allowed to have @minOccurs or @maxOccurs, and that content//elementRef should not be allowed to have @source.

I do not think a deprecation period is in order, as these constructions strike me as nonsensical. That said, I have not actually tested to see if the current Stylesheets produce anything useful when this occurs.

sydb commented 2 months ago

Thus something like

  <sch:pattern>
    <sch:rule context="tei:elementRef[ parent::tei:schemaSpec | parent::tei:specGrp ]">
      <sch:report test="@minOccurs | @maxOccurs">An element reference is not repeatable when part of a schema specification (and thus this &lt;elementRef> should not have @minOccurs or @maxOccurs). </sch:report>
    </sch:rule>
    <sch:rule context="tei:content//tei:elementRef">
      <sch:report test="@source">An element reference within a content model must refer to a locally defined element specification (and thus this &lt;elementRef> should not have @source).</sch:report>
    </sch:rule>
  </sch:pattern>

martindholmes commented 2 months ago

I would do those as assertions, not reports. I think these would be errors.

sydb commented 1 month ago

I do not get your meaning @martindholmes — the implication is that <sch:report>s are not errors (and that <sch:assert>s are errors). I do not think the Schematron spec supports that idea. The @role attribute determines whether an <sch:assert> or <sch:report> is an error (or warning or whatever). I think it is perfectly reasonable to add role="error" to the <sch:report>s above, as you are right, these are errors. It strikes me as less reasonable to express the desired test in the negative (e.g. either not( @minOccurs | @maxOccurs ) or not( @minOccurs or @maxOccurs ) or not( @minOccurs ) and not( @maxOccurs ), and not( @source )) when there is no need to add that extra layer of complexity.
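To make the comparison concrete, here is a minimal sketch of the same constraint written both ways, each carrying `role="error"`. The wrapping `<sch:schema>`, the `queryBinding` value, and the `tei:` prefix on the parent steps are assumptions added so the fragment is self-contained; in practice only one of the two variants would be kept.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <sch:ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>

  <!-- Variant A: a report fires when the test is true, so the test is stated positively. -->
  <sch:pattern>
    <sch:rule context="tei:elementRef[ parent::tei:schemaSpec | parent::tei:specGrp ]">
      <sch:report test="@minOccurs | @maxOccurs" role="error">An element reference is not
        repeatable when part of a schema specification.</sch:report>
    </sch:rule>
  </sch:pattern>

  <!-- Variant B: an assert fires when the test is false, so the same test must be negated. -->
  <sch:pattern>
    <sch:rule context="tei:elementRef[ parent::tei:schemaSpec | parent::tei:specGrp ]">
      <sch:assert test="not( @minOccurs | @maxOccurs )" role="error">An element reference is not
        repeatable when part of a schema specification.</sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

Both variants flag exactly the same nodes; the difference is only whether the condition is written as the thing to report or as the thing that must hold.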

martindholmes commented 1 month ago

Fair enough. I tend to use asserts for errors and reports to tell me about odd things that are possibly worth looking at but not worth failing the build for, but that's just me.

sydb commented 1 month ago

> That said, I have not actually tested to see if the current Stylesheets produce anything useful when this occurs.

I have now tested. Putting @source on content/elementRef had no effect on the RELAX NG, nor did putting @minOccurs and @maxOccurs on schemaSpec/elementRef.

sabineseifert commented 1 month ago

European subgroup at VF2F April 27:

lb42 commented 1 month ago

Why is it considered erroneous to specify source for an elementRef occurring within content? Seems perfectly reasonable to me.

sydb commented 1 month ago

Summary: While I agree it is reasonable (at least, not unreasonable), it would be quite difficult to do, with very little benefit.

First thing that jumps to mind is we have no mechanism for documenting the difference. That is, let’s say you (the customizer) use P5 version 19.2.0 as your base TEI, but then, in the content of <cit>, you use the 17.0.0 version of <pc> for whatever reason. So now you have version 17.0.0 <pc> inside <cit> and version 19.2.0 <pc> inside everything else.

Presuming that we could get the output RELAX NG schema to reflect this somehow (which the current Stylesheets will not do, so this could require a lot of work), which version of <pc> would be reflected in your customized documentation?

You might say that there should be such a mechanism to differentiate (like <pc> v. 17.0.0 as the heading to the section on one, <pc> v. 19.2.0 for the other). But there is no such capability now, and I do not think there is a lot of support for putting in the time & effort to do that when no one has requested it in the past decade.

lb42 commented 1 month ago

But the current system already allows me to mix elementRefs with different @sources within a single schemaSpec without any such mechanism, so why make a fuss for this special case?

sydb commented 4 weeks ago

Well, if you are mixing them inside <content>, the system lets you mix them, but does not produce different output based on @source, so you are getting fooled by the system, the very thing we are hoping to prevent, here.

If you are mixing them as children of <schemaSpec> the system does produce different output based on @source, so while there is no special mechanism to flag “take care, this element is from a different version”, at least the output you get is not lying to you, i.e. it is correct with respect to your ODD.
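The contrast between the two cases can be sketched in a single ODD fragment. This is an illustration only: the identifier `exampleSchema`, the choice of `<quote>`, and the content model given to `<cit>` are assumptions, and the version numbers simply reuse the hypothetical ones from the earlier comment. The remarks about Stylesheets behaviour restate what was reported in this thread, not independently verified results.

```xml
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0" ident="exampleSchema" start="TEI"
            source="tei:19.2.0">
  <!-- Children of schemaSpec: @source selects where each specification is read from,
       and (per the comments above) the generated output reflects that choice. -->
  <elementRef key="cit"/>
  <elementRef key="pc"/>
  <elementRef key="quote" source="tei:17.0.0"/>

  <elementSpec ident="cit" mode="change">
    <content>
      <alternate minOccurs="1" maxOccurs="unbounded">
        <elementRef key="quote"/>
        <!-- Inside a content model: @source was reported above to have no effect on
             the RELAX NG, so this still refers to the pc selected for the schema. -->
        <elementRef key="pc" source="tei:17.0.0"/>
      </alternate>
    </content>
  </elementSpec>
</schemaSpec>
```

Under the proposed Schematron rule, the last `<elementRef>` would be reported as an error instead of being silently accepted.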

lb42 commented 4 weeks ago

Not supporting a perfectly reasonable requirement on the grounds that it's too hard to implement is not a good look, though one I have seen before. But it's clearly better to be honest about it than to silently ignore it!