TEIC / Stylesheets

TEI XSL Stylesheets
Processor drops patterns from imported RELAX NG with the same name as a generated pattern #623

Open dmj opened 1 year ago

dmj commented 1 year ago

Take for example the tei_odds.odd customization: https://tei-c.org/Vault/P5/current/xml/tei/custom/odd/tei_odds.odd.

It imports the RELAX NG grammar via moduleRef/@url:

<moduleRef url="https://www.tei-c.org/release/xml/tei/Exemplars/relaxng.rng"/>

The grammar defines a pattern with the name "param" (Line 205).

  <define name="param">
    <element name="param">
      <attribute name="name">
        <data type="NCName"/>
      </attribute>
      <data type="string"/>
    </element>
  </define>

If you transpile the tei_odds.odd to RELAX NG you get a grammar that also defines a pattern with the name "param", but for the TEI param element:

<define name="param">
      <element name="param">
         <a:documentation xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0">provides a parameter for a model behaviour by supplying its name and an XPath expression identifying the location of its content. [22.5.4.5. Behaviours and their parameters]</a:documentation>
         <empty/>
         <ref name="att.global.attributes"/>
         <attribute name="name">
         ...

The "param" pattern from the imported RELAX NG grammar is gone, while all other patterns from that grammar are still present.

joeytakeda commented 1 year ago

Thanks @dmj — just tested this and can confirm this is the case. And apologies if what I'm outlining here is obvious, but just want to retrace my steps in thinking about this:

RelaxNG can in fact have duplicate names: https://relaxng.org/tutorial-20011203.html#IDA04YR. In those cases, though, at most one of the same-named <define> elements may omit @combine; the others must specify it to signal how the patterns are meant to be treated together (i.e. either interleaved or as a choice). And we get an example of it in 22.8.2:

<moduleRef url="svg11.rng">
 <content>
  <rng:define name="TEI_model.graphicLike"
   combine="choice">
   <rng:ref name="svg"/>
  </rng:define>
 </content>
</moduleRef>
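
For comparison, here is a minimal sketch of the same @combine rule in plain RELAX NG. The names "doc", "inline", "b" and "i" are made up for illustration and do not come from any of the schemas discussed here:

```xml
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <element name="doc">
      <zeroOrMore>
        <ref name="inline"/>
      </zeroOrMore>
    </element>
  </start>
  <!-- First definition of "inline": no @combine -->
  <define name="inline">
    <element name="b"><text/></element>
  </define>
  <!-- Second definition with the same name: @combine="choice"
       makes "inline" match either <b> or <i> -->
  <define name="inline" combine="choice">
    <element name="i"><text/></element>
  </define>
</grammar>
```

Dropping the combine attribute from the second <define> would make the grammar invalid, which is the error one would expect from the duplicated "param" pattern.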

This <moduleRef> usage is almost precisely the way <rng:include> works per the RNG specification, except that with <rng:include>, any child elements are understood to replace a definition, rather than being "copied along with the content of the resource indicated by the url attribute into the target RELAX NG schema" (per the remarks for <moduleRef>). In other words, in a RelaxNG schema, this is valid and means "replace pattern with this definition":

<rng:include href="https://www.tei-c.org/release/xml/tei/Exemplars/relaxng.rng">
  <rng:define name="pattern">
    <rng:ref name="TEI"/>
  </rng:define>
</rng:include>

The TEI analogue of this produces an invalid RelaxNG schema, though for a strange reason: it should be invalid because it has two definitions of "pattern" without @combine, but it is actually invalid because it has zero definitions of "pattern":

        <moduleRef url="https://www.tei-c.org/release/xml/tei/Exemplars/relaxng.rng">
          <content>
            <rng:define name="pattern">
              <rng:ref name="TEI"/>
            </rng:define>
          </content>
        </moduleRef>

A few potential approaches:

  1. Do not delete any duplicate definitions in the output RNG, since that duplication isn't necessarily an error—this would produce broken RNGs (as is the case with tei_odds), but I think that's probably better than erroneously zapping imported things
  2. Detect when there's duplication from an imported RNG, raise an error, and tell the user that they should add a @prefix to either the schemaSpec or the moduleRef
  3. Automatically add prefixes to imported modules (which sounds like a bad idea to me, since it is conceivable that an imported module would have elements referred to by a separate module, no?)
  4. Rework the semantics of moduleRef such that it is equivalent to <rng:include/>, but then, I guess, handle the processing of moduleRef children differently so that they are not included in the <rng:include>, but are siblings of it
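
For approach 2, the error message could point the user at something like the following. This is a sketch only: the prefix value "rng_" is made up, and it is an untested assumption that the Stylesheets would then emit the imported pattern as "rng_param" and rewrite the references inside the imported grammar to match.

```xml
<!-- Hypothetical fix in the ODD: prefix every pattern imported from
     relaxng.rng so that its "param" no longer collides with the
     pattern generated for the TEI param element -->
<moduleRef url="https://www.tei-c.org/release/xml/tei/Exemplars/relaxng.rng"
           prefix="rng_"/>
```

The attraction of this route is that the choice of prefix stays with the user, which avoids the concern raised under approach 3 about silently renaming patterns that another module might refer to.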