TEIC / TEI

The Text Encoding Initiative Guidelines
https://www.tei-c.org

Guidelines should clarify how fragmentary Schematron should be expanded #2444

Closed martindholmes closed 6 months ago

martindholmes commented 1 year ago

When a TEI <constraint> element contains three <sch:assert> elements (for example) with no other wrapping Schematron, there are multiple possible ways to interpret what is intended. The output Schematron could consist of

  1. a single <sch:pattern> element with a single <sch:rule> element containing all three <sch:assert>s, or
  2. a single <sch:pattern> containing three <sch:rule>s each containing an <sch:assert>; or
  3. three <sch:pattern> elements each containing a single <sch:rule> with a single <sch:assert>.

Each of these output options has consequences for how the Schematron will work.

@sydb has always assumed that the fact that all these <sch:assert>s are grouped in a single <constraint> element here implies that the intention is (1); in other words, that the TEI <constraint> element should give rise to a <sch:pattern> and a <sch:rule> wrapping its contents. I don't believe that's necessarily so, not least because a complete <sch:pattern> could be supplied in the same context; at the very least, the situation is ambiguous and the rules should be clarified. One approach to this would be to find out what the current Stylesheets do and codify that behaviour in the Guidelines. Alternatively, Council may decide that a different set of processing rules makes more sense, and we could then back-port that to the existing Stylesheets as well as implementing it in ATOP.

sydb commented 1 year ago

I completely agree with @martindholmes that the Guidelines should be explicit about what an ODD processor will do in this case. (Probably in section 22.5.2.) But I am quite comfortable keeping the current behavior. I will go into more detail below, but in brief, of the three possibilities: (1) is the current behavior; (2) is a terrible idea because it means only the 1st <sch:assert> will be in a rule that gets fired; and (3) is essentially the opposite of (1) and is equally expressive, but since users have already gotten used to (1) I see no reason to change.

(1) = current behavior

If the input is

  <elementSpec module="core" ident="p" mode="change">
    <constraintSpec mode="add" ident="unwrapped" scheme="schematron">
      <constraint>
        <sch:report test="true()">Yes. It’s true. This man …</sch:report>
        <sch:assert test="false()">We live in a world where unfortunately the distinction between true
          and false appears to become increasingly blurred by manipulation of facts, by exploitation
          of uncritical minds, and by the pollution of the language.</sch:assert>
        <sch:report test="1 + 1 + 1 = 3">Got to be good looking ’cause he’s so hard to see</sch:report>
      </constraint>
    </constraintSpec>
  </elementSpec>

then our two current processors (the Stylesheets and the standalone extract-isosch.xsl) produce the same Schematron, apart from different generated values for @id:

  <sch:pattern id="demo_2444-p-unwrapped-constraint-report-6">
    <sch:rule context="tei:p">
      <sch:report test="true()">Yes. It’s true. This man …</sch:report>
        <sch:assert test="false()">We live in a world where unfortunately the distinction between true
          and false appears to become increasingly blurred by manipulation of facts, by exploitation
          of uncritical minds, and by the pollution of the language.</sch:assert>
      <sch:report test="1 + 1 + 1 = 3">Got to be good looking ’cause he’s so hard to see</sch:report>
    </sch:rule>
  </sch:pattern>

and

  <sch:pattern id="schematron-constraint-demo_2444-p-unwrapped-11">
    <sch:rule context="tei:p">
      <sch:report test="true()">Yes. It’s true. This man …</sch:report>
      <sch:assert test="false()">We live in a world where unfortunately the distinction between true
        and false appears to become increasingly blurred by manipulation of facts, by exploitation
        of uncritical minds, and by the pollution of the language.</sch:assert>
      <sch:report test="1 + 1 + 1 = 3">Got to be good looking ’cause he’s so hard to see</sch:report>
    </sch:rule>
  </sch:pattern>

(They also had different ideas of which namespaces should be expressed with @xmlns and which should be expressed with a prefix, but I have elided those differences here.)

(2) = 1 pattern, 3 rules

This method of processing would produce

  <sch:pattern id="schematron-constraint-demo_2444-p-unwrapped-whatever">
    <sch:rule context="tei:p">
      <sch:report test="true()">Yes. It’s true. This man …</sch:report>
    </sch:rule>
    <sch:rule context="tei:p">      
      <sch:assert test="false()">We live in a world where unfortunately the distinction between true
        and false appears to become increasingly blurred by manipulation of facts, by exploitation
        of uncritical minds, and by the pollution of the language.</sch:assert>
    </sch:rule>
    <sch:rule context="tei:p">      
      <sch:report test="1 + 1 + 1 = 3">Got to be good looking ’cause he’s so hard to see</sch:report>
    </sch:rule>
  </sch:pattern>

This is untenable, because most users would not realize that the 2nd and 3rd rules will never fire. (In Schematron, within a given pattern, only the 1st rule (in document order) whose context matches a node is fired for that node.)

(3) = 3 patterns, 3 (separate) rules

This method of processing would produce

  <sch:pattern id="schematron-constraint-demo_2444-p-unwrapped-whatever-1">
    <sch:rule context="tei:p">
      <sch:report test="true()">Yes. It’s true. This man …</sch:report>
    </sch:rule>
  </sch:pattern>
  <sch:pattern id="schematron-constraint-demo_2444-p-unwrapped-whatever-2">
    <sch:rule context="tei:p">      
      <sch:assert test="false()">We live in a world where unfortunately the distinction between true
        and false appears to become increasingly blurred by manipulation of facts, by exploitation
        of uncritical minds, and by the pollution of the language.</sch:assert>
    </sch:rule>
  </sch:pattern>
  <sch:pattern id="schematron-constraint-demo_2444-p-unwrapped-whatever-3">
    <sch:rule context="tei:p">      
      <sch:report test="1 + 1 + 1 = 3">Got to be good looking ’cause he’s so hard to see</sch:report>
    </sch:rule>
  </sch:pattern>

This, I think, would have been a perfectly reasonable thing to do back in the day. But it is not what the Stylesheets have done for over a decade, and I do not see any reason, let alone a compelling reason, to change that behavior.

It is worth re-stating that what is being discussed here is what happens when a user does not explicitly specify how patterns, rules, assertions, and reports should be grouped. Users are always allowed to express this explicitly, so we are only talking about what happens when users want to be concise.

Further note that with (1) a user can express either the “three assertions in one pattern” or the “three patterns with one assertion each” outcome (by using one <constraintSpec> for the former, or three separate <constraintSpec>s for the latter), all without explicitly using <sch:pattern> or <sch:rule>.
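As a rough illustration of the second case, three separate <constraintSpec>s might be written as below. This is only a sketch: the ident values and the shortened message texts are invented for the example, and the namespace declarations are added just to make the fragment self-contained. Under behavior (1), each <constraintSpec> would be expected to yield its own <sch:pattern> holding a single <sch:rule context="tei:p">.

```xml
<elementSpec xmlns="http://www.tei-c.org/ns/1.0"
             xmlns:sch="http://purl.oclc.org/dsdl/schematron"
             module="core" ident="p" mode="change">
  <!-- The three ident values are hypothetical; any unique names would do. -->
  <constraintSpec mode="add" ident="unwrapped-1" scheme="schematron">
    <constraint>
      <sch:report test="true()">First message</sch:report>
    </constraint>
  </constraintSpec>
  <constraintSpec mode="add" ident="unwrapped-2" scheme="schematron">
    <constraint>
      <sch:assert test="false()">Second message</sch:assert>
    </constraint>
  </constraintSpec>
  <constraintSpec mode="add" ident="unwrapped-3" scheme="schematron">
    <constraint>
      <sch:report test="1 + 1 + 1 = 3">Third message</sch:report>
    </constraint>
  </constraintSpec>
</elementSpec>
```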

joeytakeda commented 1 year ago

Agreed with @sydb: option 1 (just wrap them all in a sch:pattern/sch:rule) makes good sense to me. I believe the same holds if you have a <constraint> with one or more sch:rules: all of the sch:rules would be grouped into a single sch:pattern, right?

sydb commented 1 year ago

Yes.
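For concreteness, here is a sketch of the kind of input in question: a <constraint> holding two <sch:rule>s and no <sch:pattern>. The ident value, contexts, tests, and messages are all made up for illustration. If both rules are wrapped in one generated <sch:pattern>, the usual Schematron first-match behavior applies between them.

```xml
<elementSpec xmlns="http://www.tei-c.org/ns/1.0"
             xmlns:sch="http://purl.oclc.org/dsdl/schematron"
             module="core" ident="p" mode="change">
  <!-- Hypothetical ident, contexts, and tests, for illustration only. -->
  <constraintSpec mode="add" ident="two-rules" scheme="schematron">
    <constraint>
      <!-- A <p> with @n matches this rule first ... -->
      <sch:rule context="tei:p[@n]">
        <sch:assert test="normalize-space(@n) != ''">The n attribute should not be empty</sch:assert>
      </sch:rule>
      <!-- ... so, within the same pattern, this rule only fires for a <p> without @n. -->
      <sch:rule context="tei:p">
        <sch:report test="not(node())">Empty paragraph</sch:report>
      </sch:rule>
    </constraint>
  </constraintSpec>
</elementSpec>
```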

ebeshero commented 1 year ago

Council F2F Paderborn: Discussing this, we are fine with the way this is currently processed, and we recommend documenting what the default processing does somewhere, perhaps as a <remark> on the spec for either <constraint> or <constraintSpec>.

martindholmes commented 1 year ago

We believe now that this may be moot because the decision to require @context means that the user/ODD-writer will have to supply <sch:rule>, and therefore will be responsible for the grouping.
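A minimal sketch of what that would look like in an ODD, reusing the <p> example from earlier in this thread: the ODD writer supplies the <sch:rule> with @context, so the grouping is explicit. The ident value "wrapped" and the message texts are invented for illustration, and the tei: prefix is assumed to be bound to the TEI namespace by the processor.

```xml
<elementSpec xmlns="http://www.tei-c.org/ns/1.0"
             xmlns:sch="http://purl.oclc.org/dsdl/schematron"
             module="core" ident="p" mode="change">
  <!-- Hypothetical ident; the grouping is stated by the ODD writer, not inferred. -->
  <constraintSpec mode="add" ident="wrapped" scheme="schematron">
    <constraint>
      <sch:rule context="tei:p">
        <sch:report test="true()">First message</sch:report>
        <sch:assert test="false()">Second message</sch:assert>
        <sch:report test="1 + 1 + 1 = 3">Third message</sch:report>
      </sch:rule>
    </constraint>
  </constraintSpec>
</elementSpec>
```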

sydb commented 6 months ago

This issue is now conceptually tied to #2510, and should be closed “won’t fix” when that one is implemented.

joeytakeda commented 6 months ago

PR #2513 (responding to #2510) now adds a warning that sch:rule[not(@context)] will be deprecated 2025-03-15 — so closing this per @sydb's comment above.