Schematron / schematron

Schematron "skeleton" - XSLT implementation
MIT License

Conditional pattern set #37

Open PStellmann opened 7 years ago

PStellmann commented 7 years ago

I would like to be able to specify a condition (as an XPath expression to be applied to the input document) that determines whether a set of patterns should be executed.

I have two use-cases for this, both from a DITA context:

  1. Avoid detailed validation of a document that is marked as "in revision". (see #18)
  2. When using the same schematron file for a whole set of DITA files of different topic types there are likely to be some rules that can only match on elements within specific topic types. So I might have a context pattern like *[contains(@class, ' custom-domain/myElement ')]. When processing a standard DITA topic that does not use the custom domain this pattern will never match - and I already know this from looking at the root element. One option would be to extend the pattern to /*[contains(@class, ' custom-domain/myTopic ')]//*[contains(@class, ' custom-domain/myElement ')]. But the resulting xsl:template will still be checked for every single node. So adding some condition to the pattern-element (or preferably a set of patterns) could skip the whole traversal for all files not fulfilling this condition.
rjelliffe commented 7 years ago

In Schematron, I decided against putting in a special class of guard paths, because guards usually hide assertions. Similarly, I didn't want to support arbitrary composition of assertions (i.e. using and/or or case statements instead of rules) in order to encourage/enforce a flat structure of simple statements.

So, without claiming this is good enough for you, the most efficient way of doing guards currently is to use a top-level boolean variable for the guard condition, so you have

...

.... This still visits every node, but in the worst case it only performs a simple pre-calculated boolean test for each node. But it does make sense to me that there could be a good (command-level) optimization for the case you give: whether that is a guard provided for patterns (an extension of the new pattern/@document feature?) or for phases I don't know. In fact, I think all that is needed is to mark the pattern so that as soon as one rule fires (i.e. a dummy top-level rule with context "/[$my-topic]"), no subsequent rules need to fire. This would be a variant of the other optimization that if one assertion fails, no other assertions or rules need to be tested. Would that satisfy your requirement?
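
For illustration, a minimal sketch of what a top-level boolean guard of this kind might look like. It is not necessarily the snippet originally posted. It assumes queryBinding="xslt2", since XSLT 1.0 does not allow variable references in match patterns; the variable name is-myTopic and the assertion are hypothetical placeholders.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <!-- Schema-level variable, evaluated against the document root.
       The name "is-myTopic" is hypothetical. -->
  <sch:let name="is-myTopic" value="contains(/*/@class, ' custom-domain/myTopic ')"/>
  <sch:pattern>
    <!-- The guard is the first predicate, so the class test on the element
         is only evaluated when the guard is true. -->
    <sch:rule context="*[$is-myTopic][contains(@class, ' custom-domain/myElement ')]">
      <!-- Placeholder assertion, for illustration only. -->
      <sch:assert test="@id">myElement should have an id attribute.</sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

With a guard like this every node is still offered to the rule, but for documents of other topic types the match should come down to one cheap boolean check per node.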
PStellmann commented 7 years ago

Thanks for your thoughts.

The disadvantage of adding the condition to the patterns is poor worst-case behavior: even when there is only a single rule in that pattern, the whole document will still be processed.

However, I agree that the @followup on a rule could do the same and is even more powerful, since it is not limited to conditions on document level but can deactivate the validation of any subtree.

Just as a comparison of the source-code variants to get a feeling:

Using a condition on a pattern:

<sch:pattern condition="contains(/*/@class, ' custom-domain/myTopic ')">
  <sch:rule context="*[contains(@class, ' custom-domain/myElement ')]">
    <sch:assert test="...">
      <!-- message -->
    </sch:assert>
  </sch:rule>
</sch:pattern>

Using @followup on a rule:

<sch:let name="is-not-myTopic" value="not(contains(/*/@class, ' custom-domain/myTopic '))"/>
<sch:pattern>
  <sch:rule context="*[$is-not-myTopic]" followup="skip-content">
    <!-- no tests, just abort the validation -->
  </sch:rule>
  <sch:rule context="*[contains(@class, ' custom-domain/myElement ')]">
    <sch:assert test="...">
      <!-- message -->
    </sch:assert>
  </sch:rule>
</sch:pattern>

I'm gonna start the implementation of my validation framework within the next 2-3 weeks using @followup and will post my experience here...