Schematron / schematron-enhancement-proposals

This repository collects proposals to enhance Schematron beyond the ISO specification
Enhancement for sch:phase - @when #71

Open rjelliffe opened 6 months ago

rjelliffe commented 6 months ago

(Added: In my Schematron users meeting presentation [Prague 2024] I identified this proposal as one of the most important, IMHO.)

Motivating Use-Case

The user has a stream of XML documents they want to validate. The documents can be from several different schemas, perhaps different versions of schemas, perhaps entirely new namespaces. We want to cope with these inside Schematron rather than relying on some external mechanism to look at the document and select the appropriate Schematron schema or phase. We do not want to have to add lots of conditions to every sch:rule/@context.

An example might be a Schematron schema that can validate every kind of XSLT document, so that phases test the /xsl:stylesheet/@version attribute and only run if matching: e.g. a phase for 1.0, 2.0, 3.0, 3.1, and so-called 4.0.

Suggestion

We introduce an attribute sch:phase/@test which takes an XPath expression evaluated in the current global scope of variables (i.e. on the initial document) and on the selected document. When attempting to run a phase (because it is selected or because the #ALL default is operating), the test is first evaluated as a boolean; if the test succeeds then the phase is selected.

I considered names like @when, @for, @because, @if etc. however I thought re-using @test was better, for not multiplying names. However, I am 100% not wedded to the name, and another might be preferred.

(I considered allowing it on sch:pattern as well, in particular to interact with sch:pattern/@document, but I thought that the use-case was not as clear, and it seemed to create messiness and incomprehensibility rather than reduce it. I think it would be better to support e.g. sch:pattern/@document so that phases can apply to sub-documents. But that needs more work and thought and is not part of this proposal.)

Example

<sch:phase id="vanilla-html" test="/html"> ...
<sch:phase id="xhtml" test="/xhtml:html"> ...
<sch:phase id="not-html" test="not(/html or /xhtml:html)">
   <sch:active pattern="report-not-html-as-fatal-error"/>
</sch:phase>

<sch:pattern id="report-not-html-as-fatal-error">
   <sch:rule context="/*">
      <sch:report test="true()" severity="FATAL">The document must be HTML or XHTML</sch:report>
   </sch:rule>
</sch:pattern>

In this example, the @test allows the incoming document to be HTML-in-XML or XHTML, and otherwise it reports a fatal error without attempting any other validation.
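The XSLT use-case from the motivating section could be written the same way. The following is only a minimal sketch: it assumes the sch:phase/@test attribute proposed in this issue (it is not part of ISO Schematron), and the phase ids, pattern ids and rules are made up for illustration.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
   <sch:ns prefix="xsl" uri="http://www.w3.org/1999/XSL/Transform"/>

   <!-- @test on sch:phase is the attribute proposed in this issue (an assumption) -->
   <sch:phase id="xslt-1.0" test="/xsl:stylesheet/@version = '1.0'">
      <sch:active pattern="xslt-1.0-rules"/>
   </sch:phase>
   <sch:phase id="xslt-3.0" test="/xsl:stylesheet/@version = '3.0'">
      <sch:active pattern="xslt-3.0-rules"/>
   </sch:phase>

   <!-- Hypothetical pattern: only meaningful for version 1.0 stylesheets -->
   <sch:pattern id="xslt-1.0-rules">
      <sch:rule context="xsl:stylesheet">
         <sch:report test="xsl:function">xsl:function is not available in XSLT 1.0.</sch:report>
      </sch:rule>
   </sch:pattern>

   <!-- Hypothetical pattern: stands in for the version 3.0 checks -->
   <sch:pattern id="xslt-3.0-rules">
      <sch:rule context="xsl:template">
         <sch:assert test="@match or @name">An xsl:template needs a match or a name attribute.</sch:assert>
      </sch:rule>
   </sch:pattern>
</sch:schema>
```

The point of the sketch is that no sch:rule/@context has to repeat the version condition: each phase carries it once.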

I considered having some default message that would be activated if no sch:phase/@test tests true, but I thought the above was the minimum to declare victory and the simplest to implement and understand.

Implementation

I think sch:phase/@test is quite easy to implement, e.g. by generating something along the lines of

<xsl:template match="/">
   <!-- Outer test: is the phase selected explicitly, via #ALL, or via the default phase? -->
   <xsl:if test="contains($phase, 'vanilla-html') or contains($phase, '#ALL')
        or (not($phase) and (contains($defaultPhase, 'vanilla-html') or contains($defaultPhase, '#ALL')))">
      <!-- Inner test: the phase's own @test -->
      <xsl:if test="/html">
         <xsl:call-template name="vanilla-html"/>
      </xsl:if>
   </xsl:if>
 ...

rkottmann commented 3 months ago

I also have this kind of use case. Hence, I support this proposal.

I would like to propose NOT to name the attribute test, because it is semantically quite different from (report|assert)/@test.

However, I have no positive suggestion.

XForms uses e.g. relevance for a similar use-case.

The Ant build tool uses if and unless.

when is also a good name.

rjelliffe commented 3 months ago

I agree about the name. @when seems better.

Rick

AndrewSales commented 3 months ago

or because the #ALL default is operating

I want to point out (mainly for the benefit of implementers) that the standard defines #ALL as denoting "that all patterns are active" [my italics]. Note the same wording also applies if #DEFAULT is specified but no @defaultPhase is given in the schema.

This is subtly different from all phases being active, and this proposal would need to take account of the difference: the text of the current standard implies that the presence of phases is effectively immaterial if #ALL is specified. This proposal would require implementations to retain what phase patterns belong to, because of the need to evaluate phase/@when.

If this proposal is included in the standard, I would suggest also clarifying that #ALL (and #DEFAULT where no @defaultPhase is present in the schema) mean all phases are active, and all patterns which do not belong to a phase are active. I feel this would clarify the processing model.
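To make the implementation consequence concrete, here is a rough XSLT sketch of what a compiled schema might do under this clarified reading of #ALL. It is an illustration under assumptions, not any existing implementation's output: the mode names, the phase's /html condition and the two checks are hypothetical.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="text"/>
   <xsl:param name="phase" select="'#ALL'"/>

   <xsl:template match="/">
      <xsl:if test="$phase = '#ALL'">
         <!-- Pattern owned by a phase: the processor must remember the owning
              phase so that the phase's @when (here /html) can guard it -->
         <xsl:if test="/html">
            <xsl:apply-templates select="/" mode="html-rules"/>
         </xsl:if>
         <!-- Pattern not named by any phase: runs unconditionally under #ALL -->
         <xsl:apply-templates select="/" mode="standalone-rules"/>
      </xsl:if>
   </xsl:template>

   <!-- Hypothetical pattern bodies, reduced to one check each -->
   <xsl:template match="/" mode="html-rules">
      <xsl:if test="not(/html/head/title)">html-rules: missing title&#10;</xsl:if>
   </xsl:template>

   <xsl:template match="/" mode="standalone-rules">
      <xsl:if test="not(/*/@id)">standalone-rules: root element has no id&#10;</xsl:if>
   </xsl:template>
</xsl:stylesheet>
```

Without phase/@when, a processor handling #ALL could simply run every pattern and ignore the sch:phase elements; with it, the phase-to-pattern mapping has to survive compilation.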

rjelliffe commented 3 months ago

Andrew is right about the wording problem but I don't think we need to change #ALL.

I suggest that a new defaultPhase value called "#ALL-PHASES" be defined, which means all phases are tried in implementation-dependent order (each with their @when test). If #ALL is specified, then no phases are active, so no sch:phase/@when is tested. If a phase is specified in @defaultPhase, then any @when on it is tested.
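A small schema may help to compare the three cases. This is a sketch only: "#ALL-PHASES" and sch:phase/@when are the proposals under discussion, not ISO Schematron, and all ids, root element names and rules are hypothetical.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron"
            defaultPhase="#ALL-PHASES">
   <!-- "#ALL-PHASES" and @when are proposed in this thread (assumptions) -->
   <sch:phase id="vanilla-html" when="/html">
      <sch:active pattern="html-rules"/>
   </sch:phase>
   <sch:phase id="book" when="/book">
      <sch:active pattern="book-rules"/>
   </sch:phase>

   <sch:pattern id="html-rules">
      <sch:rule context="/html">
         <sch:assert test="head/title">An HTML document should have a title.</sch:assert>
      </sch:rule>
   </sch:pattern>

   <sch:pattern id="book-rules">
      <sch:rule context="/book">
         <sch:assert test="title">A book should have a title.</sch:assert>
      </sch:rule>
   </sch:pattern>

   <!-- Not named by any phase -->
   <sch:pattern id="standalone-rules">
      <sch:rule context="/*">
         <sch:report test="@xml:lang = ''">xml:lang should not be empty.</sch:report>
      </sch:rule>
   </sch:pattern>
</sch:schema>
```

For a document whose root element is html, the reading above would give:

- #ALL-PHASES: both @when tests are evaluated; only html-rules runs, and standalone-rules does not, because no phase names it.
- #ALL: all three patterns are active and no @when is evaluated.
- Phase "book" selected by name: its @when is evaluated and fails, so presumably none of its patterns run.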

AndrewSales commented 3 months ago

I don't think we need to change #ALL

I think that horse has already exited the stable in this case, unfortunately.

The current text has: "Two strings, #ALL and #DEFAULT, have special meanings when specifying active phases." [my italics] Although it then goes on to mention patterns explicitly and not phases, the net result is that #ALL and the proposed #ALL-PHASES might be similar enough to cause confusion.

Is it an implicit part of the use case here to exclude from processing patterns which don't belong to a phase? I can see there might be cases where you would want some non-phase patterns applied regardless. But if the idea is to exclude them, perhaps #PHASES-ONLY would work as an active phase specifier, whose semantic would be to only process patterns belonging to a phase (a useful side-effect feature?) and pave the way for phase/@when.

If as a user you do want to benefit from phase/@when and have non-phase patterns processed too, then you can just use #ALL in the re-worded definition I gave earlier.

rjelliffe commented 3 months ago

#ALL is the name of a phase: the built-in default one which invokes all patterns, regardless of any phase declarations.

#ALL-PHASES would be the name of a phase which invokes all patterns that are specified active in any phase (subject to any @test). So not standalone patterns.

rjelliffe commented 3 months ago

If as a user you do want to benefit from phase/@when and have non-phase patterns processed too, then you can just use #ALL in the re-worded definition I gave earlier.

Redefining phases so that if you select a phase then it will also activate patterns that are not in any phase is a breaking change. There is no justification for a breaking change that I can see.

If you want a pattern in a phase, put it in the phase.

AndrewSales commented 3 months ago

#ALL is the name of a phase: the built-in default one which invokes all patterns, regardless of any phase declarations.

Redefining phases so that if you select a phase then it will also activate patterns that are not in any phase is a breaking change.

These two statements are contradictory, because the latter is what #ALL already does. It's also not what I was suggesting.

Is it an implicit part of the use case here to exclude from processing patterns which don't belong to a phase?

To be clear, I was asking the question above (as usual) to try to get to the bottom of the requirement, so that I and the ISO Working Group can understand what is proposed and see what we need to do to capture it in the text of the international standard and what implications it might have for that document as a whole. Also, to be very clear: there is no requirement for either me or any other member of the Working Group to engage with the proposals registered here; we do so voluntarily and at our discretion.

#ALL is the name of a phase

You may believe it is, but it is not what the standard defines it as, and that is central to the current issue. I think it would be clearer then if #ALL and #DEFAULT were explicitly defined as implicit phases containing all patterns in the schema.

rjelliffe commented 3 months ago

I don't get it.

You are saying that if we have

<sch:schema defaultPhase="hello">
   <sch:phase id="hello">
      <sch:active pattern="p1"/>
   </sch:phase>

   <sch:pattern id="p1"> ... </sch:pattern>
   <sch:pattern id="p2"> ... </sch:pattern>
</sch:schema>

then in this case we get both "p1" and "p2" active? That is not correct. Only the patterns that are nominated by a named phase are active in that phase.

In this case because the @defaultPhase is "hello" the only pattern active is "p1".

AndrewSales commented 3 months ago

No, as I said above, there are two such cases, #ALL and:

#DEFAULT where no @defaultPhase is present in the schema

I indicated the need for clarity above, and I will now take this forward with the Working Group instead, if we have time to alter the standard appropriately.