Text Value Templates in messages

Schematron / schematron-enhancement-proposals

This repository collects proposals to enhance Schematron beyond the ISO specification


Text Value Templates in messages #49

Closed wendellpiez closed 8 months ago

wendellpiez commented 2 years ago

It would be nice if XSLT 3.0-style text value templates were supported within Schematron message text.

<assert test="count(element) le 3">You have { count(element) }
  { if (count(element) eq 1) then 'element' else 'elements'} (only 3 are permitted)</assert>

This issue overlaps somewhat with #37.

See https://www.w3.org/TR/xslt-30/#text-value-templates

tgraham-antenna commented 2 years ago

As a feature of an XSLT 3.0 (or related) binding or as something for all Schematron?

rjelliffe commented 2 years ago

Turning {} into a delimiter would break existing schemas. Adding an extra attribute to e.g. the sch:schema element to force a fail is not reliable. So I do not think it can be made a general facility.

For xslt3, if it is not in the existing QLB for xslt3, then it would need another QLB made, e.g. xslt3-2022, if that QLB has uptake (unless the various implementors agree to change in step). And the definition of QLBs in the standard would have to be changed to allow the interpretation of text nodes to be an issue that QLBs are allowed to affect. That is doable.

I note, this is an issue of sugar, not capability.

But I do not think it is workable, because it adds to the work required to pretty-print a schema. sch:name and sch:value-of already complicate things, but at least the pretty-printer can put in something. (In fact, both should have an @alt attribute with the text to be displayed when pretty-printing the schema out.)

I note that parsing an XPath statement requires a very large parser, not just a regex. I made a PEG grammar and a REx grammar recently, and they had hundreds of thousands of characters of code. So it is not trivial for a pretty printer or text processing software, such as an editor, to know where the embedded expression ends. Or for a non-XSLT3 implementation to figure it out and wrap it in sch:value-of.

I think it would be reasonable for the WG to require an open source proof parser (e.g. in xslt) that read the node and produced the equivalent sequence of text and sch:value-of element nodes, allowing developers (especially of pretty printers) to implement it without it being a stumbling block.

(I think it goes without saying that it is essential that the Schematron spec does not disadvantage situations where people cannot use Saxon. Any assumption that xslt3 is the base case for Schematron would do this. Indeed, even among Saxon users, pre-9.7 Saxon is still common.)

Schematron's design so far has been that sch:*/text() nodes are only for humans. I am concerned by anything that tries to downgrade the text into a kind of programming language.

Regards Rick

wendellpiez commented 2 years ago

Hm: I admit my perspective on Schematron is pretty heavily influenced by my usage pattern, namely essentially as an XSLT superset, with XSLT functions and keys to support my Schematron, and hence relying heavily on XSLT functionality (and mindset) in general.

@rjelliffe what would you think of this not as a standard feature but an option? (As Tony says, only for the XSLT3 QLB.) For example when an @expand-text="true" flag is set, i.e. not for all text nodes but only those so identified?

AndrewSales commented 2 years ago

My four penn'orth: I think if schema authors elected to use this idiom to write their user-facing messages and implementers to support it under an XSLT 3.0 processor, then that is a matter for them. As a schema author myself, I can see the convenience and would welcome it.

I can also see @rjelliffe 's point about human-readable text inhering in sch:*/text(), but the ease of pretty-printing aspect isn't something enforced by the standard, as far as I'm aware, and it would be pretty (!) hard to enforce there, no?

In any case, I would say at most it should be mentioned in the Annex describing the query language binding for XSLT 3.0, with a note that this approach is allowed but its support implementation-defined.

rjelliffe commented 2 years ago

@rjelliffe what would you think...

What about a more general approach: a convention that when a processing instruction is found in a schema AND the PI's target matches the @queryBinding, then its contents are text interpreted against some production of the underlying technology (which could be specified in the QLB?)

For example:

<schema queryBinding="xslt3" ...>
...
   <assert test="count(element) le 3">
          <?xslt3 You have { count(element) }
                      { if (count(element) eq 1) then 'element' else 'elements'} (only 3 are permitted)
          ?>
    </assert>

XML provided PIs specifically to allow marking up this kind of issue: text where you need to do some custom dynamic processing at that point.

wendellpiez commented 2 years ago

Nice. Very elegant. Only small reservation regards names and identifier strings for query bindings vs these operations e.g. wouldn't we like <?xslt3:tvt My tvt on my { name() } ?> to allow the purported xslt3 query processor to provide different kinds of magic?

Noting also: in a hybrid Schematron with XSLT enhancements there are other ways to sneak in the TVT syntax such as (YMMV)

<sch:assert test="count(x) le $max">
  <xsl:iterate select="." expand-text="true">we allow { $max } and you give us { count(x) }</xsl:iterate>
</sch:assert>

Or is this considered bad form?

rjelliffe commented 2 years ago

Here is a slightly simpler version of my suggestion.

1) ISO Schematron spec changed so that QLBs (Query Language Bindings) can define processing instructions, if needed, in and under assert, report, diagnostic, property.

and either 2 or 3:

2) (preferred) A processing instruction "sch" is defined for the xpath/xslt 1,2,3 QLBs which provides a shortcut for value-of. The reason this is feasible is that no extra parsing is involved.

 <assert test="count(element) le 3">
          You have <?sch count(element) ?>
                      <?sch if (count(element) eq 1) then 'element' else 'elements'?> (only 3 are permitted)
    </assert>

which is equivalent to

 <assert test="count(element) le 3">
         You have <value-of select="count(element)"/>
                      <value-of select=" if (count(element) eq 1) then 'element' else 'elements'"/> (only 3 are permitted)
    </assert>

You could just use bare "<?" and "?>", actually, if terseness was important (space important). I think XML does not mandate the PI target (notation name).

3) Only the QLB for xslt 3 defines a processing instruction "sch:xpath"

This would allow

 <assert test="count(element) le 3">
          <?sch:xpath  You have { count(element) }
                      { if (count(element) eq 1) then 'element' else 'elements'} (only 3 are permitted)
          ?>
    </assert>

which is equivalent to

 <assert test="count(element) le 3">
          <xsl:value-of select="concat(
                     'You have ', count(element), ' ',
                     if (count(element) eq 1) then 'element' else 'elements',
                     ' (only 3 are permitted)')"/>
    </assert>

Why XSLT3 only for 3)? The full XPath rules are very complicated, and without them we cannot pair a starting { with its ending } reliably. In the case of XPath3, it seems it can be implemented simply by copying? (Is that right? )

Now it would be possible to create a much simpler version of the XPath syntax, which recognizes only paired (symmetrical) characters and literal delimiters. (This would be great to stick into a Schematron compiler too...) That would be more tractable.
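
To make option 2 concrete, here is a minimal sketch of the "no extra parsing" claim: the XML parser already delimits each processing instruction, so a tool only has to swap the PI for a value-of element. It is written in Python with the standard-library minidom purely for illustration. The function name `expand_sch_pis` is hypothetical, the PI target "sch" is taken from the proposal above, and the elements are un-namespaced as in the examples in this thread.

```python
from xml.dom import minidom, Node


def expand_sch_pis(schema_xml, target="sch"):
    """Replace each <?sch expr?> PI with <value-of select="expr"/>.

    Assumption: elements are un-namespaced, as in the thread's examples.
    The PI content is copied verbatim; no XPath parsing takes place.
    """
    doc = minidom.parseString(schema_xml)

    def walk(node):
        # Copy the child list first, because children are replaced in place.
        for child in list(node.childNodes):
            if (child.nodeType == Node.PROCESSING_INSTRUCTION_NODE
                    and child.target == target):
                value_of = doc.createElement("value-of")
                value_of.setAttribute("select", child.data.strip())
                node.replaceChild(value_of, child)
            else:
                walk(child)

    walk(doc.documentElement)
    return doc.documentElement.toxml()


if __name__ == "__main__":
    source = (
        '<assert test="count(element) le 3">You have <?sch count(element) ?> '
        "<?sch if (count(element) eq 1) then 'element' else 'elements'?> "
        "(only 3 are permitted)</assert>"
    )
    print(expand_sch_pis(source))
```

The same walk could wrap the expression in display markup instead, which is all a pretty-printer would need.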

tgraham-antenna commented 2 years ago

Writing the content of PIs gets zero support from (most, if not all) XML editors. Requiring complex expressions with no editor support could affect the enthusiasm level of a large segment of the target users.

If it becomes part of the standard, then you might assume that Oxygen or similar might change to support it, but right now, that can only be an assumption.

rjelliffe commented 2 years ago

@Tony I don't understand what you mean by "writing the content of PIs" here, sorry. PIs are part of DOM, SAX, XSLT.

The amount of support that existing editors would give for parsing the XPaths in the following two cases is the same: none.

<sch:assert ... expand-in-brackets="true">This is a { name() }.
</sch:assert>
<sch:assert ...>This is a <?xpath name() ?>.</sch:assert>

To support pretty printing, the second requires something like (untested)

<xsl:template match="processing-instruction('xpath')">
   <I><xsl:value-of select="."/></I>
</xsl:template>

which is not very onerous, while the first requires something more elaborate

<xsl:template ***@***.***='true']/text()">
    <xsl:value-of
select="me:some-function-to-parse-for-brackets-and-markup(.)"/>
</xsl:template>

which requires actual coding (rather than templating) skills.

@Wendell What you are asking for is simple macro expansion for assertion texts, isn't it?

 { XXX }  expands to  <xsl:value-of select="XXX"/>

I can see benefit in having this available. What I don't see is any reason why it should be part of the ISO Schematron standard (unless and until it becomes popular.)

This can be done by a layer on top of the XML, e.g. in a pre-processor, as a user-option. If some form of standardization is needed, it could be a TR, or better through OASIS or just a GitHub group.
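
As a rough illustration of such a pre-processor, the sketch below splits a message string into literal text and brace-delimited expressions and emits sch:value-of elements. It is Python and illustrative only. It is not a full XPath parser: it only tracks nested braces and quoted string literals, which is the simplified approach mentioned earlier in this thread. Treating `{{` and `}}` as escapes follows XSLT 3.0 text value templates. Ignoring XPath comments is a simplifying assumption, and the function names are hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr


def find_closing_brace(text, start):
    """Return the index of the '}' closing an expression that begins at start."""
    depth, i, n = 0, start, len(text)
    while i < n:
        ch = text[i]
        if ch in "'\"":
            # Skip a string literal; a doubled quote inside it is an escape.
            i += 1
            while i < n:
                if text[i] == ch:
                    if i + 1 < n and text[i + 1] == ch:
                        i += 2
                        continue
                    break
                i += 1
            else:
                raise ValueError("unterminated string literal")
        elif ch == "{":
            depth += 1
        elif ch == "}":
            if depth == 0:
                return i
            depth -= 1
        i += 1
    raise ValueError("unmatched '{'")


def split_template(text):
    """Split message text into ('text', s) and ('expr', s) parts."""
    parts, buf, i, n = [], [], 0, len(text)
    while i < n:
        ch = text[i]
        if text.startswith("{{", i) or text.startswith("}}", i):
            buf.append(ch)  # a doubled brace stands for one literal brace
            i += 2
        elif ch == "{":
            if buf:
                parts.append(("text", "".join(buf)))
                buf = []
            end = find_closing_brace(text, i + 1)
            parts.append(("expr", text[i + 1:end].strip()))
            i = end + 1
        elif ch == "}":
            raise ValueError("unmatched '}' at offset %d" % i)
        else:
            buf.append(ch)
            i += 1
    if buf:
        parts.append(("text", "".join(buf)))
    return parts


def to_schematron(text):
    """Render the parts as escaped text interleaved with sch:value-of."""
    out = []
    for kind, value in split_template(text):
        if kind == "text":
            out.append(escape(value))
        else:
            out.append("<sch:value-of select=%s/>" % quoteattr(value))
    return "".join(out)


if __name__ == "__main__":
    print(to_schematron("This is a { name() }."))
```

Even this reduced scanner needs two nested loops and explicit error cases, which gives some sense of why brace matching was described above as needing coding rather than templating.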

You could generalize it slightly to allow declarations at the start of your stylesheet:

<sch:schema ...>
    <macro:def select="sch:assert/text() | sch:report/text()"
           find="{a}">
          <sch:value-of select="$a"/>
      </macro:def>
     ...

For what it is worth, this is the kind of thing that I think Invisible XML should be used to do: parse the string and imply elements.

dmj commented 8 months ago

My four penn'orth: I think if schema authors elected to use this idiom to write their user-facing messages and implementers to support it under an XSLT 3.0 processor, then that is a matter for them. As a schema author myself, I can see the convenience and would welcome it.

And so it shall be. The next version of SchXslt2 has a transpiler parameter schxslt:expand-text that globally enables text value templates in the validation stylesheet.

AndrewSales commented 8 months ago

I would say at most it should be mentioned in the Annex describing the query language binding for XSLT 3.0, with a note that this approach is allowed but its support implementation-defined.

The XSLT 3.0 query language binding Annex has been updated to include:

"Text value templates may be used to dynamically construct user-defined messages. The extent to which this method is supported is implementation-defined."

rjelliffe commented 8 months ago

I think it needs to be there or not. Otherwise what is the use of the QLB? I think the bottom line is that a schema implementation should know from the sch:schema element's attributes whether the schema may be (if valid etc.) acceptable to an engine or not. If you go outside XML to other embedded markup, then the fact that you have done so should be information available to the engine.

I suggest adding an attribute to sch:schema

    requires  NMTOKEN*   #IMPLIED

with an explanation like "The @requires tokens are vendor-recognized tokens that say what non-standard extensions are needed in order to run the schema and generate the SVRL etc expected. It is a fatal error if there is a token present that the implementation does not support."

E.g.

<sch:schema  requires="inline-expand-text"
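
A minimal sketch of how an engine might enforce such a check, in Python and illustrative only. The attribute name and the token "inline-expand-text" come from the suggestion above. The `SUPPORTED` set, the exception class, and the function name are hypothetical, and no existing implementation is known to behave this way.

```python
import xml.etree.ElementTree as ET

# Hypothetical: the extension tokens this particular engine understands.
SUPPORTED = frozenset({"inline-expand-text"})


class UnsupportedExtension(Exception):
    """Raised when a schema requires an extension the engine lacks."""


def check_requires(schema_xml, supported=SUPPORTED):
    """Return the schema's @requires tokens, or fail if any is unsupported."""
    root = ET.fromstring(schema_xml)
    # NMTOKEN*: zero or more whitespace-separated tokens; absent means none.
    tokens = root.get("requires", "").split()
    missing = [t for t in tokens if t not in supported]
    if missing:
        raise UnsupportedExtension(
            "schema requires unsupported extension(s): " + " ".join(missing))
    return tokens


if __name__ == "__main__":
    schema = ('<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" '
              'requires="inline-expand-text"/>')
    print(check_requires(schema))
```

Failing before any validation starts is the point: the engine never produces a report from a schema whose messages it would have rendered incorrectly.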

Rick

rjelliffe commented 8 months ago

Another candidate for @.*** could be the various in-place editing extensions for Schematron, such as SQF.

Rick
