Schematron / schematron-enhancement-proposals

This repository collects proposals to enhance Schematron beyond the ISO specification

Allow default content inside sch:value-of and sch:name #50

Open rjelliffe opened 2 years ago

rjelliffe commented 2 years ago

(Added: In my Schematron users meeting presentation [Prague 2024] I identified this proposal as the single most important, IMHO.)

An important scenario for Schematron is that it should support the SDLC for schemas. For example:

Unfortunately, where sch:value-of and sch:name are used, the assertions have a gap, creating the likelihood of nonsense sentences: exactly the opposite of what Schematron promises.

Furthermore, the XPaths in sch:value-of/@select and sch:name/@select have no documentation, presenting an added burden for maintainers who have to figure out what the intent of the XPath was. Moreover, if there is a mistake in the @select XPath, such as an unhandled case, it may fail to provide a text value and so produce the gapped garbage message. This is not prudent for a validator, which needs to assume that the document is a mess and provide fallback behaviour, so that the developer does not need to complicate their @select XPaths to cover a default fallback.
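To illustrate the kind of complication meant here, a schema author today might push the fallback into the @select expression itself. The following is only a minimal sketch, assuming an XPath 2.0 query binding (queryBinding="xslt2"); the section, title and para element names and the "(untitled)" fallback string are hypothetical and not taken from any real schema.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <sch:pattern>
    <sch:rule context="section">
      <!-- Hypothetical rule: the fallback text is buried inside the XPath.
           The sequence holds any non-blank title elements followed by a
           literal string; [1] picks the first item available. -->
      <sch:assert test="para">The section
        "<sch:value-of select="(title[normalize-space()], '(untitled)')[1]"/>"
        should contain at least one para element.</sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

This works, but the fallback phrase is hidden in the expression, depends on the query language, and still leaves the XPath undocumented.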

Proposal

sch:value-of and sch:name should allow rich text (text(), span, emph, etc.). This text would hold the pretty-print/fallback phrase.

<sch:rule context="a | b | c">
  <sch:assert test="x"><sch:name>a, b and c</sch:name> elements should have one or more x elements in them.</sch:assert>
  ...

I think this is the kind of thing that an existing implementation could provide immediately as a value-add to the standard, because it can be stripped out with a simple XSLT if ISO conformance is needed.
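A stripping stylesheet along these lines might look like the following. It is a sketch only, assuming the schema uses the ISO Schematron namespace and that any child content of sch:name or sch:value-of is the proposed default text; it is not taken from any existing implementation.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:sch="http://purl.oclc.org/dsdl/schematron">

  <!-- Identity transform: copy everything through unchanged by default. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- For sch:name and sch:value-of, keep the element and its attributes
       but drop all child content (assumed to be the default text). -->
  <xsl:template match="sch:name | sch:value-of">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Run over an extended schema, this would leave the two elements empty again, as they are in the current standard, while leaving every other element untouched.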

It removes a long-term wart, is limited and reason-about-able, does not change any other element, and would be trivial to implement.

AndrewSales commented 2 years ago

The key point for me from reading this proposal is the meaning and intention of "pretty printing". If this is, as in your example, for the benefit and consumption of non-technical stakeholders, then I think it belongs in the implementation realm and not as part of the standard.

The reason I say this is it represents a move away from a value supplied by the processor to one instead hard-coded by the schema author. It's effectively an aspect of the schema's documentation.

If there is a mistake in the value-of/@select or name/@path - it is path for the name element, I call that out as I've never understood the reason for the difference and have stumbled over it many times - then that lies squarely with the schema author; the same could equally be said of rule/@context or @test.

Likewise the documentation issue for these and other attributes could be addressed if https://github.com/Schematron/schematron-enhancement-proposals/issues/41 were adopted.

I'm not sure what the effect would be on downstream processors if the SVRL they receive suddenly contains a value for name that isn't a QName. (Admittedly, a fault in the path value could result in the empty string in any case, and I realise name isn't currently passed through to SVRL as name -- but perhaps some processors do/will do this?)

rjelliffe commented 2 years ago

I'm not sure what the effect would be on downstream processors if the SVRL they receive suddenly contains a value for name that isn't a QName.

According to my SVRL Cheat Sheet (https://schematron.com/document/3464.html) both sch:name and sch:value-of produce text that is merged: the SVRL has no indication what caused the text. I think there is no effect on downstream processors.

And I think @AndrewSales is perhaps conflating the standard with the technology: the standard may be for technical stakeholders, but the technology it describes is for all stakeholders. So a change to the schema for this would be good; but implementers should do it now. Pretty printing (i.e. producing readable text from the schema) has been a consideration from the start; otherwise, why are there title and p elements?

(In fact, the CMS system I am working on at the moment has exactly this problem.)

Edited: replaced "from the standard" with "from the schema"

AndrewSales commented 2 years ago

According to my SVRL Cheat Sheet (https://schematron.com/document/3464.html) both sch:name and sch:value-of produce text that is merged: the SVRL has no indication what caused the text. I think there is no effect on downstream processors.

Please re-read what I wrote above: "I realise name isn't currently passed through to SVRL as name -- but perhaps some processors do/will do this?" This point and the point about what is produced for name still hold. If you have a downstream process that relies on expecting a QName there, it can't expect that any longer after this change. FYI I use the standard as the source of information about SVRL, a copy of the SVRL one is here: https://github.com/Schematron/schema/blob/main/svrl.rnc.

And I think @AndrewSales is perhaps conflating the standard with the technology

I'm absolutely not doing that, and I'm really quite surprised you say this @rjelliffe, given that in this issue and others on this list, I've made the distinction between the two very clear. Please remember that I've contributed to the writing of the text of this standard, as well as writing an implementation of it: I know very well that the two are not the same thing.

So a change to the schema for this would be good; but implementers should do it now.

And that was exactly my point: this suggests to me it's an implementation thing.

Pretty printing (ie producing readable text from the standard)

Now that you define what you mean by the term "pretty printing", I'm afraid I'm confused: do you really mean from the standard, or from a conformant schema?

otherwise why are there title and p elements

But they're only available in a limited set of contexts. See also my comment above again re #41.

rjelliffe commented 2 years ago

@AndrewSales I did not mean to be insulting, so sorry if it came over like that.

"I realise name isn't currently passed through to SVRL as name -- but perhaps some processors do/will do this?"

You have lost me, sorry. Would it help to be more concrete?

Current assertions:

<sch:assert test="x | y" >Should have <sch:name/> element 
                  (near "<sch:value-of select="title"/>")</sch:assert>

Current SVRL example outcomes (implementer decisions):

<svrl:failed-assert ...><svrl:text>Should have x element (near "Alphabetics")</svrl:text></svrl:failed-assert>
<svrl:failed-assert ...><svrl:text>Should have <svrl:span class="name">x</svrl:span> element
        (near "<svrl:span class="value-of">Alphabetics</svrl:span>")</svrl:text></svrl:failed-assert>

Current pretty-printed example (e.g. in a bullet list):

Proposed assertion:

<sch:assert test="x | y" >Should have <sch:name>x or y</sch:name> element
        (near "<sch:value-of select="title">current title</sch:value-of>")</sch:assert>

Proposed SVRL outcomes:
no change

Proposed pretty-printed example

What is it that you think a downstream processor could be doing?

FYI I use the standard as the source of information about SVRL, a copy of the SVRL one is here: https://github.com/Schematron/schema/blob/main/svrl.rnc.

Me too. The document I referenced is a mapping between Schematron and SVRL: it might be useful for a draft of an annex for the standard, if there is demand.

Pretty printing

Oops, I meant "from the schema" not "from the standard".

(title and p) are only available in a limited set of contexts

I don't see how your question speaks against my point. (Anyway, that e.g. sch:pattern/sch:p has no equivalent in the SVRL shows that at least sch:p is not intended to be information transferred to the SVRL but meta-text of the schema.)