Schematron / schematron-enhancement-proposals

This repository collects proposals to enhance Schematron beyond the ISO specification

03.24 subject [ref. JP16 017] #53

Closed AndrewSales closed 8 months ago

AndrewSales commented 1 year ago

We do not understand why we need both "rule context" and "subject". Are there any subjects that are not rule contexts? (See comments on 5.5.14.) If so, we propose to replace "subject" by "node" (a term from XPath).

rjelliffe commented 1 year ago

The subject is conceptually and operationally distinct from the context.

Yes, in trivial cases, which are most cases, they will be the same. When you don't need the subject, don't use it: it is a zero-cost abstraction for users.

Take this example:

    <sch:rule context="table/row">
      <sch:assert test="following-sibling::row or not(following-sibling::*)" subject="parent::table">
        A table should only contain rows after the first row.
      </sch:assert>
    </sch:rule>

In this case, the assertion is stated in terms of tables. But the implementation in XPath was done, for convenience, using rows as the context. So the @subject attribute is provided to allow the implementation to return (in the SVRL location) an object that matches the text. This is because, in all cases, the text comes first in Schematron; or, at least, we want the deviser of the schema to be able to decide the text that is most meaningful to the users, to decide the object being located in the SVRL that is most meaningful for the user's systems, and to decide the XPath to implement those two things in a way that is most convenient for the developer.
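To make the effect concrete, here is a hedged sketch of what a processor that honours @subject might write to the SVRL for the rule above. The input document (the element names doc and note) is invented for illustration, and the exact syntax of the location path is an assumption, since it varies between implementations.

```xml
<!-- Hypothetical input (element names doc and note are invented):
     <doc><table><row/><row/><note/></table></doc>
     The assertion fails on the second row, because it is followed by a note. -->
<svrl:failed-assert xmlns:svrl="http://purl.oclc.org/dsdl/svrl"
                    test="following-sibling::row or not(following-sibling::*)"
                    location="/doc/table[1]">
  <!-- The context node is the second row, but subject="parent::table"
       makes the reported location the table (path syntax assumed). -->
  <svrl:text>A table should only contain rows after the first row.</svrl:text>
</svrl:failed-assert>
```

Without the subject attribute, the same failure would presumably be located at the row that served as the context, for example /doc/table[1]/row[2], which matches the XPath but not the text of the assertion.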

Consider the well-known flaw of DTDs (and XSD and RELAX NG) that a broken content model is reported with information about where the brokenness was detected, not necessarily at the point that the problem actually occurred.

Now without the @subject we force the developer to write this:

    <sch:rule context="table">
      <sch:assert test="count(*[self::row or preceding-sibling::row]) = count(row)">
        A table should only contain rows after the first row.
        ...

Now that is tolerable, if the developer was lucky enough to have done it in the first place. But it adds an extra burden on them.

Let's contrast this with DTDs: in DTDs (and XSD and RELAX NG) the point where an error is detected may bear no resemblance to the point where the error occurred. That is useless and frustrating for users, and it is one of the long-running problems with grammar-based validation.

For example, take a content model for picture that says (thumbnail, para+) | (para+, figure). The DTD validator will, if it finds the sequence [thumbnail, para, figure], complain that the figure is unexpected. But what if our assertion is this:

    <sch:rule context="picture">
      <sch:report test="figure and thumbnail">
        A thumbnail is not needed when a picture has a figure.
      </sch:report>
    </sch:rule>

In this case, the XPaths are very simple. But the rug does not match the curtain. So just as the DTD would fail by reporting the misleading location of the figure, so this assertion would fail by reporting the misleading location of the picture. By adding subject="thumbnail" to the sch:report, the SVRL can clearly point to the object that the developer wants the SVRL to locate, without having to recode all the other XPaths in potentially complex ways.
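Spelled out, the adjusted report could look like the sketch below. It uses the subject attribute, which is the attribute ISO Schematron defines on sch:report for this purpose; the surrounding sch:schema and sch:pattern wrappers are added only to make the fragment self-contained and are not from the original comment.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:pattern>
    <!-- The rule context stays on picture, so the test remains simple. -->
    <sch:rule context="picture">
      <!-- subject is evaluated relative to the context node, so
           "thumbnail" selects the thumbnail child of the picture. -->
      <sch:report test="figure and thumbnail" subject="thumbnail">
        A thumbnail is not needed when a picture has a figure.
      </sch:report>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

The test and the message are unchanged; only the node that the SVRL is asked to locate moves from the picture to the thumbnail.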

The larger, deeper and more complex a document is, and the longer the rules are and the more complex the assertions are, the more chance there is that the @context is not information that is directly useful as a location in the SVRL. For example, take this:

    <sch:rule context="endnote">
      <sch:p>Here are all the constraints on endnotes</sch:p>
      ...
      <sch:report test=".//figure[not(caption)][1]" subject="(.//figure[not(caption)])[1]">
        In an endnote, all figures in a chapter should have a caption.
      </sch:report>
    </sch:rule>

So in this case, the schema developer has chosen that they only want to report the first of this error (which, for example, writers of gateway/firewall validators do to avoid unnecessary tests), and they want to group all the assertions relating to end-notes together into one rule (for whatever reason: their choice). But they want the SVRL to locate the offending figure directly, not just be told that there is something somewhere awry in the end-note.

I have found @subject only useful in quite complex schemas (from memory where I was validating an input document against an output document following a complex transformation that did a lot of re-structuring), but where it was useful it was very useful indeed.

Regards Rick

AndrewSales commented 8 months ago

Added a note to the definition of subject referring to the subject attribute and its usage.