Schematron / schematron-enhancement-proposals

This repository collects proposals to enhance Schematron beyond the ISO specification

Add attribute named e.g. 'severity' to schematron with well defined semantics for reporting severity levels #58

Open rkottmann opened 1 year ago

rkottmann commented 1 year ago

One major technical use case of Schematron is to validate documents and report errors, with severity levels, to users. Very often the role and flag attributes are used to communicate the severity level of an assertion or report.

However, the role and flag attributes are defined with very open semantics and are not restricted to severity levels. This leaves a lot of uncertainty about how validators (or SVRL processors) interpret the intended severity levels.

A possible backwards-compatible solution would be to introduce a new attribute, e.g. named severity, which strictly defines the severity levels that can be reported, e.g. those at https://schematron.com/standards/standard_severity_levels_with_schematron_@role.html or as defined by XVRL (https://github.com/xproc/xvrl/blob/master/src/main/schema/xvrl.rnc#L91). This would leave the role and flag attributes untouched and improve the usability of Schematron in many business use cases, such as standards conformance testing.

rjelliffe commented 1 year ago

This makes sense to me.

Perhaps for something that can be used now, people could just use an attribute in some agreed foreign namespace, e.g. <sch:rule context="fred" xvrl:severity="info" ...

If this gets taken up, it provides good evidence for the ISO people, if they need it, to add it to Schematron.

(It might be best if the list of tokens was large enough to cope with the common use cases: for example, some such lists allow "hint" or "tip" as well.)

Regards Rick

AndrewSales commented 1 year ago

I agree. I think severity could well go straight in as @rkottmann suggests, but we'll see what flies.

xatapult commented 1 year ago

I agree as well.

rjelliffe commented 1 year ago

One suggestion: standardize those standard severity names, but leave the list open. A successful report is not necessarily an error: it could be that some other case has been found, that needs attention drawn. For example that a car has been stolen.

<sch:assert severity="error" ... />
<sch:assert severity="debug" ... />
<sch:report severity="error" ... />
<sch:report severity="alert" ... />
<sch:report severity="escalate" ... />
<sch:report severity="potential" ... />

A better approach could be to allow multiple tokens, where the first must be a standard one, and subsequent ones can be anything, including localized versions.

<sch:assert severity="warning debug" ... />
<sch:report severity="info alert" ... />

Rick
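To make the multiple-token idea concrete, here is a minimal Python sketch of how a consumer might split such a value. The core list `STANDARD_SEVERITIES` and the function name `parse_severity` are assumptions for illustration only; the thread has not settled on an actual set of tokens.

```python
# Hypothetical core list; the thread does not fix the actual set of tokens.
STANDARD_SEVERITIES = {"fatal", "error", "warning", "info", "debug", "trace"}


def parse_severity(value):
    """Split a severity value into (standard level, list of extra tokens)."""
    tokens = value.split()  # no argument: split on any run of whitespace
    if not tokens:
        raise ValueError("severity must contain at least one token")
    level, extras = tokens[0], tokens[1:]
    # Only the first token is constrained; later tokens can be anything.
    if level not in STANDARD_SEVERITIES:
        raise ValueError(f"first token {level!r} is not a standard severity")
    return level, extras
```

With this shape, a tool that only understands the standard levels can ignore the extra tokens, while a local tool can still act on them.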

rkottmann commented 1 year ago

hi,

I am thinking in the same direction as @rjelliffe . My thinking is that @severity is not only a backwards compatible solution to the above problem statement but also a nice complement to @role and/or @flag.

E.g.
<sch:assert role="car-check-and-give-hints" severity="warning" ...>The car could be stolen. Please check.</sch:assert>
or
<sch:assert role="car-check-not-acceptable-findings-in-document, statistics-check" severity="fatal" ...>The numbers do not add up. Cars must have been stolen. Please inform the police.</sch:assert>

IMHO one might consider using only one severity level per assert/report for the sake of clarity and simplicity. But @role, as open as it is, can be used for groupings and "can be anything, including localized versions". Also, I have never seen more than one severity level per statement in any programming language or logging system. So allowing multiple tokens would deviate significantly from common practice.

That said, I fully agree with the requirement that a defined list should be extensible for local Schematron use, so that users can add their own severity levels in addition to the well-defined ones. One possible solution is to have an additional token prefix like other: and allow other:self-defined-severity-level with e.g. a regex error|tip|hint|...|other:.*

Then, if I were the author of a Schematron application and needed my own severity levels, I would write a Schematron rule of my own to perform an additional check on the Schematron schema I am developing. Such a rule could be paraphrased as: if @severity starts with other: then only allow other:good-suggestion or other:dismiss-this-idea.
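The two-step check described above (a general pattern, then a local whitelist for the other: prefix) could look roughly like this in Python. The tokens in `CORE` are placeholders, since the "..." in the regex is left open in the comment, and `check_severity` is a hypothetical helper name.

```python
import re

# Placeholder core tokens; the "..." in the regex above is not spelled out.
CORE = ("fatal", "error", "warning", "info", "hint", "tip")

# General pattern: one core token, or anything carrying the "other:" prefix.
SEVERITY_PATTERN = re.compile("|".join(CORE) + r"|other:.*")

# Local extensions, taken from the example in the comment above.
LOCAL_EXTENSIONS = {"other:good-suggestion", "other:dismiss-this-idea"}


def check_severity(value):
    """Return True if value is a core token or a locally allowed extension."""
    if SEVERITY_PATTERN.fullmatch(value) is None:
        return False
    if value.startswith("other:"):
        # Second, project-specific check: only known extensions pass.
        return value in LOCAL_EXTENSIONS
    return True
```

The first check is what a shared schema for Schematron could enforce; the second is the project-specific rule an author would add on top.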

rjelliffe commented 1 year ago

Does the programming infrastructure support arbitrary or subclassed severities? Yes:

So I think providing subclassing or multiple inheritance or other ways to categorize messages is a common feature of modern logging systems, for a good reason. Whenever your system gets complicated, it becomes useful to annotate the basic information with extra information: e.g. XML's attributes or programming languages' annotations. It is good if Schematron can provide as rich information as the consuming/hosting/dispatching applications/environments can consume.

Allowing multiple tokens to categorize something, where the tokens have an order but each token is not necessarily a subclass, is not unprecedented: the HTML @class attribute does this for CSS. All that means is that to determine the severity you can just parse the SVRL with ''' if (tokenize(@severity, '\s')[1] = 'error') ... ''' or (if my memory serves me that a list of tokens evaluates true if any one token matches the string) ''' if (tokenize(@severity, '\s') = 'nag') ... ''' which is not hard.
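The same first-token lookup can be sketched outside XPath, here in Python with the standard library. This assumes, hypothetically, that SVRL would carry the proposed attribute unchanged as `severity` on `svrl:failed-assert` and `svrl:successful-report`; current SVRL defines no such attribute, and `SAMPLE` is made-up data.

```python
import xml.etree.ElementTree as ET

SVRL_NS = "http://purl.oclc.org/dsdl/svrl"

# Made-up SVRL; the severity attribute is the proposal's, not current SVRL.
SAMPLE = f"""<svrl:schematron-output xmlns:svrl="{SVRL_NS}">
  <svrl:failed-assert test="@id" location="/order/item[1]" severity="error nag">
    <svrl:text>Every item needs an id.</svrl:text>
  </svrl:failed-assert>
  <svrl:successful-report test="@stolen" location="/order/item[2]" severity="info alert">
    <svrl:text>This car has been reported stolen.</svrl:text>
  </svrl:successful-report>
</svrl:schematron-output>"""

MESSAGE_TAGS = {
    f"{{{SVRL_NS}}}failed-assert",
    f"{{{SVRL_NS}}}successful-report",
}


def primary_severities(svrl_text):
    """Return (first token, remaining tokens) per message, in document order."""
    root = ET.fromstring(svrl_text)
    found = []
    for element in root.iter():
        if element.tag in MESSAGE_TAGS:
            tokens = element.get("severity", "").split()
            found.append((tokens[0] if tokens else None, tokens[1:]))
    return found
```

Calling `primary_severities(SAMPLE)` yields one pair per message, so a consumer can dispatch on the first token and treat the rest as optional annotations.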

For people's information, here are the severity levels with some common logging systems:


RFC 3164 (BSD syslog) and RFC 5424 (syslog message format) have

It also has a "facility" code to allow the message to be routed to particular logfiles.

See https://en.wikipedia.org/wiki/Syslog and https://datatracker.ietf.org/doc/html/rfc5424


Java has

See https://docs.oracle.com/en/java/javase/17/docs/api/java.logging/java/util/logging/Level.html


Log4J, Log4J2, SLF4J

Apache Log4J has

Log4J2 and SLF4J leave out the FATAL.

For custom log levels, see https://logging.apache.org/log4j/2.x/manual/customloglevels.html. For SLF4J Markers, see https://stackoverflow.com/questions/16813032/what-are-markers-in-java-logging-frameworks-and-what-is-a-reason-to-use-them


Windows has

see https://www.loggly.com/ultimate-guide/windows-logging-basics/

.NET has

Trace = 0, Debug = 1, Information = 2, Warning = 3, Error = 4, Critical = 5, and None = 6.

See https://learn.microsoft.com/en-us/aspnet/core/fundamentals/logging/?view=aspnetcore-7.0


Regards Rick

rjelliffe commented 1 year ago

Why TRACE and DEBUG?

I think it would be useful, if we have defined severity levels, to allow "trace" and "debug". A common request from newbies (and also from people putting in sophisticated systems) is to confirm which context nodes were matched by a rule. But it is the kind of thing we may want to turn off. So the command line would allow suppression of assertions of severity level 'trace':

'''
<sch:rule context="XXX">
  <sch:report test="true()" severity="trace">This node was visited by the rule</sch:report>
  ...
'''
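A suppression option of that kind amounts to a filter over the reported messages. The following Python sketch assumes each message is a `(severity, text)` pair; the function name `visible_messages` and the default of suppressing only "trace" are hypothetical, not taken from any existing Schematron processor.

```python
def visible_messages(messages, suppressed=("trace",)):
    """Drop messages whose first severity token is in the suppressed set."""
    kept = []
    for severity, text in messages:
        tokens = severity.split()
        level = tokens[0] if tokens else None  # no severity: never suppressed
        if level not in suppressed:
            kept.append((severity, text))
    return kept
```

A processor could pass a wider tuple, such as `("trace", "debug")`, for a quieter production run.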

AndrewSales commented 9 months ago

As a starting point for this new feature, I think it makes sense to start in a relatively contained way.

I like @rjelliffe 's suggestion to specify a small-ish set of values, but also to leave the set open-ended. This means we have commonly or widely used values at the core, while also allowing user-defined ones: the best of both worlds.

I will say I don't think there is much mileage in specifying the meaning of the core set of values closely, for three reasons:

  1. defining some values within an open-ended set conveys, roughly speaking, "here are some that you might find useful" and provides a potential baseline of interpretation across schemas and the output/SVRL produced when validating against them
  2. by allowing user-defined values, the semantics of the core set becomes less easy or worthwhile to define: a schema author may choose to use none from the core specified set (or no severities at all, for that matter)
  3. a conformant implementation determines whether a document is valid to the schema and validity means passing all the assertions in force. It seems to me that we would have to change that definition if we start to say e.g. only severities of fatal and error connote invalidity, to say nothing of how to handle user-defined values. Schematron's gift to validation is the interpretative step beyond a simple pass/fail, and a (human/machine) operation on SVRL output is one means to gauge "how valid" a document is.

Overall, it will still be a step forward to have a specific place in the language for this information, distinct from @role and @flag, as @rkottmann mentions.
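The distinction drawn in point 3 can be illustrated with a small Python sketch: validity stays a plain pass/fail on failed assertions, while a severity tally is a separate, interpretative operation on the output. The input shape (one severity string per failed assertion) and the name `summarise` are assumptions for illustration.

```python
from collections import Counter


def summarise(failed_severities):
    """Return (is_valid, tally of first severity token per failed assertion)."""
    # Validity ignores severity entirely: any failed assertion means invalid.
    is_valid = len(failed_severities) == 0
    tally = Counter()
    for value in failed_severities:
        tokens = value.split()
        tally[tokens[0] if tokens else "(none)"] += 1
    return is_valid, dict(tally)
```

How the tally is then read (for example, whether warnings alone should block a workflow) is left to the consuming application, which keeps user-defined severities workable.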