(Added: In my Schematron users meeting presentation [Prague 2024] I identified this proposal as one of the most important, IMHO.)
This is a much simpler proposal, offered as an alternative to #16 (chaining phases with progressive visibility of prior SVRL).
Use Cases
I am an educator. I have given my students a 300-question exam with single- and multiple-choice questions. I use Schematron to mark each answer. But I have no way to aggregate the scores or find patterns in them, because the results of validation are not visible within Schematron.
I am a dashboard developer for some industrial process. I use Schematron to detect and report complex patterns in the process. I then want to detect what those patterns tell me about the state of the system, e.g. whether there are more than 10 failed assertions of a serious type.
I am a legal publisher who ingests case reports made by 100 different courts in OOXML, ODF and RTF. Within each court, there are different data entry operators who use different conventions willy-nilly. Some use stylesheets. Some use tables to format a title page for each case. I use Schematron to find patterns that let me do "feature extraction" on the document. But I want to detect outliers as well as group features to allow dispatching to appropriate processes. And I would prefer it all to be in one place (one file).
I want to write a validation that works rather like a Hidden Markov model (though not literally one): one set of detectors looks at what is found in the document, then another set operates on that sequence of detected things to figure out which transition to take.
Problem
Validation results are not visible within Schematron. Therefore you need a second pass, involving a shell script, XProc, etc. This is not convenient, and it severely limits Schematron. My experience of XProc is that, while it works, it is at least as complex as Schematron itself, and so, even if your system is tooled up for it, it can easily be overkill.
Also, top-level parameters and variables computed from the original document are not visible in downstream processes.
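For concreteness, here is roughly what that second pass looks like today using the optional p:validate-with-schematron step from XProc 1.0. This is a sketch only: the two schema file names (detectors.sch, aggregate.sch) are hypothetical, and a real pipeline would add serialization and error handling. It is exactly the kind of machinery this proposal would make unnecessary:

    <p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
      <p:input port="source"/>
      <p:output port="result">
        <p:pipe step="reduce" port="report"/>
      </p:output>

      <!-- Pass 1 ("map"): validate the instance; SVRL appears on "report". -->
      <p:validate-with-schematron name="map" assert-valid="false">
        <p:input port="schema">
          <p:document href="detectors.sch"/>
        </p:input>
      </p:validate-with-schematron>

      <!-- Pass 2 ("reduce"): validate that SVRL with a second, aggregating schema. -->
      <p:validate-with-schematron name="reduce" assert-valid="false">
        <p:input port="source">
          <p:pipe step="map" port="report"/>
        </p:input>
        <p:input port="schema">
          <p:document href="aggregate.sch"/>
        </p:input>
      </p:validate-with-schematron>

      <!-- Discard the unused primary output (a copy of the first-pass SVRL). -->
      <p:sink/>
    </p:declare-step>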
Proposal
Allow sch:pattern/@documents="#SVRL" to invoke a map-reduce operating mode.
Other patterns run as normal and generate SVRL. That SVRL is then validated by these special patterns. The resulting SVRL is the validation result, or it could be merged with the first stage's SVRL at implementer option.
The same scoping rules apply as for other values of @documents: top-level parameters are visible, as are any top-level variables (i.e., sch:schema/sch:let), which continue to be evaluated against the original document.
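A minimal sketch of the proposal, using the dashboard use case above. The documents="#SVRL" value is of course the proposed feature itself, not something any implementation supports today; the first pattern's context, test and role are hypothetical stand-ins, while the svrl:* names are the standard SVRL vocabulary:

    <sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron"
                queryBinding="xslt2">
      <sch:ns prefix="svrl" uri="http://purl.oclc.org/dsdl/svrl"/>

      <!-- First stage ("map"): an ordinary pattern inspecting the instance
           document. The context, test and role here are hypothetical. -->
      <sch:pattern id="detectors">
        <sch:rule context="reading">
          <sch:assert test="number(@value) le number(../@threshold)" role="serious">
            Reading <sch:value-of select="@id"/> exceeds its threshold.
          </sch:assert>
        </sch:rule>
      </sch:pattern>

      <!-- Second stage ("reduce"): the proposed documents="#SVRL" makes this
           pattern run over the SVRL generated by the patterns above. -->
      <sch:pattern id="aggregators" documents="#SVRL">
        <sch:rule context="svrl:schematron-output">
          <sch:assert test="count(svrl:failed-assert[@role = 'serious']) le 10">
            More than 10 serious assertions failed: the system needs attention.
          </sch:assert>
        </sch:rule>
      </sch:pattern>
    </sch:schema>

Note that the second pattern is otherwise plain Schematron: the only novelty is where its context nodes come from.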
Discussion
There is obvious scope to turn Schematron phases into some kind of state machine, where one pattern enables another: it is a nice geeky thought. Similarly, phases or patterns could be made more like XProc processes that can chain.
However, it seems to me that this would be overkill and would complexify the language, when what would be more usable is to allow Schematron to act in a "map-reduce" fashion: the original validation is the "map" and this proposed second pass is the "reduce".
Rather than requiring users to learn and install some pipeline system, this proposal involves no schema changes to Schematron: just one special attribute value that fits conceptually with the current definitions.