Open martindholmes opened 2 years ago
The ATOP team is thinking of two distinct approaches to this:
1. An @schQueryLanguageBinding attribute available on <schemaSpec>, meaning that all Schematron rules must be compiled into a single Schematron output file and use the same query language binding.
2. A <constraintDecl> element in the header, where various things can be defined and described, including the query language binding value, as well as namespace declarations etc. <constraintSpec> would then be added to att.declaring, and each <constraintSpec> could point up to a <constraintDecl>. For each distinct <constraintDecl> element, a distinct Schematron file would be created using the specified content, including the query language binding, and including all the constraints which point to it.

The second option would obviously provide much more flexibility, but would require non-trivial fixes to the existing stylesheets, and there may not be many real use-cases for it; generally, we would expect people to want to use the most advanced query language binding that their processor can support (why not?), and there's no particular reason to want to use an earlier language binding. There's also the question then of how you might override the binding for existing constraints in a downstream customization.
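To make the contrast concrete, here is a rough sketch of what each approach might look like in an ODD. Neither exists in TEI as of this ticket: schQueryLanguageBinding is the attribute name floated here, while the ident and xml:id values, the queryBinding attribute on <constraintDecl>, and the use of @decls (the pointer attribute that membership in att.declaring would supply) are assumptions made purely for illustration.

```xml
<!-- Option 1 (sketch): one binding for the whole schema, hence one Schematron file -->
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0"
            ident="tei_example" start="TEI"
            schQueryLanguageBinding="xslt3">
  <moduleRef key="tei"/>
  <!-- remaining module references and specifications omitted -->
</schemaSpec>

<!-- Option 2 (sketch): the declaration lives in the header ... -->
<constraintDecl xmlns="http://www.tei-c.org/ns/1.0"
                xmlns:sch="http://purl.oclc.org/dsdl/schematron"
                xml:id="sch-xslt3" scheme="schematron" queryBinding="xslt3">
  <sch:ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>
</constraintDecl>

<!-- ... and each constraintSpec points up to the declaration it wants (hypothetical use of @decls) -->
<constraintSpec xmlns="http://www.tei-c.org/ns/1.0"
                xmlns:sch="http://purl.oclc.org/dsdl/schematron"
                ident="div-needs-head" scheme="schematron" decls="#sch-xslt3">
  <constraint>
    <sch:rule context="tei:div">
      <sch:assert test="tei:head">A div is expected to have a head.</sch:assert>
    </sch:rule>
  </constraint>
</constraintSpec>
```

Under option 2, a second <constraintDecl> with a different xml:id and binding would presumably yield a second Schematron file containing only the constraints that point to it.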
I like the adventurous option in theory, but I’m struggling to imagine a practical use-case for generating multiple different Schematron files. I also wonder whether we run a risk of over-complicating schema validation: could multiple Schematron files potentially conflict with each other’s validation when Schematron rules from two different files are triggered in an overlapping context? I guess I wonder if the adventurous path leads to too much potential trouble.
If you think of the generated Schematron file as an output from the ODD rather than as input to the validator, surely it makes sense to maximize flexibility for its format?
The separation of concerns provided by the second option seems more in keeping with the TEI approach to such things.
Also note that (per #335) we should be documenting, somewhere, that one does not get a query binding if using Schematron embedded in RELAX NG.
Council thinks a simple expandable approach is the way to go: a <constraintDecl> in the TEI Header of the base ODD that is not repeatable (and not a member of att.declarable). If and when there is user demand to be able to express constraints in a variety of language bindings, we can make it repeatable, add it to att.declarable, and add <constraintSpec> to att.declaring.
A first crack at a specification of the new <constraintDecl>:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<!--
Copyright TEI Consortium.
Dual-licensed under CC-by and BSD2 licences
See the file COPYING.txt for details
$Date$
$Id$
-->
<?xml-model href="https://jenkins.tei-c.org/job/TEIP5-dev/lastSuccessfulBuild/artifact/P5/release/xml/tei/odd/p5.nvdl" type="application/xml" schematypens="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0"?>
<elementSpec xmlns="http://www.tei-c.org/ns/1.0" xmlns:sch="http://purl.oclc.org/dsdl/schematron" module="tagdocs" ident="constraintDecl">
  <gloss versionDate="2023-03-09" xml:lang="en">constraint declaration</gloss>
  <desc versionDate="2023-03-09" xml:lang="en">contains declarations pertaining to formal constraints expressed elsewhere in <gi>constraintSpec</gi> elements</desc>
  <classes>
    <memberOf key="att.global"/>
  </classes>
  <content>
    <sequence>
      <alternate minOccurs="0" maxOccurs="unbounded">
        <classRef key="model.identEquiv"/>
        <classRef key="model.descLike"/>
      </alternate>
      <anyElement/> <!-- typically <sch:ns> elements -->
    </sequence>
  </content>
  <attList>
    <attDef ident="scheme" usage="req">
      <desc versionDate="2023-03-09" xml:lang="en">supplies the name of the language to which the declarations herein apply</desc>
      <datatype><dataRef key="teidata.enumerated"/></datatype>
      <valList type="semi">
        <valItem ident="schematron">
          <gloss versionDate="2016-09-27" xml:lang="en">ISO Schematron</gloss>
        </valItem>
      </valList>
      <remarks versionDate="2023-03-09" xml:lang="en">
        <p>The declarations contained in a particular
        <gi>constraintDecl</gi> apply to the <gi>constraintSpec</gi>
        elements whose <att>scheme</att> matches the <att>scheme</att>
        of the <gi>constraintDecl</gi>.</p>
      </remarks>
    </attDef>
    <attDef ident="queryBinding" usage="rec">
      <gloss xml:lang="en" versionDate="2023-03-09">query language binding</gloss>
      <desc xml:lang="en" versionDate="2023-03-09">specifies the query
      language binding for rule-based schema expressions in
      <gi>constraintSpec</gi> elements that have a matching
      <att>scheme</att> attribute</desc>
      <datatype><dataRef key="teidata.enumerated"/></datatype>
      <valList type="semi">
        <valItem ident="exslt"/>
        <valItem ident="stx"/>
        <valItem ident="xslt"/>
        <valItem ident="xslt2"/>
        <valItem ident="xslt3"/>
        <valItem ident="xpath"/>
        <valItem ident="xpath2"/>
        <valItem ident="xpath3"/>
        <valItem ident="xpath31"/>
        <valItem ident="xquery"/>
        <valItem ident="xquery3"/>
        <valItem ident="xquery31"/>
      </valList>
      <remarks versionDate="2023-03-09" xml:lang="en">
        <p>The suggested values above are the values reserved by the
        Schematron specification. Only <val>exslt</val>,
        <val>stx</val>, <val>xslt</val>, <val>xslt2</val>,
        <val>xslt3</val>, <val>xpath2</val>, and <val>xpath3</val> are
        defined by the specification. Most processors only support a
        subset of <val>xslt</val>, <val>xslt2</val>, and
        <val>xslt3</val>.</p>
      </remarks>
    </attDef>
  </attList>
  <exemplum xml:lang="en">
    <egXML xmlns="http://www.tei-c.org/ns/Examples">
      <constraintDecl scheme="schematron" queryBinding="xslt3">
        <sch:ns prefix="wwp" uri="http://www.wwp.northeastern.edu/ns/textbase"/>
      </constraintDecl>
    </egXML>
  </exemplum>
  <listRef>
    <ptr target="#?????"/>
  </listRef>
</elementSpec>
```
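One way to read the remarks on @scheme: a processor gathers every <constraintSpec> whose scheme matches and emits a Schematron schema carrying the declared binding and namespaces. The following is only a guess at the shape of that mapping, reusing the wwp namespace from the example in the specification; the ident, rule context, and assertion are invented, and the actual stylesheets may organize their output differently.

```xml
<!-- Hypothetical input: a constraintSpec whose @scheme matches the constraintDecl -->
<constraintSpec xmlns="http://www.tei-c.org/ns/1.0"
                xmlns:sch="http://purl.oclc.org/dsdl/schematron"
                ident="wwp-lg-type" scheme="schematron">
  <constraint>
    <sch:rule context="wwp:lg">
      <sch:assert test="exists(@type)">An lg is expected to carry a type attribute.</sch:assert>
    </sch:rule>
  </constraint>
</constraintSpec>

<!-- Plausible generated Schematron (a sketch, not actual stylesheet output) -->
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt3">
  <!-- copied from the content of constraintDecl -->
  <sch:ns prefix="wwp" uri="http://www.wwp.northeastern.edu/ns/textbase"/>
  <sch:pattern id="wwp-lg-type">
    <sch:rule context="wwp:lg">
      <sch:assert test="exists(@type)">An lg is expected to carry a type attribute.</sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

The point of the sketch is that both @queryBinding and the <sch:ns> children would travel from the single declaration to the root of the generated schema, rather than being repeated in each constraint.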
Discussion after the Council meeting of 2023-03-10 among @sydb, @hcayless, @ebeshero, and @martinascholger:
Are we in a rush? Only insofar as the ATOP TF wants to know that there is going to be a <constraintDecl>; if the exact XPath to the query language binding changes later, no big deal.
Where do we put <constraintDecl>?
Two possibilities jump to mind:
1. In an <encodingDesc> in the <teiHeader>.
2. In a <schemaSpec>, in which case you would generally need a <schemaSpec> to appear in a base ODD (rather than just a customization ODD).

Note that if we choose option #2 (<constraintDecl> only goes in <schemaSpec>), then we would need to add a <schemaSpec> to the TEI Guidelines, because they do not currently have one. We think that new <schemaSpec> should show up in the driver files (i.e. P5/source/guidelines-en.xml and P5/source/guidelines-fr.xml), but no one has any idea what kind of havoc adding a <schemaSpec> might wreak on the build process.
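For comparison, the two placements might look roughly like this. Both fragments are sketches: the header is abbreviated (a real <fileDesc> needs its usual children), the ident and start values are invented, and the position of <constraintDecl> among the children of <schemaSpec> is an assumption, since no content model has been agreed.

```xml
<!-- Placement 1 (sketch): inside encodingDesc in the teiHeader -->
<teiHeader xmlns="http://www.tei-c.org/ns/1.0"
           xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <fileDesc>
    <!-- titleStmt, publicationStmt, and sourceDesc omitted for brevity -->
  </fileDesc>
  <encodingDesc>
    <constraintDecl scheme="schematron" queryBinding="xslt3">
      <sch:ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>
    </constraintDecl>
  </encodingDesc>
</teiHeader>

<!-- Placement 2 (sketch): inside the schemaSpec itself -->
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0"
            xmlns:sch="http://purl.oclc.org/dsdl/schematron"
            ident="tei_example" start="TEI">
  <constraintDecl scheme="schematron" queryBinding="xslt3">
    <sch:ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>
  </constraintDecl>
  <moduleRef key="tei"/>
  <!-- remaining module references and specifications omitted -->
</schemaSpec>
```

Placement 1 works for any document with a header, including the Guidelines source; placement 2 keeps the declaration next to the constraints it governs but presupposes a <schemaSpec>.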
I used to know why there is no <schemaSpec> in the TEI Guidelines and it was for a plausible reason. Something to do with the fact that P5 itself isn't an ODD (though tei_all is). So I'd definitely vote for putting this thing inside the encodingDesc.
Council 2023-04-14 agrees that there will be a mechanism for an ODD writer to specify the query language binding, and that it will be accessible by XPath, without actually committing to <constraintDecl> in any particular place.
(This means ATOP group can move forward, and update the XPath to access the desired binding later.)
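As an illustration of why the exact location matters little to a consumer, here is a minimal XSLT sketch of such a lookup. It is not ATOP's code: the XPath assumes the draft <constraintDecl> with @scheme and @queryBinding, searches the whole document so it does not depend on the final placement, and falls back to 'xslt' as an arbitrary default chosen for this sketch.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0">

  <!-- Take the first declared binding for Schematron found anywhere in the ODD;
       if none is declared, use the illustrative default 'xslt'. -->
  <xsl:variable name="queryBinding"
      select="string((//tei:constraintDecl[@scheme = 'schematron']/@queryBinding, 'xslt')[1])"/>

  <!-- Report the value; a real stylesheet would write it to sch:schema/@queryBinding. -->
  <xsl:template match="/">
    <xsl:message>Query language binding: <xsl:value-of select="$queryBinding"/></xsl:message>
  </xsl:template>

</xsl:stylesheet>
```

If Council later settles on a specific parent element, only the select expression would need to change.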
This has been dormant for over a year and probably needs discussion by Council at the next opportune moment in order to move forward with a query language binding mechanism.
- Make it an option to put it in either an <encodingDesc> in the <teiHeader> or in a <schemaSpec>.

FWIW this seems like a better option to me, so that the TEI source, customizations, and non-TEI grammars that use <schemaSpec> can all use the future <constraintDecl>.
Created a draft PR https://github.com/TEIC/TEI/pull/2596. Some details need to be discussed.
Well, I feel silly: there was already a PR about this (but in my defense, it wasn't linked to this ticket!): https://github.com/TEIC/TEI/pull/2594
Schematron schemas can specify a range of different values for the query language binding (typically xslt, xslt2, or xslt3, but many more are listed here: https://archive.xmlprague.cz/2022/files/presentations/schematron-qlb.pdf). The ATOP team believes that TEI ODD should have a mechanism for specifying this, perhaps through an attribute on the root schemaSpec element called schQueryLanguageBinding.
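For readers less familiar with Schematron, the binding is declared with the queryBinding attribute on the root <sch:schema> element, and it determines which expression language the rule contexts and tests may use. A small illustrative schema follows; the rule and its message are invented for this example.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <sch:ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>
  <sch:pattern>
    <sch:rule context="tei:date[@when]">
      <!-- matches() is an XPath 2.0 function, so this test would not be
           available under queryBinding="xslt", which implies XPath 1.0 -->
      <sch:assert test="matches(@when, '^\d{4}')">The when attribute should begin with a four-digit year.</sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

This is why an ODD needs some way to state the binding: the same constraint can be valid under one binding and a compile-time error under another.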