TEIC / TEI

The Text Encoding Initiative Guidelines
https://www.tei-c.org

schemaSpec should provide a mechanism for specifying Schematron query language binding #2330

Open martindholmes opened 2 years ago

martindholmes commented 2 years ago

Schematron schemas can specify a range of different values for the query language binding (typically xslt, xslt2, or xslt3, but many more are listed here: https://archive.xmlprague.cz/2022/files/presentations/schematron-qlb.pdf). The ATOP team believes that TEI ODD should have a mechanism for specifying this, perhaps through an attribute on the root schemaSpec element called schQueryLanguageBinding.
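
A minimal sketch of this simple proposal, for illustration only: @schQueryLanguageBinding is the name floated in this comment, not an existing TEI attribute, and the ident and module list are invented. An ODD processor honouring it would presumably copy the value into the @queryBinding of the generated <sch:schema>.

```xml
<!-- Hypothetical: @schQueryLanguageBinding is only a proposal, not a TEI attribute. -->
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0" ident="myTEI" start="TEI"
            schQueryLanguageBinding="xslt3">
  <moduleRef key="tei"/>
  <moduleRef key="header"/>
  <moduleRef key="core"/>
  <moduleRef key="textstructure"/>
</schemaSpec>
<!-- Assumed effect on the generated Schematron:
     <sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt3"> ... -->
```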

martindholmes commented 2 years ago

The ATOP team is thinking of two distinct approaches to this:

  1. (Simple): There is a single @schQueryLanguageBinding attribute available on <schemaSpec>, meaning that all Schematron rules must be compiled into a single Schematron output file and use the same query language binding.
  2. (Adventurous): We create a <constraintDecl> element in the header, where various things can be defined and described, including the query language binding value, as well as namespace declarations etc. <constraintSpec> would then be added to att.declaring, and each <constraintSpec> could point up to a <constraintDecl>. For each distinct <constraintDecl> element, a distinct Schematron file would be created using the specified content, including the query language binding, and including all the constraints which point to it.
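
To make the adventurous option concrete, here is a hypothetical sketch. <constraintDecl> does not exist yet, and the xml:id values and the constraint are invented; the only existing mechanism borrowed is @decls, which att.declaring already provides. A processor following this design would write one Schematron file per <constraintDecl>, each carrying that declaration's @queryBinding.

```xml
<!-- Hypothetical: two declarations in the header, one per query language binding. -->
<encodingDesc xmlns="http://www.tei-c.org/ns/1.0"
              xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <constraintDecl xml:id="sch-xslt3" scheme="schematron" queryBinding="xslt3">
    <sch:ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>
  </constraintDecl>
  <constraintDecl xml:id="sch-xslt1" scheme="schematron" queryBinding="xslt">
    <sch:ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>
  </constraintDecl>
</encodingDesc>

<!-- In the schemaSpec: @decls (from att.declaring) selects which declaration,
     and therefore which output Schematron file, this constraint belongs to. -->
<constraintSpec xmlns="http://www.tei-c.org/ns/1.0"
                xmlns:sch="http://purl.oclc.org/dsdl/schematron"
                ident="p-not-empty" scheme="schematron" decls="#sch-xslt1">
  <constraint>
    <sch:rule context="tei:p">
      <sch:assert test="normalize-space(.) != ''">A p should not be empty.</sch:assert>
    </sch:rule>
  </constraint>
</constraintSpec>
```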

The second option would obviously provide much more flexibility, but it would require non-trivial changes to the existing stylesheets, and there may not be many real use cases for it. Generally, we would expect people to want to use the most advanced query language binding that their processor supports (why not?); there's no particular reason to choose an earlier one. There's also the question of how you might override the binding for existing constraints in a downstream customization.

ebeshero commented 2 years ago

I like the adventurous option in theory, but I’m struggling to imagine a practical use-case for generating multiple different Schematron files. I also wonder whether we run a risk of over-complicating schema validation: could multiple Schematron files potentially conflict with each other’s validation when Schematron rules from two different files are triggered in an overlapping context? I guess I wonder if the adventurous path leads to too much potential trouble.

lb42 commented 2 years ago

If you think of the generated Schematron file as an output from the ODD rather than as input to the validator, surely it makes sense to maximize flexibility for its format?

jamescummings commented 2 years ago

The separation of concerns provided by the second option seems more in keeping with the TEI approach to such things.

sydb commented 1 year ago

Also note that (per #335) we should be documenting, somewhere, that one does not get a query binding if using Schematron embedded in RELAX NG.
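
For illustration, this is roughly what Schematron embedded in RELAX NG looks like (the rule itself is invented). The <sch:pattern> sits inside the RELAX NG <element> with no enclosing <sch:schema>, so there is nowhere to put @queryBinding; whatever binding the extracting tool assumes is the one that applies.

```xml
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <start>
    <element name="p">
      <!-- Embedded rule: no sch:schema wrapper, hence no @queryBinding. -->
      <sch:pattern>
        <sch:rule context="p">
          <sch:assert test="normalize-space(.) != ''">A p should not be empty.</sch:assert>
        </sch:rule>
      </sch:pattern>
      <text/>
    </element>
  </start>
</grammar>
```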

sydb commented 1 year ago

Council thinks a simple expandable approach is the way to go: a <constraintDecl> in the TEI Header of the base ODD that is not repeatable (and not a member of att.declarable). If and when there is user demand to be able to express constraints in a variety of language bindings, we can make it repeatable, add it to att.declarable, and add <constraintSpec> to att.declaring.
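
Under this simple approach a base ODD would carry exactly one declaration. A hypothetical sketch, assuming it lives in <encodingDesc> (placement is still open at this point in the thread) and that <constraintSpec> elements pick it up by matching @scheme rather than by pointing at it, so no @xml:id or @decls wiring is needed yet:

```xml
<teiHeader xmlns="http://www.tei-c.org/ns/1.0"
           xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <!-- fileDesc omitted from this sketch -->
  <encodingDesc>
    <!-- Exactly one, not repeatable: applies to every constraintSpec with scheme="schematron". -->
    <constraintDecl scheme="schematron" queryBinding="xslt3">
      <sch:ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>
    </constraintDecl>
  </encodingDesc>
</teiHeader>
```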

sydb commented 1 year ago

A first crack at a specification of the new <constraintDecl>:

<?xml version="1.0" encoding="UTF-8"?>
<!--
Copyright TEI Consortium. 
Dual-licensed under CC-by and BSD2 licences 
See the file COPYING.txt for details
$Date$
$Id$
-->
<?xml-model href="https://jenkins.tei-c.org/job/TEIP5-dev/lastSuccessfulBuild/artifact/P5/release/xml/tei/odd/p5.nvdl" type="application/xml" schematypens="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0"?>
<elementSpec xmlns="http://www.tei-c.org/ns/1.0" xmlns:sch="http://purl.oclc.org/dsdl/schematron" module="tagdocs" ident="constraintDecl">
  <gloss versionDate="2023-03-09" xml:lang="en">constraint declaration</gloss>
  <desc versionDate="2023-03-09" xml:lang="en">contains declarations pertaining to formal constraints expressed elsewhere in <gi>constraintSpec</gi> elements</desc>
  <classes>
    <memberOf key="att.global"/>
  </classes>
  <content>
    <sequence>
      <alternate minOccurs="0" maxOccurs="unbounded">
        <classRef key="model.identEquiv"/>
        <classRef key="model.descLike"/>
      </alternate>
      <anyElement minOccurs="0" maxOccurs="unbounded"/>             <!-- typically <sch:ns> elements -->
    </sequence>
  </content>
  <attList>
    <attDef ident="scheme" usage="req">
      <desc versionDate="2023-03-09" xml:lang="en">supplies the name of the language to which the declarations herein apply</desc>
      <datatype><dataRef key="teidata.enumerated"/></datatype>
      <valList type="semi">
        <valItem ident="schematron">
          <gloss versionDate="2016-09-27" xml:lang="en">ISO Schematron</gloss>
        </valItem>
      </valList>
      <remarks versionDate="2023-03-09" xml:lang="en">
        <p>The declarations contained in a particular
        <gi>constraintDecl</gi> apply to the <gi>constraintSpec</gi>
        elements whose <att>scheme</att> matches the <att>scheme</att>
        of the <gi>constraintDecl</gi>.</p>
      </remarks>
    </attDef>
    <attDef ident="queryBinding" usage="rec">
      <gloss xml:lang="en" versionDate="2023-03-09">query language binding</gloss>
      <desc xml:lang="en" versionDate="2023-03-09">specifies the query
      language binding for rule-based schema expressions in
      <gi>constraintSpec</gi> elements that have a matching
      <att>scheme</att> attribute</desc>
      <datatype><dataRef key="teidata.enumerated"/></datatype>
      <valList type="semi">
        <valItem ident="exslt"/>
        <valItem ident="stx"/>
        <valItem ident="xslt"/>
        <valItem ident="xslt2"/>
        <valItem ident="xslt3"/>
        <valItem ident="xpath"/>
        <valItem ident="xpath2"/>
        <valItem ident="xpath3"/>
        <valItem ident="xpath31"/>
        <valItem ident="xquery"/>
        <valItem ident="xquery3"/>
        <valItem ident="xquery31"/>
      </valList>
      <remarks versionDate="2023-03-09" xml:lang="en">
        <p>The suggested values above are the values reserved by the
        Schematron specification. Only <val>exslt</val>,
        <val>stx</val>, <val>xslt</val>, <val>xslt2</val>,
        <val>xslt3</val>, <val>xpath2</val>, and <val>xpath3</val> are
        defined by the specification. Most processors only support a
        subset of <val>xslt</val>, <val>xslt2</val>, and
        <val>xslt3</val>.</p>
      </remarks>
    </attDef>
  </attList>
  <exemplum xml:lang="en">
    <egXML xmlns="http://www.tei-c.org/ns/Examples">
      <constraintDecl scheme="schematron" queryBinding="xslt3">
        <sch:ns prefix="wwp" uri="http://www.wwp.northeastern.edu/ns/textbase"/>
      </constraintDecl>
    </egXML>
  </exemplum>
  <listRef>
    <ptr target="#?????"/>
  </listRef>
</elementSpec>
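
As a sketch of the intended effect, assuming an ODD processor simply copies the values across (no such processor behaviour is specified yet), the exemplum above would yield a Schematron file beginning roughly like this:

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt3">
  <!-- Copied from the sch:ns child of the constraintDecl. -->
  <sch:ns prefix="wwp" uri="http://www.wwp.northeastern.edu/ns/textbase"/>
  <!-- sch:pattern elements generated from the ODD's Schematron constraintSpecs follow. -->
</sch:schema>
```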

ebeshero commented 1 year ago

Discussion after the Council meeting of 2023-03-10 among @sydb, @hcayless, @ebeshero, and @martinascholger

Are we in a rush? Only insofar as the ATOP TF wants to know that there is going to be a <constraintDecl>; if the exact XPath to the query language binding changes later, no big deal.

Where do we put <constraintDecl>?

Two possibilities jump to mind:

  1. Make it an option to put it in either an <encodingDesc> in the <teiHeader> or in a <schemaSpec>.
  2. It goes only in <schemaSpec>, in which case you would generally need a <schemaSpec> to appear in a base ODD (rather than just in a customization ODD).
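
For comparison with the header placement, a hypothetical sketch of the <schemaSpec> placement, in which the declaration travels with the schema specification itself. The ident and module list are invented, and <schemaSpec> does not currently allow <constraintDecl> as a child.

```xml
<!-- Hypothetical: constraintDecl as a child of schemaSpec. -->
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0"
            xmlns:sch="http://purl.oclc.org/dsdl/schematron"
            ident="myTEI" start="TEI">
  <constraintDecl scheme="schematron" queryBinding="xslt3">
    <sch:ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>
  </constraintDecl>
  <moduleRef key="tei"/>
  <moduleRef key="header"/>
  <moduleRef key="core"/>
  <moduleRef key="textstructure"/>
</schemaSpec>
```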

Note that if we choose option 2 (<constraintDecl> only goes in <schemaSpec>), then we would need to add a <schemaSpec> to the TEI Guidelines, because they do not currently have one. We think that this new <schemaSpec> should show up in the driver files (i.e. P5/source/guidelines-en.xml and P5/source/guidelines-fr.xml), but no one has any idea what kind of havoc adding a <schemaSpec> might wreak on the build process.

lb42 commented 1 year ago

I used to know why there is no <schemaSpec> in the TEI Guidelines, and it was a plausible reason: something to do with the fact that P5 itself isn't an ODD (though tei_all is). So I'd definitely vote for putting this thing inside the <encodingDesc>.

sydb commented 1 year ago

Council 2023-04-14 agrees that there will be a mechanism for an ODD writer to specify the query language binding, and that it will be accessible by XPath, without actually committing to <constraintDecl> in any particular place. (This means ATOP group can move forward, and update the XPath to access the desired binding later.)
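
Since the commitment is only that the binding be reachable by XPath, a processor could look it up without caring where <constraintDecl> finally lands. A minimal XSLT 3.0 sketch under those assumptions: it takes the first Schematron <constraintDecl> anywhere in the ODD, falls back to a default supplied as a parameter (the parameter name and the 'xslt2' default are assumptions, not current TEI stylesheet behaviour), and copies any <sch:ns> children into the output.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    xmlns:sch="http://purl.oclc.org/dsdl/schematron"
    exclude-result-prefixes="xs tei">

  <!-- Hypothetical parameter: binding used when the ODD declares none. -->
  <xsl:param name="defaultQueryBinding" as="xs:string" select="'xslt2'"/>

  <!-- First Schematron declaration, wherever it sits (teiHeader or schemaSpec). -->
  <xsl:variable name="decl" as="element(tei:constraintDecl)?"
      select="(//tei:constraintDecl[@scheme = 'schematron'])[1]"/>

  <xsl:template match="/">
    <sch:schema queryBinding="{($decl/@queryBinding/string(), $defaultQueryBinding)[1]}">
      <!-- Copy the namespace declarations without dragging the TEI namespace along. -->
      <xsl:copy-of select="$decl/sch:ns" copy-namespaces="no"/>
      <!-- One sch:pattern per Schematron constraintSpec would be generated here. -->
    </sch:schema>
  </xsl:template>
</xsl:stylesheet>
```

Because the lookup uses //, moving <constraintDecl> between <encodingDesc> and <schemaSpec> later would not require changing this expression.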

raffazizzi commented 5 months ago

This has been dormant for over a year and probably needs discussion by Council at the next opportunity in order to move forward with a query language binding mechanism.

raffazizzi commented 5 months ago
  1. Make it an option to put it in either an <encodingDesc> in the <teiHeader> or in a <schemaSpec>.

FWIW this seems like the better option to me, so that the TEI source, customizations, and non-TEI grammars that use <schemaSpec> can all use the future <constraintDecl>.

raffazizzi commented 1 month ago

Created a draft PR https://github.com/TEIC/TEI/pull/2596. Some details need to be discussed.

raffazizzi commented 1 month ago

Well, I feel silly: there was already a PR about this (but in my defense, it wasn't linked to this ticket!): https://github.com/TEIC/TEI/pull/2594