TEIC / TEI

The Text Encoding Initiative Guidelines
https://www.tei-c.org

att.declaring and att.declarable need constraints and better explanation #1981

Open sydb opened 4 years ago

sydb commented 4 years ago

Some of these issues are trivial; some may require tickets of their own.

  1. Typo in 15.3, “Associating Contextual Information with a Text” (#CCAH) — disagreement in number: “The TEI scheme allow for the following …”.
  2. I think the following bullet point should have articles, i.e. “There may be multiple occurrences of certain elements in either the corpus or a text header”.
  3. Similarly for the 2nd sentence of 15.3.2: “… a particular part of a text header or the corpus header by means of …”.
  4. The 3rd paragraph in #CCAS2 is problematic.
    1. The list of declarable elements should be generated.
    2. “All of the above elements may be multiply defined within a single header” should probably be “Each of the above elements is repeatable” or similar.
    3. “every declarable element must bear a unique identifier”
      • Does this really mean every declarable element must have an @xml:id (which would be nuts, but is what it says), or does it mean every element of the same type (which would make a lot of sense and is what 15.3.3 number 3 sort of implies), or does it mean every element of the same type that has a sibling of that type?
        • No matter which it means, it would be good to have a constraint enforcing it.
    4. “… must be specified as the default, by means of the default attribute”: The only 2 ways to indicate something is the default are for it to be its parent’s only child of the given element type or to have a @default specified as "true". Given that the former is precluded, I think the prose here should just be specific and say the latter: “… must be specified as the default by having a @default attribute with the value "true"”.
  5. “Here is the structure for a text which does state otherwise:”: Should be something like “Here is the structure of a text in which a division does state otherwise:”, no?
  6. “the contents of the divisions D1 and D3 … and … division D2”: The values of the @xml:ids of the divisions in question are "d1", "d3", and "d2".
  7. “The identifier or identifiers specified by the @decls attribute …”: They are not identifiers; what is specified on @decls are pointers (URIs, to be precise). Besides, it is not the values of @decls that are restricted, but rather the things being pointed at by an @decls.
  8. First bullet point for this, “An identifier specifying an element which contains multiple instances of one or more other elements should be interpreted as if it explicitly identified the elements identified as the default in each such set of repeated elements”: I am not sure what this means. I think it means that if a declaring element points to declarable element A, then for any set of children of A that are of the same declarable element type, the child that is the default applies. Whether I’m right or wrong, this should be rewritten. (And, I presume that if the @decls that points to A also points to a non-default child of A, that one applies instead.)
  9. (Switching to the XML) “for a text specifying <att>decls</att> as <q><val>ED2</val></q>, correction C2A, and normalization N2B will apply.” should be more like “for a text specifying <att>decls</att> as <val>#ED2</val>, correction C2A and normalization N2B will apply.”
  10. Next paragraph: all of the pointers are missing their # characters.
  11. CCAS3 Summary, item 3: Same issue about articles as above; I think it should read “Where there are multiple occurrences of declarable elements within a text’s header or its corpus header” or some such.
  12. Second sub-bullet of above (currently reads “one only must bear a <att>default</att> attribute with the value <!-- JC: 2018-07-20: This should be changed to 'true' --><val>YES</val>.”): Obviously, @jamescummings is correct, as "YES" is not one of the possible values of @default. (BTW, this is the only occurrence of either “YES” or “NO” as a word in the Guidelines.) But furthermore, I think the more standard wording (at least in American English) would be “one and only one must bear …”. Thus my suggestion is “one and only one must bear a <att>default</att> attribute with the value <val>true</val> (or <val>1</val>).”
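
As an illustration of points 4 and 8 to 10, here is a sketch of a header in which every repeated declarable element carries an @xml:id and exactly one of each repeated type has @default="true". This is a made-up fragment, not the example from the Guidelines; the identifiers merely echo the ones quoted above (ED2, C2A, N2B), and the prose inside each <p> is placeholder text.

  <!-- fragment of a teiHeader (illustrative only) -->
  <encodingDesc>
    <editorialDecl xml:id="ED1" default="true">
      <correction xml:id="C1A" default="true"><p>Placeholder: errors silently corrected.</p></correction>
      <correction xml:id="C1B"><p>Placeholder: errors left as found.</p></correction>
    </editorialDecl>
    <editorialDecl xml:id="ED2">
      <correction xml:id="C2A" default="true"><p>Placeholder.</p></correction>
      <correction xml:id="C2B"><p>Placeholder.</p></correction>
      <normalization xml:id="N2A"><p>Placeholder.</p></normalization>
      <normalization xml:id="N2B" default="true"><p>Placeholder.</p></normalization>
    </editorialDecl>
  </encodingDesc>

  <!-- elsewhere in the same document: @decls holds a pointer, hence the # -->
  <text decls="#ED2"><body><p>Placeholder.</p></body></text>

Under the reading proposed in point 8, the pointer #ED2 selects the defaults among the repeated children of ED2, so C2A and N2B would apply; a text with no @decls would fall back to ED1 and, within it, C1A.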

constraints for att.declarable

For validation, I think someday we would like a mechanism for building a list of elements from class membership dynamically at build time. Until then, we could make do with Schematron abstract rules.

In att.declarable.xml:

  <sch:pattern id="declarable" abstract="true">
    <!-- parameter 'tde' is for "this declarable element (type)" -->
    <sch:rule context="tei:*[child::$tde[2]]">
      <sch:let name="declarableGI" value="name( $tde[1] )"/>
      <sch:report test="child::$tde[ not( @xml:id ) ]">
        When there is more than one <sch:value-of select="$declarableGI"/>, each must have an @xml:id
      </sch:report>
      <sch:assert test="count( child::$tde[ normalize-space( @default ) = ('1','true') ] ) eq 1">
        When there is more than one <sch:value-of select="$declarableGI"/>, one and only one must have a @default of 'true'.
      </sch:assert>
    </sch:rule>
  </sch:pattern>

(Note that the declaration of the variable $declarableGI fails when I try this using probatron, but it works in oXygen. If you replace the variable reference with the value everywhere it works in both.)

And then in each element that has <memberOf key="att.declarable"/> (see below for the current list), something like the following:

  <sch:pattern id="declarable_xenoData" is-a="declarable">
    <sch:param name="tde" value="tei:xenoData"/>
  </sch:pattern>
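
For reference, abstract-pattern parameters are substituted textually, so after expansion the instantiation above should be roughly equivalent to the following concrete pattern. This is a hand-written sketch of the expected expansion, not the output of any particular processor.

  <sch:pattern id="declarable_xenoData">
    <!-- every occurrence of $tde has been replaced by tei:xenoData -->
    <sch:rule context="tei:*[child::tei:xenoData[2]]">
      <sch:let name="declarableGI" value="name( tei:xenoData[1] )"/>
      <sch:report test="child::tei:xenoData[ not( @xml:id ) ]">
        When there is more than one <sch:value-of select="$declarableGI"/>, each must have an @xml:id
      </sch:report>
      <sch:assert test="count( child::tei:xenoData[ normalize-space( @default ) = ('1','true') ] ) eq 1">
        When there is more than one <sch:value-of select="$declarableGI"/>, one and only one must have a @default of 'true'.
      </sch:assert>
    </sch:rule>
  </sch:pattern>

Note that $declarableGI is left alone by the expansion because it is an ordinary variable rather than the pattern parameter.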

constraints for att.declaring

Probably in att.declaring.xml:

  <sch:pattern id="decls" abstract="false">
    <!--
      Use element as context, as some processors inappropriately barf when a context
      is an attribute node:
    -->
    <sch:rule context="tei:*[@decls]">
      <!-- sequence of decls pointers: -->
      <sch:let name="dptrs" value="tokenize( normalize-space( @decls ), '&#x20;')"/>
      <!-- sequence of the @xml:ids pointed at by each decls pointer: -->
      <sch:let name="dxids" value="for $dptr in $dptrs return substring-after( $dptr, '#')"/>
      <!-- sequence of the target elements of the decls pointers: -->
      <sch:let name="dtars" value="for $dptr in $dptrs return
        if ( starts-with( $dptr, '#') )
        then id( substring-after( $dptr, '#') )
        else if ( contains( $dptr, '#') )
        then doc( $dptr )/id( substring-after( $dptr, '#') )
        else current()"/>
      <!-- sequence of the element types of the target elements of the decls ptrs: -->
      <sch:let name="dtGIs" value="for $dtar in $dtars return name( $dtar )"/>
      <!-- sequence of the children of the targets of the decls pointers: -->
      <sch:let name="dtars_kids" value="$dtars/tei:*"/>
      <!-- sequence of the names of the children: -->
      <sch:let name="dtars_kids_GIs" value="for $kid in $dtars_kids return name( $kid )"/>
      <!-- sequence of all the GIs (target and target children): -->
      <sch:let name="decls_GIs" value="( $dtGIs, $dtars_kids_GIs )"/>
      <sch:assert test="count( $decls_GIs ) eq count( distinct-values( $decls_GIs ) )">
        Two or more of the elements referred to either explicitly or implicitly by the @decls of this <sch:name/> (<sch:value-of
          select="$dptrs"/>) are the same kind of metadata element.
      </sch:assert>
    </sch:rule>
  </sch:pattern>

Note that if a local element (i.e., same file) pointed to by @decls is not found, it is simply ignored; if a remote element (i.e., different file) pointed to by @decls is not found, this does not fail gracefully; rather, a 404 is raised.
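
To make the intent of the assertion concrete, here is a made-up fragment (the identifiers are illustrative, not from the Guidelines) in which @decls points at two elements of the same type. With the rule above, $dtGIs would contain 'correction' twice, so the count of names exceeds the count of distinct names and the assertion should fail.

  <!-- fragment of a teiHeader (illustrative only) -->
  <editorialDecl xml:id="ED1">
    <correction xml:id="C1A" default="true"><p>Placeholder.</p></correction>
    <correction xml:id="C1B"><p>Placeholder.</p></correction>
  </editorialDecl>

  <!-- elsewhere in the same document: two pointers, both to a correction -->
  <text decls="#C1A #C1B"><body><p>Placeholder.</p></body></text>

Pointing at only one of the two, e.g. decls="#C1B", would not produce a duplicate name here and so should pass.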


list of declarable elements

tei:availability | tei:bibl | tei:biblFull | tei:biblStruct | tei:broadcast | tei:correction | tei:correspDesc | tei:editorialDecl | tei:equipment | tei:geoDecl | tei:hyphenation | tei:interpretation | tei:langUsage | tei:listApp | tei:listBibl | tei:listEvent | tei:listNym | tei:listObject | tei:listOrg | tei:listPerson | tei:listPlace | tei:metDecl | tei:normalization | tei:particDesc | tei:projectDesc | tei:punctuation | tei:quotation | tei:recording | tei:refsDecl | tei:samplingDecl | tei:scriptStmt | tei:segmentation | tei:settingDesc | tei:sourceDesc | tei:stdVals | tei:styleDefDecl | tei:textClass | tei:textDesc | tei:xenoData

ebeshero commented 4 years ago

VF2F: Council greenlights a first stage of edits to clean up the prose, and then revisit to discuss what more may need to be done.

martinascholger commented 4 years ago

Council at VF2F suggests cleaning up the typos and unclear explanations first. @raffazizzi, @ju -- please open separate issues if necessary.

raffazizzi commented 3 years ago

@sydb's point 4.iii

“every declarable element must bear a unique identifier”

  • Does this really mean every declarable element must have an @xml:id (which would be nuts, but is what it says), or does it mean every element of the same type (which would make a lot of sense and is what 15.3.3 number 3 sort of implies), or does it mean every element of the same type that has a sibling of that type?

I think it means what it says: every element must have an identifier. The bullet point that follows the one @sydb refers to indicates very specifically that a default should be indicated "for each different type of declarable element which occurs more than once within the same parent element". So, if the first bullet point meant elements of the same type it would have been as explicit as the second.

Having said that, I agree with @sydb that it's overkill to require ids everywhere and that it makes more sense to only enforce them for elements of the same type. But I think we need to discuss this as a group.

Otherwise, I'm working on the rest on a branch.

raffazizzi commented 3 years ago

VF2F agrees with adjusting the language so that not every declarable element must have @xml:id. See new wording in branch.

raffazizzi commented 2 years ago

Updated and merged branch. https://github.com/TEIC/TEI/commit/124531b7c099d3d99a708a6217d2fe545e4de3f6

raffazizzi commented 2 years ago

Branch was very behind. Reverted merge and will try to fix the branch before attempting merge again.

raffazizzi commented 2 years ago

Merged in only prose changes (f4c625d783c10e710bbc36664ff640c1a7446736). @sydb to revise Schematron constraints soon.

sydb commented 1 year ago

@raffazizzi and I just had a long chat about this ticket. It is almost ready to be closed, but we cannot implement it because Schematron abstract patterns are not processed correctly in P5/antbuilder.xml (steps 8 & 9a only call iso_svrl_for_xslt2.xsl, but should call iso_dsdl_include.xsl and iso_abstract_expand.xsl, too).

So we should either

  1. fix antbuilder to use the skeleton implementation properly; OR
  2. replace the skeleton implementation with mausatron; OR
  3. give up on using abstract patterns, and instead invent a mechanism for having the @context of a rule be all the members of a class; OR
  4. give up on having any mechanism for applying a constraint specification to an entire class, and just enumerate the members of the class in the @context.
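
For option (1), the Skeleton implementation expects three transformations to be chained before a document is validated. A minimal Ant sketch of that chain follows; the property names and intermediate file names are hypothetical and are not taken from P5/antbuilder.xml, and an XSLT 2.0 processor is assumed to be configured for the <xslt> task.

  <!-- 1. resolve inclusions -->
  <xslt in="${schematron.src}" out="${tmp.dir}/step1.sch"
        style="${skeleton.dir}/iso_dsdl_include.xsl"/>
  <!-- 2. expand abstract patterns into concrete ones -->
  <xslt in="${tmp.dir}/step1.sch" out="${tmp.dir}/step2.sch"
        style="${skeleton.dir}/iso_abstract_expand.xsl"/>
  <!-- 3. compile the expanded schema into an XSLT validator that emits SVRL -->
  <xslt in="${tmp.dir}/step2.sch" out="${tmp.dir}/validator.xsl"
        style="${skeleton.dir}/iso_svrl_for_xslt2.xsl"/>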

sydb commented 1 year ago

Having thought about this a bit (not a lot) I have decided that I like (2) the best and (1) next; (4) is not really that much worse than (2) or (1), but feels a lot worse, so I don’t like it; and (3) is at least very very hard if not outright impossible (it doesn’t seem that hard when you are dealing with a single customization of a given language, but if you have a customization chain it would get out of hand, I think). So I am planning to start poking at implementing (2) sometime.

sydb commented 1 year ago

I went to poke at antbuilder.xml a bit, and discovered (somewhat to my horror) that I have already implemented (1). However, it does not work, in that abstract patterns still cause problems. (The rest of the Schematron probably works fine, I did not test much. But the abstract patterns work so badly that the Guidelines won’t build.) So unless I did something wrong, (1) is not going to work, anyway.

sydb commented 1 year ago

I have now implemented (2), above, in branch issue1981bis. It passes all the current tests in a Docker environment. I have not actually checked in the abstract patterns for testing att.declarable, yet. (Remember, that was the reason we wanted to update the build process to use a more modern Schematron processor.) I encourage anyone and everyone to check out this branch and see if it builds on your system. The work so far is reflected in 31320225f.

sydb commented 1 year ago

I have implemented (2) — use mausatron (aka schxslt) as our Schematron processor so we can use abstract patterns; and also checked the constraints on att.declarable elements expressed using an abstract pattern. (See commit 6044db652.) However, this ticket is blocked by #2455, because the output of the test for @default is different depending on which version of rnv you use.

sydb commented 11 months ago

As #2455 has now been dealt with, I have merged dev into branch issue1981bis (which was a lot of work). So I think this may be ready to merge. To be honest, I do not even remember where we are in actually getting better constraints for att.declaring and att.declarable. At this point, the main item of interest is that the branch for this ticket includes updating our build process from the Schematron Skeleton implementation to @dmj’s SchXslt (which I call “mausatron”). That change is required for CMC to move forward.

raffazizzi commented 4 months ago

@sydb I removed Status: Blocked since #2455 was resolved. I merged dev into issue1981bis without issue and pushed. Would you consider opening a PR?

ebeshero commented 4 months ago

@raffazizzi @sydb See this comment on the existing PR (#2509), though: https://github.com/TEIC/TEI/pull/2509#issuecomment-2127923822

In May Council decided we needed to consult someone about fixing NVDL...I don't think that has happened yet?

raffazizzi commented 1 month ago

This issue got orphaned a little, but it's not too late to update the branch. @ebeshero do you think we need more discussion or can I (or @sydb) attempt a PR? Marking as Pending for now.