TEIC / Stylesheets

TEI XSL Stylesheets
228 stars 124 forks source link

make att.repeatable work for `<sequence>`, `<alternate>`, and `<anyElement>` #633

Closed sydb closed 4 months ago

sydb commented 9 months ago

Re-worked how att.repeatable attributes (@minOccurs and @maxOccurs) work for <sequence>, <alternate>, and <anyElement> in order to fix #627. I have not attacked <datatype> yet, but might do so this week.

sydb commented 9 months ago

Read through the code and it makes sense. . . . Or am I misunderstanding how this should work?

Oh dear. Looks like the <sequence> is generating that which <alternate> should. (Is it possible I have them reversed? That would be embarrassing.) Looking now …

I have confirmed you are correct, @raffazizzi, I get the same (wrong) results. May take quite a bit longer to figure out why.

sydb commented 9 months ago

Both to keep @raffazizzi and any one else paying attention abreast of what is going on, and to leave some breadcrumbs for myself …

I have tweaked the algorithm for generating RELAX NG output in my local copy of odd2relax.xsl, and the result works on @raffazizzi’s test case, and passes Test2/ (finally — took a bit of doing) but fails on the oddbyexample and test21 cases in Test/. The reason it fails is that for the input

  <sequence minOccurs="0" maxOccurs="1">
    <elementRef key="front"/>
    <classRef key="model.global" minOccurs="0" maxOccurs="unbounded"/>
  </sequence>

my code produces

      <zeroOrMore>
        <ref name="model.global"/>
      </zeroOrMore>
      <group>
        <optional>
          <ref name="front"/>
          <zeroOrMore>
            <ref name="model.global"/>
          </zeroOrMore>
        </optional>
      </group>

whereas the current (released branch) code produces

      <zeroOrMore>
        <ref name="model.global"/>
      </zeroOrMore>
      <optional>
        <group>
          <ref name="front"/>
          <zeroOrMore>
            <ref name="model.global"/>
          </zeroOrMore>
        </group>
      </optional>

These two are exactly the same except that the <optional> and <group> are reversed. But they validate the same set of documents. Heck, trang converts them both to

model.global*, (front, model.global*)?

in the compact syntax. But when the thing that <ref name="front"> is referring to does not exist, the code in odd2relax.xsl that removes unneeded and problematic clauses (I think "pass3") fails to remove one of the model.global* when it is coded with the <group> on the outside. Sigh.

I am currently trying decide whether I should try to

  1. get my code to produce the <group> inside the <optional>, or
  2. get my code to drop the <group> entirely (it serves no purpose), or
  3. to get pass3 to be smarter; or (this last idea just struck me as I wrote this list)
  4. add another pass that converts the code I produce to that which pass3 expects.
ebeshero commented 9 months ago

@sydb aren't the <group> elements constraining your options? (Are they supposed to, but that isn't working in the RNG conversion?)

sydb commented 8 months ago

I implemented (4), above. It seems to work:

However:

  1. There is some ickyness in the RELAX NG output. There are differences in where the RELAX NG namespace is defaulted and when the prefix is specified, and there are now extraneous <rng:group> elements in the output. They make no change to what files are valid vs invalid, but just add pointless verbosity. I am not sure why they are there at the moment.
  2. It probably deserves more testing.
sydb commented 4 months ago

So after 4 months there had been no progress whatsoever. Figuring that lots has happened in dev branch during that time, I merged dev into this branch. That took quite a bit of doing — several hand merges, and then quite a bit of fixing stuff up. The result is that I am pretty sure this branch is back to doing what it is supposed to. E.g., this branch is back to passing all 4 tests listed in my previous comment. Moreover, if done soon, it should be easy to merge into dev.

sydb commented 4 months ago

I think we should test a few more ODDs that actually exercise the bug the fix addresses. So here is a quick set of instructions on how to do that.

But of course feel free to use RNG instead. Presuming nothing bombs, we expect the 627 output to be at least as correct, if not better than, the dev output. We also expect a few unimportant differences like

(((tst_model.common), tst_model.global*)+,

instead of

((tst_model.common, tst_model.global*)+,

for reasons I have not figured out. But that extra set of parens has no effect on validation, so don’t fret it. (The important bit is that in both cases model.common is plain, model.global has a ‘*’, and the whole thing has a ‘+’.)

Note [1] I will post one of my little test files and its results in a subsequent post here on the PR soon.

sydb commented 4 months ago

The following is a slightly modified copy of a shell transcript testing this on my machine. I have inserted a line of 34 hash marks between commands to make them easier to differentiate. (Hash mark is a comment in bash, so those lines have no effect.)

$ cd /path/to/TEIC_Stylesheets_repo/
$ ##################################
$ git pull
remote: Enumerating objects: 180, done.        
remote: Counting objects: 100% (180/180), done.        
remote: Compressing objects: 100% (37/37), done.        
remote: Total 180 (delta 145), reused 169 (delta 143), pack-reused 0        
Receiving objects: 100% (180/180), 440.00 KiB | 4.31 MiB/s, done.
Resolving deltas: 100% (145/145), completed with 60 local objects.
From github.com:TEIC/Stylesheets
   cfa97aa7..cc55edf4  dev               -> origin/dev
   91a8ff4d..885f7b54  sydb_627          -> origin/sydb_627
 * [new branch]        tri               -> origin/tri
 * [new branch]        trishaoconnor_548 -> origin/trishaoconnor_548
 * [new branch]        trishaoconnor_584 -> origin/trishaoconnor_584
Updating ef148441..cc55edf4
Fast-forward
 Documentation/teixsl.xml    | 4 ++--
 README.md                   | 4 ++--
 bin/teitorelaxng            | 2 +-
 common/common_core.xsl      | 2 +-
 common/functions.xsl        | 2 +-
 docx/from/paragraphs.xsl    | 2 +-
 docx/from/pass2.xsl         | 2 +-
 docx/to/mml2omml.xsl        | 8 ++++----
 docx/tools/build.xml        | 4 ++--
 fo/fo_textstructure.xsl     | 6 +++---
 html/html_textstructure.xsl | 6 +++---
 11 files changed, 21 insertions(+), 21 deletions(-)
$ ##################################
$ cat /tmp/minmaxtest.odd 
<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>minOccurs maxOccurs test</title>
      </titleStmt>
      <publicationStmt>
        <ab>not really published, just a file for debugging stylesheets</ab>
      </publicationStmt>
      <sourceDesc>
        <ab>born digital</ab>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>Just a little test schema to exercise <att>minOccurs</att>
      and <att>maxOccurs</att> a wee bit for issue 627, PR 633.</p>
    </body>
    <back>
      <schemaSpec xmlns="http://www.tei-c.org/ns/1.0" ident="tst" prefix="tst_">
        <gloss>test</gloss>
        <moduleRef key="tei"/>
        <moduleRef key="textstructure"/>
        <moduleRef key="core"/>
        <moduleRef key="header"/>
        <moduleRef key="namesdates"/>
        <elementSpec ident="couplet" mode="add" ns="http://www.example.edu/ns">
          <classes>
            <memberOf key="model.lLike"/>
            <memberOf key="att.global"/>
          </classes>
          <content>
            <sequence minOccurs="2" maxOccurs="2">
              <elementRef key="l"/>
            </sequence>
          </content>
        </elementSpec>
        <elementSpec ident="triplet" mode="add" ns="http://www.example.edu/ns">
          <classes>
            <memberOf key="model.lLike"/>
            <memberOf key="att.global"/>
          </classes>
          <content>
            <sequence minOccurs="3" maxOccurs="3">
              <elementRef key="l"/>
            </sequence>
          </content>
        </elementSpec>
      </schemaSpec>
    </back>
  </text>
</TEI>
$ ##################################
$ git checkout dev
Already on 'dev'
Your branch is up to date with 'origin/dev'.
$ ##################################
$ bin/teitornc /tmp/minmaxtest.odd /tmp/minmaxtest_dev.rnc
WARNING: No localsource set. Will get a copy from /path/to/TEIC_Stylesheets_repo/source/p5subset.xml if necessary.
Convert /tmp/minmaxtest.odd to /tmp/minmaxtest_dev.rnc (tei to rnc) using profile default
     [echo] Do ODD expand processing (schema ) 
Warning: XML resolver not found; external catalogs will be ignored
WARNING: A terminally deprecated method in java.lang.System has been called
WARNING: System::setSecurityManager has been called by org.apache.tools.ant.types.Permissions (file:/usr/share/ant/lib/ant.jar)
WARNING: Please consider reporting this to the maintainers of org.apache.tools.ant.types.Permissions
WARNING: System::setSecurityManager will be removed in a future release

BUILD SUCCESSFUL
Total time: 1 second
$ ##################################
$ git checkout sydb_627
branch 'sydb_627' set up to track 'origin/sydb_627'.
Switched to a new branch 'sydb_627'
$ ##################################
$ bin/teitornc /tmp/minmaxtest.odd /tmp/minmaxtest_627.rnc
WARNING: No localsource set. Will get a copy from /path/to/TEIC_Stylesheets_repo/source/p5subset.xml if necessary.
Convert /tmp/minmaxtest.odd to /tmp/minmaxtest_627.rnc (tei to rnc) using profile default
     [echo] Do ODD expand processing (schema ) 
Warning: XML resolver not found; external catalogs will be ignored
WARNING: A terminally deprecated method in java.lang.System has been called
WARNING: System::setSecurityManager has been called by org.apache.tools.ant.types.Permissions (file:/usr/share/ant/lib/ant.jar)
WARNING: Please consider reporting this to the maintainers of org.apache.tools.ant.types.Permissions
WARNING: System::setSecurityManager will be removed in a future release

BUILD SUCCESSFUL
Total time: 1 second
$ ##################################
$ diff -C0 /tmp/minmaxtest_dev.rnc /tmp/minmaxtest_627.rnc 
*** /tmp/minmaxtest_dev.rnc 2024-03-11 10:45:32.751098138 -0400
--- /tmp/minmaxtest_627.rnc 2024-03-11 10:45:53.916055804 -0400
***************
*** 11 ****
! # Schema generated from ODD source 2024-03-11T14:45:32Z. . 
--- 11 ----
! # Schema generated from ODD source 2024-03-11T14:45:53Z. . 
***************
*** 2637 ****
!     (tst_model.choicePart | tst_choice)+,
--- 2637,2639 ----
!     (tst_model.choicePart | tst_choice),
!     (tst_model.choicePart | tst_choice),
!     (tst_model.choicePart | tst_choice)*,
***************
*** 4112 ****
!        | (((tst_model.common), tst_model.global*)+,
--- 4114 ----
!        | ((tst_model.common, tst_model.global*)+,
***************
*** 4173 ****
!        | (((tst_model.common), tst_model.global*)+,
--- 4175 ----
!        | ((tst_model.common, tst_model.global*)+,
***************
*** 4189 ****
!        | (((tst_model.common), tst_model.global*)+,
--- 4191 ----
!        | ((tst_model.common, tst_model.global*)+,
***************
*** 4205 ****
!        | (((tst_model.common), tst_model.global*)+,
--- 4207 ----
!        | ((tst_model.common, tst_model.global*)+,
***************
*** 4221 ****
!        | (((tst_model.common), tst_model.global*)+,
--- 4223 ----
!        | ((tst_model.common, tst_model.global*)+,
***************
*** 4237 ****
!        | (((tst_model.common), tst_model.global*)+,
--- 4239 ----
!        | ((tst_model.common, tst_model.global*)+,
***************
*** 4253 ****
!        | (((tst_model.common), tst_model.global*)+,
--- 4255 ----
!        | ((tst_model.common, tst_model.global*)+,
***************
*** 4268 ****
!      (((tst_model.common), tst_model.global*)+,
--- 4270 ----
!      ((tst_model.common, tst_model.global*)+,
***************
*** 8378 ****
!   element couplet { tst_l, tst_att.global.attributes, empty }
--- 8380 ----
!   element couplet { (tst_l, tst_l), tst_att.global.attributes, empty }
***************
*** 8382 ****
!   element triplet { tst_l, tst_att.global.attributes, empty }
--- 8384,8386 ----
!   element triplet {
!     (tst_l, tst_l, tst_l), tst_att.global.attributes, empty
!   }
$ 

Besides the timestamps and all those extra parens around model.common, the only differences I see are:

So I declare this test a success. But it only tested one combination of minOccurs= and maxOccurs= on <sequence>. (Well, also another one on <alternate> if you count the change to <choice>, but pretty much every test has that, as it is broken as is in the Guidelines.)