TEIC / Stylesheets

TEI XSL Stylesheets
228 stars 124 forks source link

multiple `<anyElement>` results in invalid RELAX NG schema #631

Closed sydb closed 8 months ago

sydb commented 9 months ago

Another bug that has probably been around for years if not decades and never been noticed. If you have more than 1 <anyElement> in a content model the resulting RELAX NG is invalid.

I will post an ODD file (bundled with the output of teitorelaxng (soon to be teitorng) run on that ODD, as Github makes me ZIP it, anyway) that demonstrates the problem on this ticket shortly.

I think the problem is entirely contained in odd2relax.xsl. It generates an <rng:define> for each <anyElement> that occurs in the compiled ODD input. Then, in the cleanup phase (in "pass3"), I think it tries to delete some of the <rng:define>s that have just been created — presumably all but one. But in this case it deletes them all.

This is some very convoluted, uncommented code. I am still trying to wrap my head around the "max( ancestor::rng:div )" part. An <rng:div> has no effect on validation, so why is it important to delete only the <rng:define> that is most deeply nested in <rng:div>s, and how does that ensure there is only 1 resulting <rng:define>?

In any case, this problem is sort of blocking progress on #627, not because it is interfering with writing the code for fixing #627, but because it makes it a pain in the neck to test said code.

Several possible solutions jump to mind, but until I understand at least what the current Stylesheets are doing, better still why, I doubt I could really endorse any of them, let alone one over the others:

  1. alter "pass3" to delete all <rng:define>s for the given pattern, then put one back
  2. change code so that only 1 <rng:define> per <anyElement>-inside-given-<elementSpec>-or-<macroSpec> is generated in the first place. (This should not be hard to do — change //tei:anyElement to //*[tei:anyElement] in both places, and change template "anyElement" to match. But who knows what else that might break? Might be a problem because different occurrences of <anyElement> in the same content model might have different @require or @except, no?)
  3. change code so that a separate <rng:define> with a different pattern name is generated for each and every occurrence of <anyElement>. (This should not be hard to do — change definition of $id in "anyElement" template from concat('anyElement-',$spec/@ident) to concat('anyElement-', generate-id() ). But changing the references to each such pattern might be hard; probably not, but I am nto sure. And in any case, who knows what else that might break?)

OK, having written that out long hand I take back my previous statement about not knowing which is best. Seems to me at the moment that (3) is probably the only viable solution. Otherwise how would different @require and @except values in the same content model work?

BTW, I may well continue work on #627 (since it is causing problems for the WWP) first, even with convoluted testing procedures, before paying more attention to this. Thus I have not assigned this issue to a milestone.

sydb commented 9 months ago

issue_631_test_mult_any.zip

martindholmes commented 9 months ago

@sydb How does the ATOP PLODD processing do on this? Same problem?

sydb commented 9 months ago

Did not even look. But no reason to think it has same problem, as code base is entirely different. I have implemented a fix (using (3)), but have to adjust testing to match. I am still working on Test/; I suspect the change to Test2/ will be easy, just a tweak to cleanForDiff.xsl.

sydb commented 9 months ago

I have submitted a PR that I believe fixes this bug. It is like solution (3) proposed above, but I used concat('anyElement_',$spec/@ident,'_', [COUNT] ) as the @name of each pattern, where [COUNT] is the sequential number of this particular <anyElement> within the input.

I found this very hard to do (it took the better part of 2 days) in large part because I took some less-than-brilliant paths and detours, but also in large part because it turns out that code that generates <rng:ref>s that refer to these patterns occurs not only in odds/odd2relax.xsl, but also in odds/teiodds.xsl. Sigh.