TEIC / Stylesheets

TEI XSL Stylesheets
232 stars 124 forks source link

Duplicate elementSpecs with the same `@ident` may trigger a build error when creating documentation #645

Closed martindholmes closed 9 months ago

martindholmes commented 10 months ago

I came across a build error when building documentation from an ODD file which happened to have two <elementSpec>s with the same @ident -- I had done this by accident, but I have seen it done on purpose as a way of organizing an ODD. The error is:

stylesheet/common/functions.xsl:1242:9: Fatal Error! A sequence of more than one item is not allowed as the first operand of 'gt' (@versionDate="2005-01-14", @versionDate="2005-01-14")

triggered by this:

<xsl:sequence select="tei:gloss[@xml:lang eq 'en']/@versionDate gt  tei:gloss[ @xml:lang eq $lang]/@versionDate
             or tei:desc[@xml:lang='en' and  not( @type eq 'deprecationInfo' )]/@versionDate
                gt
                tei:desc[@xml:lang eq $lang and  not( @type eq 'deprecationInfo' )]/@versionDate" />

Personally I think we should make it an error to have to <*Spec> elements with the same @ident, but if not, this error will have to be worked around.

lb42 commented 10 months ago

I can confirm this error: it's very annoying since the error message gives no hint as to what has caused it.

In my case, I had a very large ODD containing many classSpec elements, modifying various attributes, using mode=change. This used to be perfectly legal: reconciling multiple declarations is part of the job of odd2odd. I am pretty sure I am not the only person to find that old ODDs have suddenly broken mysteriously.

Why is something like the following suddenly a problem?

<classSpec ident="foo" mode="change">
<attList><attDef ident="wibble" mode="delete"/>
</attList></classSpec>

<classSpec ident="foo" mode="change">
<attList><attDef ident="wobble" mode="change"
<exemplum><!-- a usage example for foo/@wobble-->
</exemplum>
</attList>
</classSpec>

I disagree that multiple specifications for the same @ident are necessarily an error! If it is the whole ODD modification mechanism is broken!

raffazizzi commented 10 months ago

@lb42, tl;dr this is a bug and not intentional.

This has to do with existing strict checks about multiple <desc> or <gloss> elements in the same language. Sometimes we need them, particularly when it comes to deprecation warnings. These errors are coming from our flawed attempt at making this work. The test with gt is clearly failing when specs get combined, so we need to find another strategy.

sydb commented 10 months ago

I think overall @raffazizzi is correct, this is a pretty specific bug. That said, IIRC there are several parts of the Stylesheets that have problems with multiple *Spec elements in the same file with the same @ident.

Anyway, looking again at the problematic test (with whitespace adjusted somewhat):

<xsl:sequence
   select="tei:gloss[ @xml:lang eq 'en']/@versionDate  gt  tei:gloss[ @xml:lang eq $lang ]/@versionDate
           or tei:desc [ @xml:lang='en'  and  not( @type eq 'deprecationInfo')]/@versionDate
              gt
              tei:desc[ @xml:lang eq $lang  and  not( @type eq 'deprecationInfo')]/@versionDate
          "/>

The sequence being created establishes the boolean to be returned by tei:descOrGlossOutOfDate(). It is handed a single context node, and returns a single boolean. It is only testing <gloss> and <desc> children of that context node. So I am a little confused about why having multiple Spec elements with the same @ident causes this problem, unless those Spec elements also have <gloss> or <desc> elements (that are not deprecation information) that have the same @xml:lang. (In which case, when they are combined we would have multiple <gloss> or <desc> children of the combined *Spec element that have the same @xml:lang.)

I have said elsewhere that maybe that gt operator should just be a > operator, but now as then we would need to think through and test what the consequences are.


P.S. Not relevant to this issue, but comes up because it effects the same code: @joeytakeda recommended not(@type), and I went with not(@type eq 'deprecationInfo'). I don’t think either is correct. We should be examining them by type. Something like

  <xsl:function name="tei:descOrGlossOutOfDate" as="xs:boolean">
    <xsl:param name="context"/>
    <xsl:variable name="lang" select="tei:generateDocumentationLang(.)[1]"/>
    <xsl:variable name="test_results" as="xs:boolean+">
      <!-- first test <gloss> children without @type -->
      <xsl:sequence select="$context/tei:gloss[ not(@type) ][ @xml:lang eq  'en' ]/@versionDate
                        gt  $context/tei:gloss[ not(@type) ][ @xml:lang eq $lang ]/@versionDate"/>
      <!-- second test <desc> children wwithout @type -->
      <xsl:sequence select="$context/tei:desc[ not(@type) ][ @xml:lang eq  'en' ]/@versionDate
                        gt  $context/tei:desc[ not(@type) ][ @xml:lang eq $lang ]/@versionDate"/>
      <!-- next test <gloss> children with @type -->
      <xsl:for-each select="distinct-values( $context/tei:gloss/@type )">
        <xsl:sequence select="$context/tei:gloss[ @type eq . ][ @xml:lang eq  'en' ]/@versionDate
                          gt  $context/tei:gloss[ @type eq . ][ @xml:lang eq $lang ]/@versionDate"/>
      </xsl:for-each>
      <!-- last teset <desc> children with @type -->
      <xsl:for-each select="distinct-values( $context/tei:desc/@type )">
        <xsl:sequence select="$context/tei:desc[ @type eq . ][ @xml:lang eq  'en' ]/@versionDate
                          gt  $context/tei:desc[ @type eq . ][ @xml:lang eq $lang ]/@versionDate"/>
      </xsl:for-each>
    </xsl:variable>
    <!-- if any one of the above is true, then something is amiss, return true. -->
    <xsl:sequence select="$test_results = true()"/>
  </xsl:function>

(Untested.) As a side effect, while that does make the function longer, each individual @select has at most 1 comparison operator, and thus is easier to read.

I am also really curious about whether the tei:generateDocumentationLang(.) on the 2nd line of the function should really be tei:generateDocumentationLang( $context ).

martindholmes commented 10 months ago

I don't know whether it should be an error to have multiple Spec elements with the same @ident, but surely it should be an error to have e.g. multiple <gloss> elements with the same @xml:lang spread across multiple Spec elements with the same @ident? I think it's reasonable for the Stylesheets to make the assumption that all instances of the Spec elements with the same @ident can (and perhaps should, in preprocessing) be combined to create a single coherent Spec element without duplication or ambiguity?

I think it's important to clarify that there is a distinction between a Spec element in a customization ODD which is being applied as a modification to a Spec element in a source ODD upstream in the ODD chain, and an instance of a *Spec element with the same @ident as another in the same file, where (I assume) the intention is that they work together somehow to modify the upstream element.

lb42 commented 10 months ago

Interesting thought experiment: its not hard to imagine a use case for making an odd in which each element has multiple descs for the same lang. Suppose your downstream app will choose between them depending on whether you are generating the student intro doc or the professional technical manual. Does that mean these have to use different values for @xml:lang to avoid falling foul of this prohibition?

Also when you say "same file" you mean (i hope) "same document"

raffazizzi commented 10 months ago

@martindholmes:

but surely it should be an error to have e.g. multiple <gloss> elements with the same @xml:lang spread across multiple *Spec elements with the same @ident?

I agree, though it depends on @mode right?

Essentially we need to be able to support this (note the multiple <desc>s in English):

<attDef ident="calendar" usage="opt" validUntil="2024-11-11">
        <desc type="deprecationInfo" versionDate="2023-05-11" xml:lang="en">The <att>calendar</att> attribute will be removed from this element
            as it will only be allowed on elements that represent dates with their content. This is because the <att>calendar</att> attribute
            (unlike <att>datingMethod</att> defined in
            <ident type="class">att.datable.custom</ident>) defines the calendar system of the date
            in the original material defined by the parent element, <emph>not</emph> the calendar to
            which the date is normalized.</desc>
        <desc versionDate="2021-04-26" xml:lang="en">indicates one or more systems or calendars to which the
            date represented by the content of this element belongs.</desc>
<!-- .... -->
</attDef>

But either our test is imprecise, or there's something going on when compiling ODDs that's triggering the condition.

martindholmes commented 10 months ago

but surely it should be an error to have e.g. multiple elements with the same @xml:lang spread across multiple *Spec elements with the same @ident?

I agree, though it depends on @mode right?

Surely not -- there are three things you want to do:

  1. Add a gloss for a language not yet there: @mode="add"
  2. Change an existing gloss: @mode="change"
  3. Delete an existing gloss: @mode="delete"

In all of those cases, you only need to have a single element with that @xml:lang attribute in a given customization. (This assumes that you don't try to customize your own customization within a single customization -- by e.g. adding a <gloss> in one elementSpec and then overriding it in another. Which is just silly.)

The only exception is the deprecationInfo desc, which is a special case (and perhaps we should have done that with a <remark> element, actually).

raffazizzi commented 10 months ago

Seconded on using <remark> instead... I wonder if we could still do that.

sydb commented 10 months ago

Well, we can change the expression of deprecation information from <desc> to <remarks>, but that means allowing <remarks> as a child of at least <constraintSpec>, and maybe some or all of <model>, <modelGrp>, <modelSequence>, and <paramSpec>.

lb42 commented 10 months ago

You could also use note which is globally available. And semantically more appropriate methinks.

martindholmes commented 10 months ago

I've just been hit by this error again, this time in an ODD which doesn't have any duplicated elementSpecs. Can't figure out what's causing it this time.

sydb commented 10 months ago

Care to post (or send to me separately) the ODD?

martindholmes commented 9 months ago

LOI.odd.zip

That's the file. Transforming with ODD-to-rng in Oxygen (with the bleeding-edge plugin) gives me a duplicate @rend attribute; the core error seems to be duplication of lots of attribute classes on elements, like this:

        <ref name="att.global.attributes"/>
         <ref name="att.datable.attributes"/>
         <ref name="att.editLike.attributes"/>
         <ref name="att.personal.attributes"/>
         <ref name="att.typed.attributes"/>
         <ref name="att.global.attributes"/>
         <ref name="att.datable.attributes"/>
         <ref name="att.editLike.attributes"/>
         <ref name="att.personal.attributes"/>
         <ref name="att.typed.attributes"/>

This was working fine until the last few weeks. I haven't had a chance to dig very deeply into it.

sydb commented 9 months ago

I just tried the > instead of gt quick fix, and it seems to work, in that it passes all tests in Test2/ and all in Test/ (except PDF which always fails on my system).

I will check this in as a branch. Do we want a PR already, or do we want to test it on known problematic cases, first?

ebeshero commented 9 months ago

@sydb Presumably you mean > , not < (!) And yes, I think we had better test on problematic cases first...

sydb commented 9 months ago

Wow. Never occurred to me this (what LOI.odd does — 2 <elementSpec mode="change"> for same element) would work. And it sure looks like an error to me — it sure looks like the “orgName” on line 4963 should be “placeName”. But it did work. I generated HTML documentation and Relax NG schema from

In all cases the invalidity is duplicate attributess of <orgName>, as @martindholmes noted above.

So it is quite clear to me that these are two entirely separate problems, and that #654 fixes the build failure due to multiple <gloss> or <desc> with same @xml:lang problem (at least in this case and all cases presented in Exemplars/), but does not fix the duplicate attrs defined in Relax NG problem, which appears to have been introduced quite recently, and should be a separate ticket.

martindholmes commented 9 months ago

I didn't realize that ODD had duplicate elementSpecs. It's had a long history and lots of editors. My mistake there, although it's not invalid and there'll be lots more out there like it.