TEIC / TEI

The Text Encoding Initiative Guidelines
https://www.tei-c.org

tei:f/@fVal constraints FUBAR #1422

Closed sydb closed 8 years ago

sydb commented 8 years ago

The <constraintSpec> with an @ident of "fValConstraints" (which can be found in f.xml) is FUBAR. Here it is:

  <constraintSpec ident="fValConstraints" scheme="isoschematron">
    <constraint>
      <rule xmlns="http://purl.oclc.org/dsdl/schematron" context="tei:fVal">
        <assert test="not(tei:* and text)"> A feature value cannot
    contain both text and element content</assert>
      </rule>
      <rule xmlns="http://purl.oclc.org/dsdl/schematron" context="tei:fVal">
        <report test="count(tei:*)&gt;1"> A feature value can contain
    only one child element</report>
      </rule>
    </constraint>
  </constraintSpec>

Problems I see:

So I propose instead we use

  <constraintSpec ident="f-has-child-or-PCDATA" scheme="isoschematron">
    <constraint>
      <report test="tei:*  and  text()[normalize-space(.) ne '']">A feature value cannot contain both text and element content</report>
    </constraint>
  </constraintSpec>
  <constraintSpec ident="f-has-max-one-child-element" scheme="isoschematron">
    <constraint>
      <report test="count(tei:*) gt 1">A feature value can contain only one child element</report>
    </constraint>
  </constraintSpec>
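To make the effect of the two proposed reports concrete, here is a small hand-written instance fragment. The feature names and values are invented for illustration. The comments assume both reports are evaluated with <f> as the context node, as they would be if the constraintSpecs sit inside the specification of <f>.

```xml
<fs xmlns="http://www.tei-c.org/ns/1.0">
  <!-- Passes both: exactly one child element, no text -->
  <f name="number"><symbol value="singular"/></f>
  <!-- Passes both: text only -->
  <f name="lemma">dog</f>
  <!-- Passes both: the whitespace around the child is filtered out by the
       normalize-space() predicate -->
  <f name="case">
    <symbol value="nominative"/>
  </f>
  <!-- Reported by f-has-child-or-PCDATA: non-whitespace text plus a child element -->
  <f name="pos">noun <symbol value="common"/></f>
  <!-- Reported by f-has-max-one-child-element: two child elements -->
  <f name="gender"><symbol value="masculine"/><symbol value="feminine"/></f>
</fs>
```

The third case is the reason for testing text()[normalize-space(.) ne ''] rather than plain text(): pretty-printed markup would otherwise trip the first report.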

Beyond all that, the <remarks> for <f> say

If the element is empty then a value must be supplied for the @fVal attribute.

I’m not convinced this is true, because although the Guidelines say “but the <f> element must have (or reference) some content.”,[3] they also say “The value of an empty <f> element which also lacks a @fVal attribute is understood to be …”[4].

But if it is true (that an <f> must have or reference something) we should be testing for it. Something like the following should do. (And in either case, the Guidelines need to be made consistent.)

  <constraintSpec ident="f-has-content-or-fVal" scheme="isoschematron">
    <constraint>
      <assert test="tei:*  or  text()[normalize-space(.) ne '']  or  normalize-space(@fVal) ne ''">The value expressed by a feature value specification should be expressed as content (of the <gi>f</gi>) or as a pointer (on <att>fVal</att>)</assert>
    </constraint>
  </constraintSpec>
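A sketch of what this assert would accept and reject, again with invented feature names. The pointer target #sg is hypothetical and is assumed to identify a value element elsewhere in the document.

```xml
<fs xmlns="http://www.tei-c.org/ns/1.0">
  <!-- Passes: the value is given as content -->
  <f name="number"><symbol value="plural"/></f>
  <!-- Passes: the value is given by reference (hypothetical target #sg) -->
  <f name="agreement" fVal="#sg"/>
  <!-- Fails the assert: no child element, no non-whitespace text, no @fVal -->
  <f name="person"/>
</fs>
```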

Furthermore, if it is the case that an <f> should not have both content and a @fVal, then we should add the following.

  <constraintSpec ident="f-not-fVal-and-content" scheme="isoschematron">
    <constraint>
      <report test="( tei:* or text()[normalize-space(.) ne ''] )  and  normalize-space(@fVal) ne ''">The value expressed by a feature value specification should be expressed as content (of the <gi>f</gi>) or as a pointer (on <att>fVal</att>), not both</report>
    </constraint>
  </constraintSpec>
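For comparison, these are the cases such a report would flag, assuming its test is read as "has content, and also has a non-empty @fVal". Names and pointer targets are hypothetical.

```xml
<fs xmlns="http://www.tei-c.org/ns/1.0">
  <!-- Reported: a child element together with a non-empty @fVal -->
  <f name="number" fVal="#sg"><symbol value="singular"/></f>
  <!-- Reported: text content together with a non-empty @fVal -->
  <f name="lemma" fVal="#dog">dog</f>
  <!-- Not reported: content only -->
  <f name="case"><symbol value="nominative"/></f>
  <!-- Not reported: @fVal only -->
  <f name="agreement" fVal="#sg"/>
</fs>
```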

[1] See ISO 19757-3:2006, 3.20 “a rule-context is said to match an information item when that information item has not been matched by any lexically-previous rule context expressions in the same pattern and the information item is one of the information items that the query would specify”. Or better yet, just test it out yourself. For most of us that will be easier than reading the spec.

[2] Since in this case the items being counted are child elements, the other way to express this is just tei:*[2]. That is a lot shorter and somewhat sweeter, but I think the intent is less clear. Thoughts?

[3] In 18.2 Elementary Feature Structures and the Binary Feature Value

[4] In 18.9 Default Values
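Footnotes [1] and [2] can be tried out with a small standalone schema along these lines. This is a sketch, not the schema the TEI build generates: the pattern ids are invented, tei:f is used as the context, and queryBinding="xslt2" is assumed because the tests use the XPath 2.0 operators ne and gt.

```xml
<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>

  <!-- Two rules with the same context in ONE pattern: an <f> matched by the
       first rule is never offered to the second, so the second report is dead. -->
  <pattern id="demo-shadowed">
    <rule context="tei:f">
      <report test="tei:* and text()[normalize-space(.) ne '']">text and element content</report>
    </rule>
    <rule context="tei:f">
      <report test="count(tei:*) gt 1">more than one child element</report>
    </rule>
  </pattern>

  <!-- One way around it: put both tests in a single rule. -->
  <pattern id="demo-merged">
    <rule context="tei:f">
      <report test="tei:* and text()[normalize-space(.) ne '']">text and element content</report>
      <!-- tei:*[2] is true exactly when a second child element exists,
           the same condition as count(tei:*) gt 1 -->
      <report test="tei:*[2]">more than one child element</report>
    </rule>
  </pattern>
</schema>
```

Run against an <f> with two child elements, only the demo-merged pattern should produce the "more than one child element" message. Placing the two rules in separate patterns would work as well.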

bansp commented 8 years ago

Syd, you're an angel. I'm going to give you some feedback from the ISO/implementation point of view once I move to this area in my project, which is going to be very soon. Thanks for scouting the way and for the fixes above!

lb42 commented 8 years ago

Thanks for spotting this. I think there once was (briefly) a <fVal> element, with children vColl, vAlt etc., but when it was removed on the grounds of redundancy, clearly this Schematron rule wasn't looked at closely enough. The @fVal attribute, as you note, has nothing to do with it: it is used to specify an additional value to be unified with that specified by the content, whatever the content may be. The confusion arose when it was decided to permit a feature containing just a text string with no indication of its type. I believe that the original intention was to say that <f> may contain: any one of the typed value elements, or a combination of them wrapped in e.g. <vColl>, or a non-empty string of characters. However, it is also possible (as 18.9 demonstrates) for a <f> element to be empty, so in fact these Schematron rules are just plain wrong. Piotr may wish to correct me!

sydb commented 8 years ago

Fixed the constraintSpec 2016-03-10 in commit #0a529e0a373bd9e23187fefe223c15382a5fe1ca. Have not dealt with the issues as to whether or not an empty <f> has to have an @fVal or not.

hcayless commented 8 years ago

@sydb should add a schematron rule mandating that empty <f> have an @fVal.

sydb commented 8 years ago

Council mtg: SB to create Schematron rule to enforce “empty <f> has @fVal”.

sydb commented 8 years ago

Rule added in 195c97b0c3299d0b08d689f0e93813a8a945cbf5. I also needed to alter some prose from FS to match.

And then I looked at the Schematron rules that were already in place for <f> and asked myself, “I am wondering, why are you here?” While I’m proud of the nice Schematron we now have in there to enforce “A feature value cannot contain both text and element content” and “A feature value can contain only one child element”, I could not see any reason to have a loose content model and then enforce these restrictions with Schematron. Besides, they are not even true. (The content of an <f> can be textual, and if it’s textual may have characters outside of Unicode, so multiple <g> elements should be allowed.) Why not just create a content model in PureODD that enforces them correctly? So I did:

    <alternate minOccurs="1" maxOccurs="1">
      <macroRef key="macro.xtext"/>
      <classRef key="model.featureVal"/>
    </alternate>    
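Read as a grammar, this model offers two mutually exclusive branches. The fragment below sketches what each branch admits, assuming macro.xtext allows text freely mixed with <g> and that <symbol> is a member of model.featureVal. Feature names and the #char1 and #char2 targets are invented.

```xml
<fs xmlns="http://www.tei-c.org/ns/1.0">
  <!-- macro.xtext branch: text, optionally mixed with any number of <g> -->
  <f name="form">a<g ref="#char1"/>b<g ref="#char2"/></f>
  <!-- model.featureVal branch: exactly one value element -->
  <f name="number"><symbol value="singular"/></f>
  <!-- Rejected by the content model (two value elements):
       <f name="gender"><symbol value="masculine"/><symbol value="feminine"/></f> -->
  <!-- Rejected by the content model (text mixed with a value element):
       <f name="pos">noun <symbol value="common"/></f> -->
</fs>
```

The two rejected cases are exactly what the earlier Schematron reports were checking, which is why the reports become redundant once the content model is tightened.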

Note that I did not add the rule for “not both content and @fVal”, as Council did not address that issue at face-to-face. Given that the Guidelines state that it is permissible to have both (and that the value referenced by @fVal is to be unified with that contained as content of <f>), I am presuming we do not want such a rule.

lb42 commented 8 years ago

Looking at this again, and in particular at the passage which Syd has removed from the text, I am less confident that this is correct. This unification grammar is tricky stuff. I think the intention was (as originally stated) that an empty <f/> should be legal, and have a particular meaning. The reasoning is probably that the same feature may be used more than once in a structure and you don't want to have to specify its possible values every time it does. I know this is counterintuitive (and looks weird in an XML context) but if you go and read the original ISO spec, I think you'll find that this is what was intended. The subsequent (minority) decision to allow textual content muddies the waters somewhat, as Syd rightly points out.

laurentromary commented 8 years ago

Agree with Lou. I would avoid touching a model too swiftly when it is based on an underlying, potentially elaborate theory.

sydb commented 8 years ago

Just to make sure I’m clear on this … The scenario in the released Guidelines is contradictory with respect to whether an <f> element that does not have an @fVal attribute may also be empty.

In Providence (IIRC) we (the TEI Council) decided the tagdoc was correct, an empty and @fVal-less <f> makes no sense. Am I correct, @lb42 and @laurentromary, that you are suggesting that we figuratively flipped the coin the wrong way? (In which case the change should have been removing, or better still re-writing, the sentence “If the element is empty then a value must be supplied for the @fVal attribute.” in the tagdoc of <f>, yes?)

For the record, I’m OK with either solution. My only argument is with the blatant discrepancy.

laurentromary commented 8 years ago

The whole issue has to do with re-entrancy. Looking at http://www.tei-c.org/release/doc/tei-p5-doc/fr/html/FS.html#FSVAR more calmly, I see that the implementation does not imply an empty <f> (I must say, I am not sure this is optimal, but I don't want to break everything here). My suggestion here would be to try and avoid breaking backward compatibility unless someone with an FS background reads the chapter through to ensure that everything is coherent. Is there any urgency to do surgery here?

lb42 commented 8 years ago

Re-entrancy is part of the issue, but I think the crucial point is that the possible range of values for a feature is not specified in the XML schema for features, as would be the typical XML case. Instead it is specified in the feature system declaration. In that situation an empty <f> has the specific interpretation given in the text that Syd was proposing to delete. So if anything needs changing, it's not the Guidelines prose but the remark in the tagdoc, which misrepresents the intended behaviour.

laurentromary commented 8 years ago

Agree!

sydb commented 8 years ago

OK, so:

  1. the deleted and altered snippet in FS should get restored (see 1st file diff at https://github.com/TEIC/TEI/commit/195c97b0c3299d0b08d689f0e93813a8a945cbf5)
  2. the f-has-content-or-fVal constraint (which enforces “either content or @fVal”) should be removed
  3. the remarks should no longer say “If the element is empty then a value must be supplied for the @fVal attribute.” (But what, if anything, should it say instead?)
  4. The sentence “Similarly, the <fs> element may be empty, but the <f> element must have (or reference) some content.” should be deleted from prose section FSBI

I also plan to change the @minOccurs of the content model to "0". It will make absolutely no difference as to which documents are valid and which are not (because no text matches a <textNode> — about which see #1459), but the meaning — that empty content is allowed — is clearer.
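A sketch of that change follows. It assumes the two branches are otherwise as in the snippet posted earlier in this thread, which may not match the content model actually in the repository at this point.

```xml
<content>
  <!-- minOccurs="0" spells out that an empty <f> is permitted -->
  <alternate minOccurs="0" maxOccurs="1">
    <macroRef key="macro.xtext"/>
    <classRef key="model.featureVal"/>
  </alternate>
</content>
```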

This all sound reasonable?

sydb commented 8 years ago

Pending some thoughts by those who know more than me (@lb42, @bansp, and @laurentromary jump to mind), I have not yet implemented my summary of 4 days ago. I have, however, implemented @lb42’s suggestion that <g> not be allowed inside <f> as if this were a corrigible error in commit 46d4b34, push 62839e1.

lb42 commented 8 years ago

Briefly: 1: yes 2: yes 3: yes (it need say nothing) 4: no. please leave this sentence alone.

sydb commented 8 years ago

Ummm ... @lb42, I’m confused. You are suggesting that we want to

  1. Restore the snippet in FSBO that shows using an empty @fVal-less <f>, thus implying it is OK,
  2. Remove the constraint that says an empty @fVal-less <f> is an error, thus implying it is OK,
  3. Remove the remark that says an empty <f> must have an @fVal, thus implying it is OK,
  4. Leave the clause in FSBI that says it is NOT OK.

This seems like the kind of contradiction we were trying to clean up in the first place.

lb42 commented 8 years ago

The parenthetical phrase "or reference" is what does it for me.

sydb commented 8 years ago

But @fVal is how an empty <f> references content.[1] That is “the <f> element must have (or reference) some content” means “the <f> element must have content (or an @fVal)”. So I’m still suggesting the sentence be removed from FSBI as per (4), above.


[1] Thus the definition of @fVal: “references any element which can be used to represent the value of a feature”.

lb42 commented 8 years ago

I guess I am just too elliptical. The point is that an empty <f> can "reference" its intended values by means of declarations in the feature system. Perhaps recasting the sentence as "must specify its value either directly as content or by means of the @fVal attribute, or implicitly by reference to a feature system declaration" would help. Or just delete the sentence as you suggest.
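The "implicit reference" case might look like the following minimal document. It is an illustrative sketch only: the type "word", the feature name "number" and its values are invented, the document has not been validated here, and exactly how an empty <f> is to be interpreted against its declaration is a matter for the FS chapter rather than something this example settles.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Empty f sketch</title></titleStmt>
      <publicationStmt><p>Unpublished illustration</p></publicationStmt>
      <sourceDesc><p>Invented for this example</p></sourceDesc>
    </fileDesc>
    <encodingDesc>
      <fsdDecl>
        <!-- Feature system declaration: the range of "number" is stated here,
             not in the XML schema -->
        <fsDecl type="word">
          <fDecl name="number">
            <vRange>
              <vAlt>
                <symbol value="singular"/>
                <symbol value="plural"/>
              </vAlt>
            </vRange>
          </fDecl>
        </fsDecl>
      </fsdDecl>
    </encodingDesc>
  </teiHeader>
  <text>
    <body>
      <p>
        <!-- Empty <f> with no @fVal: its possible values are discoverable only
             through the fDecl for "number" above -->
        <fs type="word"><f name="number"/></fs>
      </p>
    </body>
  </text>
</TEI>
```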

sydb commented 8 years ago

Finally resolved (I hope) in a97af87.