TEIC / TEI

The Text Encoding Initiative Guidelines
https://www.tei-c.org

Schematron constraint for ab is too crude #1988

Closed martindholmes closed 3 years ago

martindholmes commented 4 years ago

Line 52 of ab.xml has this Schematron constraint:

<constraintSpec ident="abstractModel-structure-l" scheme="schematron">
    <constraint>
      <report xmlns="http://purl.oclc.org/dsdl/schematron" xmlns:tei="http://www.tei-c.org/ns/1.0" test="ancestor::tei:l or ancestor::tei:lg">
        Abstract model violation: Lines may not contain higher-level divisions such as p or ab.
      </report>
    </constraint>
  </constraintSpec>

However, the content models for <lg> and <l> both allow a child <figure>, and <figure> quite reasonably allows child <ab> and child <p>. So I think the constraint might be modified to allow for this:

<constraintSpec ident="abstractModel-structure-l" scheme="schematron">
    <constraint>
      <report xmlns="http://purl.oclc.org/dsdl/schematron" xmlns:tei="http://www.tei-c.org/ns/1.0" test="(ancestor::tei:l or ancestor::tei:lg) and not(parent::tei:figure)">
        Abstract model violation: Lines may not contain higher-level divisions such as p or ab.
      </report>
    </constraint>
  </constraintSpec>

There is a similar but already slightly-qualified constraint in p.xml that would also have to be modified.

lb42 commented 4 years ago

But should <l> be allowed to contain <figure> ? I am finding it hard to imagine a real life case where this would make sense (I can -- just about -- see a case for <lg> though it's stretching things)

jamescummings commented 4 years ago

@lb42 I was going to say "what about a poem constructed from images?" But then predict you'd say "Those are graphics not figures." And I'd say "Oh, yeah." So thought I'd shortcut the conversation in case anyone else was going to suggest that. ;-)

lb42 commented 4 years ago

@jamescummings you know me too well.

PFSchaffner commented 4 years ago

Rebus poetry is of course common. Many rebus images contain text (some consist entirely of text). Question is whether that text can or should be captured in the desc element within graphic, or whether as transcriptional they should be captured in some text-bearing element within figure. ... I'm sure there are also some odd-ball outlier pieces of verse that incorporate an unspeakable illustration of some kind within them. "Look at this (figure) / what do you see? / I see a (figure) / looking back at me." Or something equally horrible. Once you've seen ledgers (tables) and captioned illustrations within sp, you'll not rule out anything.

lb42 commented 4 years ago

Yes, but as previously noted, I would suggest that if something is a poem, it should be encoded as such using <l> elements, and if parts of a line are represented graphically, rather than textually, the element <graphic> is available. See attached example of a well-known case. [Attached image: heydiddle]

PFSchaffner commented 4 years ago

Yeah, I know: I was positing the case in which the graphic elements themselves contain text too.

PFSchaffner commented 4 years ago

[Attached image: rebus]

ebeshero commented 4 years ago

Hmm. @PFSchaffner’s example presents graphically a nice case of when we’d want a <figure> in the line, doesn’t it?

PFSchaffner commented 4 years ago

Contrived, I admit, but yes, that was the point. ... and pretty good for something done while fumbling through breakfast.

ebeshero commented 4 years ago

Call it “born digital poetry” then! :-)

ebeshero commented 4 years ago

I don’t know, @jamescummings and @lb42. Looking at our examples in the Guidelines for <graphic>, we seem pretty committed to wrapping it in a <figure> anyway for holding associated metadata. I think we are just fine with <figure> and we definitely should not disallow it. But should we be permitting <graphic> by itself? I am not so sure of that.

https://tei-c.org/release/doc/tei-p5-doc/en/html/examples-graphic.html

ebeshero commented 4 years ago

What I am noticing in the examples is the contraption of <figDesc> with <graphic> inside <figure>, which conveniently associates text and image. What other ways do we have to make such associations?

This reminds me of the best practice urged by the W3C for HTML in providing text descriptors of images: a bit of text metadata would be expected to support screen readers. Perhaps for images that function semantically in a line of verse, we would expect some descriptor? This isn’t the same case as using <graphic> to accompany a transcription of a page surface.

martindholmes commented 4 years ago

@lb42 Here's an example of a <figure> in the middle of an <lg>:

https://hcmc.uvic.ca/~vicpoems/page_images/goodwords/11/goodwords_11_465_areverieandasong.jpg

This is very common, and if a line were to wrap in a particular way, it's easy to imagine a case where it could appear within <l> as well. I don't have examples to hand of this, because of course our encoders can't (yet) encode any of these figures in the right places because they have descendant elements which are not allowed due to the Schematron rule.

martindholmes commented 4 years ago

@ebeshero We typically use <figDesc> for an encoder's/editor's description of the figure, and other descendant elements for any caption that appears. Incidentally, a caption may contain a quote of one or more poetic lines, so you'll end up with <lg> or <l> as a descendant of the poem it's all embedded in. In this sense, <figure> is a bit like <floatingText>.

lb42 commented 4 years ago

@ebeshero There are plenty of ways of associating metadata with a graphic (I suggested using a nested desc in my example): figDesc provides information about a figure, which isn't the same thing as a graphic. I think of <figure> as being a block level component of a page which may for example float about without affecting the sense (this wouldn't work for the rebus examples by the way.)

@martindholmes Your example supports my belief that it makes sense to permit figure between <lg>s, but not within <l>s.

martindholmes commented 4 years ago

@lb42 figure is already allowed in both contexts, though. This ticket is about the Schematron that undermines that existing functionality.

lb42 commented 4 years ago

Indeed it is. I am arguing that you should not construe this prohibition as "undermining" in the case of a figure appearing within a <l> but rather as a useful suggestion that you probably ought not to be encoding it that way.

martindholmes commented 4 years ago

@lb42 That really doesn't make sense to me. The content model of <l> allows <figure> inside it. The Schematron rule that complains about legal descendant elements inside <figure> is not there in order to warn people that putting <figure> inside <l> is wrong; it's just a poorly-tuned rule that happens to trigger when it shouldn't. If <l> should not contain <figure>, that's a completely different ticket that would need quite a bit of discussion; it would presumably also have to consider whether <notatedMusic> or <table> should be allowed in <l> as well. Meanwhile, this Schematron rule (and the one for <p>) just needs to be tweaked (as the one for <p> already has been for <note>) to prevent it from triggering when it shouldn't.

ebeshero commented 4 years ago

@martindholmes @lb42 I am pleased that <figure> is permitted within <l>. We should not be confusing presentation with semantics (it doesn't matter to me at all that <figure> gets automatically transformed by some people as a block-level element). If you have a special case such as the ones we've envisioned here, <figure> gives us more capacity to convey metadata as well as text descriptors associated with the graphic, as in @PFSchaffner's creative example. When we have these unusual situations, we don't need to be making it difficult for people to apply the semantics of <figure>.

As for the Schematron, which is what this ticket is about, I agree with @martindholmes that we had better modify it, and I like this solution. Just don't forget to modify the Schematron report message accordingly. I'd suggest this phrasing:

<constraintSpec ident="abstractModel-structure-l" scheme="schematron">
    <constraint>
      <report xmlns="http://purl.oclc.org/dsdl/schematron" xmlns:tei="http://www.tei-c.org/ns/1.0" test="(ancestor::tei:l or ancestor::tei:lg) and not(parent::tei:figure)">
        Abstract model violation: Lines may not contain higher-level divisions such as p or ab, unless ab is a descendant of figure.
      </report>
    </constraint>
  </constraintSpec>

martindholmes commented 4 years ago

@ebeshero "...descendant of figure or note". I guess we should also look for other contexts which act in a similar floatingText-like manner.
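
A minimal sketch of what that wording could look like if the test were widened to match it. This is an illustration only, not necessarily the rule that was eventually merged: it assumes that any <figure> or <note> ancestor (rather than only a parent) should suppress the report, and it uses the tei: prefix on every element name, as the existing rules do.

```xml
<constraintSpec ident="abstractModel-structure-l" scheme="schematron">
  <constraint>
    <!-- Sketch only: fires for an ab inside l or lg, unless some ancestor is figure or note. -->
    <report xmlns="http://purl.oclc.org/dsdl/schematron" xmlns:tei="http://www.tei-c.org/ns/1.0" test="(ancestor::tei:l or ancestor::tei:lg) and not(ancestor::tei:figure or ancestor::tei:note)">
      Abstract model violation: Lines may not contain higher-level divisions such as p or ab, unless ab is a descendant of figure or note.
    </report>
  </constraint>
</constraintSpec>
```

Checking ancestors rather than the parent keeps the test in line with the word "descendant" in the message. The cost is that an <ab> nested more deeply (for example, inside a new <l> within the figure) is exempted as well.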

bansp commented 4 years ago

I begin with a side question that must have popped up in Council discussions and an agreed reply has probably been given in one of the working papers (but I just don't know it): wouldn't it be good practice to always explicitly specify the context in Schematron rules that are part of the Guidelines? The context is surely implied by either the individual Source/.xml fragments or by traversing the tree to the containing Spec element inside a compiled ODD, but I'm asking about good practice as opposed to quick practice ;-)

I asked myself that side question when I looked more closely at the latest suggestion by Elisa, to make sure that the doubt I'm about to share is real. The doubt is that I find it strange that part of the validating machinery also does its job over <ab>s that have no chance of being mistakenly contained within <l> or <lg> elements, because they come, e.g., from a customization that skips the latter two altogether. Abstracting away from what should or should not be contained by <l(g)>, I'd expect that information to be encoded within <l> and <lg> themselves (and preferably in terms of model classes, but individual elements might be excused if model classes can't be handled by Sch rules), both for architectural reasons and for didactic/documentation reasons, rather than in the individual elements that just happen to be members of the model classes that are banned when <l(g)> enter play. So this question is about good/best practice in schema modelling.

I realise that my comment can be seen as sidetracking, so please feel welcome to ignore it and tell me to RTFM (but with a pointer to a working paper or something equivalent, please :-) )

Finally, something directly related to a preceding comment: the fragment "higher-level divisions such as p or ab, unless ab is a descendant of figure" could perhaps benefit from <sch:name> in the bolded fragments.

martindholmes commented 4 years ago

@bansp I've never been particularly comfortable with the practice of placing constraints inside <elementSpecs> and having their context derived from those <elementSpecs> partly because it works only at the element level; you would expect to be able to do the same thing in an <attDef> but you can't because it doesn't work. So wherever Schematron rules appear in the <schemaSpec> (for convenience, organization, personal preference or whatever), I think it would be good practice to define the context.
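
For illustration, here is a hedged sketch of the same constraint with its context stated explicitly, so that it does not depend on the enclosing <elementSpec>. It also uses the Schematron <name/> element in the message, as @bansp suggested; <name/> emits the name of the context node. The wrapping <rule> and the reworded message are assumptions made for this example, not the text in the Guidelines source.

```xml
<constraintSpec ident="abstractModel-structure-l" scheme="schematron">
  <constraint>
    <!-- Sketch only: the context is declared here rather than inferred from the enclosing elementSpec. -->
    <rule xmlns="http://purl.oclc.org/dsdl/schematron" xmlns:tei="http://www.tei-c.org/ns/1.0" context="tei:ab">
      <report test="(ancestor::tei:l or ancestor::tei:lg) and not(parent::tei:figure)">
        Abstract model violation: Lines may not contain higher-level divisions such as <name/>, unless <name/> is a child of figure.
      </report>
    </rule>
  </constraint>
</constraintSpec>
```

With the context written out, the rule reads the same wherever it sits in the <schemaSpec>, and the message follows the element actually matched instead of hard-coding "ab".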

But I think the overall issue is much simpler than it appears from the discussion: the TEI as it is currently designed does not allow elements to have different content models based on context, so a <figure> is a <figure> is a <figure>. And if you say that a <figure> can have a <p> or an <ab> as a child or a descendant, then that applies wherever it appears. You can't come along later with Schematron and say that this particular figure (which is identical in every respect to all the other figures in your document except for the fact that it occurs within an <lg>) must have a different content model. And you certainly can't claim that this ill-tuned Schematron is some sort of mystical signal that your original decision to allow <figure> in <lg> or <l> must have been wrong.

gimsieke commented 4 years ago

This conversation is too much focused on figures. Consider note as an element that is also allowed within l and lg and that may contain ab or p. The Schematron rule certainly shouldn’t fire for l/note/ab, should it? There are a couple of elements that establish a new scope below which p and ab are perfectly acceptable. In this regard I concur with @martindholmes, but regarding the question whether context-dependent models should be allowed in TEI I think they should be allowed and it’s a design flaw that ODD doesn’t permit context-dependent models natively. Therefore Schematron is considered by many the last and only resort for establishing these constraints. But this particular Schematron should be fixed.

martindholmes commented 4 years ago

@gimsieke I think most of us would agree that TEI should support context-specific content models, and you're right of course to point out that the issue is something to do with the transition from one scope to another; that was what I was trying to get at with my analogy with <floatingText>. Elements such as <note>, <floatingText> and <figure> constitute little embedded worlds in which all the scoping rules of their containers are effectively extinguished.
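
One way to express that "embedded world" idea in XPath is to look only at the nearest scope-defining ancestor. The following is a sketch under assumptions: the list of scope-resetting elements (<figure>, <note>, <floatingText>) is taken from this thread and is probably incomplete, and the message wording is hypothetical.

```xml
<constraintSpec ident="abstractModel-structure-l" scheme="schematron">
  <constraint>
    <!-- Sketch only: among the listed ancestors, take the nearest one ([1] on the reverse ancestor axis)
         and report only if it is l or lg, i.e. no figure, note or floatingText intervenes. -->
    <report xmlns="http://purl.oclc.org/dsdl/schematron" xmlns:tei="http://www.tei-c.org/ns/1.0" test="ancestor::*[self::tei:l or self::tei:lg or self::tei:figure or self::tei:note or self::tei:floatingText][1][self::tei:l or self::tei:lg]">
        Abstract model violation: Lines may not contain higher-level divisions such as p or ab, unless they occur inside an embedded scope such as figure, note or floatingText.
    </report>
  </constraint>
</constraintSpec>
```

Under this formulation l/note/ab and l/floatingText/body/ab pass, while an <ab> whose nearest listed ancestor is an <l> or <lg> is still reported, even when that line itself sits inside a figure.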

martindholmes commented 4 years ago

Ping! Any progress on this? I have a project with half-encoded files waiting on a decision.

ebeshero commented 4 years ago

VF2F: Council greenlights @ebeshero and @lujessica to proceed and modify the Schematron rule as indicated.

ebeshero commented 3 years ago

I think I've fixed this in a branch. Because this is modifying a Schematron constraint to make it more permissive, we basically want to test to make sure <ab> is permitted within <figure>, <note>, and <floatingText> when these are in <l>. Here's a file I used to test on with p5odds.rng:

<!--<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml"
    schematypens="http://purl.oclc.org/dsdl/schematron"?>-->
<?xml-model href="p5odds.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<?xml-model href="p5odds.rng" type="application/xml" schematypens="http://purl.oclc.org/dsdl/schematron"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
      <fileDesc>
         <titleStmt>
            <title>Title</title>
         </titleStmt>
         <publicationStmt>
            <p>Publication Information</p>
         </publicationStmt>
         <sourceDesc>
            <p>Information about the source</p>
         </sourceDesc>
      </fileDesc>
  </teiHeader>
  <text>
      <body>
         <p>Some text here.</p>

         <lg>
            <l>I think that I shall never see <note><ab>Sight of TEI elements is significant in this little poem.</ab></note></l>
            <l>A <floatingText><body><ab>floating text</ab></body></floatingText> inside this <figure><ab>tree</ab></figure>.</l>
         </lg>
      </body>
  </text>
</TEI>

Testing in my branch after building P5 (with Jenkins via Docker) works. With TEI-All, every use of <ab> raises the abstract model violation error that prompted this ticket. With a newly generated p5odds.rng in the branch, the <ab> elements here don't raise errors, so I think we're in the clear.

sydb commented 3 years ago

Oh dear. Sorry to say I missed all of the conversation back on or about 18 April, about which I would have had a lot to say. Sigh. Very quickly: