TEIC / TEI

The Text Encoding Initiative Guidelines
https://www.tei-c.org

abstract model violation — NOT! #2497

Open sydb opened 8 months ago

sydb commented 8 months ago

[The following problem noticed by WWP encoder extraordinaire Grace O’Mara.]

The following (it seems to me) is a perfectly reasonable encoding:

<lg type="poem" subtype="stanzaic">
  <head>Ode II.</head>
  <head type="sub">The Mermaid.</head>
  <epigraph>
    <quote source="b:IT00863">
      <p>When at laſt they retired to reſt, <persName>Ajut</persName> went down to the beach, where
      <lb/>finding a fiſhing-boat, ſhe entered it without heſitation, and, telling thoſe … </p>
      <p>The fate of theſe lovers gave occaſion to various fictions and conjectures.
      …
      <lb/>her lover in the deſerts of the ſea.</p>
    </quote>
    <bibl><title ref="b:IT00863">Rambler</title>, N<g ref="#sup-o"/> 187.</bibl>
  </epigraph>
  <l>Blow on, ye death-fraught whirlwinds! blow,</l>
  <l>Around the rocks, and rifted caves;</l>
  <!-- … -->
</lg>

But the "abstractModel-structure-p-in-l-or-lg" constraint fires on each of those <p> elements, complaining that “Lines may not contain higher-level structural elements”. That happens because those <p> elements are descendants of <lg>, but they are neither descendants of <floatingText> nor children of <figure> or <note>.[1]
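The following rough Python emulation shows why the constraint fires. It uses the standard-library xml.etree.ElementTree to model the current test, which flags a <p> whenever (ancestor::tei:l or ancestor::tei:lg) and not(ancestor::tei:floatingText | parent::tei:figure | parent::tei:note) is true. This is a sketch under stated assumptions, not the TEI's actual Schematron. The sample is a cut-down copy of the encoding above with the TEI namespace dropped. The helper names (parent_map, ancestor_tags, current_test) are hypothetical.

```python
import xml.etree.ElementTree as ET

# Cut-down copy of the epigraph example; TEI namespace omitted (assumption).
EPIGRAPH_SAMPLE = """
<lg type="poem" subtype="stanzaic">
  <head>Ode II.</head>
  <epigraph>
    <quote>
      <p>When at last they retired to rest ...</p>
      <p>The fate of these lovers gave occasion to various fictions ...</p>
    </quote>
    <bibl><title>Rambler</title>, No 187.</bibl>
  </epigraph>
  <l>Blow on, ye death-fraught whirlwinds! blow,</l>
</lg>
"""


def parent_map(root):
    """ElementTree has no parent axis, so build a child -> parent lookup."""
    return {child: par for par in root.iter() for child in par}


def ancestor_tags(elem, parents):
    """Tag names of elem's ancestors, nearest first (like ancestor::*)."""
    tags = []
    while elem in parents:
        elem = parents[elem]
        tags.append(elem.tag)
    return tags


def current_test(p, parents):
    """Hypothetical emulation of the current test; True means the error fires."""
    anc = ancestor_tags(p, parents)
    parent_tag = anc[0] if anc else None
    in_verse = "l" in anc or "lg" in anc
    exempt = "floatingText" in anc or parent_tag in ("figure", "note")
    return in_verse and not exempt


root = ET.fromstring(EPIGRAPH_SAMPLE.strip())
parents = parent_map(root)
results = [current_test(p, parents) for p in root.iter("p")]
print(results)  # [True, True]: both epigraph paragraphs are flagged
```

Both paragraphs come back True. The <lg> ancestor satisfies the first half of the test, and their parent, <quote>, is not one of the exempt parents.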

Possible solutions, in my preferred order:

  1. Change the test so that it only checks <p>s that are descendants of <l>, not of <lg>.
  2. Add another clause to the test to check that the <p> is also not a descendant of <epigraph>.
  3. Add another clause to the test to check that the <p> is also not a child of <quote>.
  4. Require that the <epigraph> be inside a <floatingText>.
  5. Tell Grace that finally, at the end of her stellar 5-year career encoding for the WWP, she has found something that TEI simply cannot represent properly.

I like (1) the best, because I am not sure why we are testing <lg> here. While I have a gnawing suspicion that it is at least partly my fault that we are testing <lg>, I am having trouble figuring out in what element a paragraph could occur such that it would be an abstract model violation inside an <lg> but not outside one. That is, I am (once again) complaining that

        <lg>
          <head><app><rdg><p/></rdg></app></head>
          <l/>
        </lg>

is an abstract model violation, but

        <head><app><rdg><p/></rdg></app></head>
        <lg>
          <l/>
        </lg>

is not. Makes no sense to me.[2] Further, if we use solution (1), because we are soon to be required to provide a @context,[3] we may as well use

      <sch:rule context="tei:l//tei:p">
        <sch:assert test="ancestor::tei:floatingText | parent::tei:figure | parent::tei:note">
          Abstract model violation: Lines may not …
        </sch:assert>
      </sch:rule>

which is a whole lot simpler than what we have now.
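Below is a small Python sketch of how the simplified rule would behave. It uses xml.etree.ElementTree to mimic a rule whose context is tei:l//tei:p and whose assertion passes if the <p> has a <floatingText> ancestor or a <figure> or <note> parent. The sketch assumes the TEI namespace is dropped and the samples are reduced to bare elements. The <div> wrapper around the "head outside lg" sample is a hypothetical addition so that it parses as a single document. The fourth sample, a <p> slipped into an <l> through <app>, is a made-up case showing that the rule still catches what it is meant to.

```python
import xml.etree.ElementTree as ET

# Minimal samples, TEI namespace omitted (assumption).
SAMPLES = {
    "lg/epigraph/quote/p": "<lg><epigraph><quote><p/><p/></quote></epigraph><l/></lg>",
    "lg/head/app/rdg/p": "<lg><head><app><rdg><p/></rdg></app></head><l/></lg>",
    # hypothetical <div> wrapper so the fragment has a single root
    "head/app/rdg/p": "<div><head><app><rdg><p/></rdg></app></head><lg><l/></lg></div>",
    "l/app/rdg/p": "<lg><l><app><rdg><p/></rdg></app></l></lg>",
}


def ancestor_tags(elem, parents):
    """Tag names of elem's ancestors, nearest first (like ancestor::*)."""
    tags = []
    while elem in parents:
        elem = parents[elem]
        tags.append(elem.tag)
    return tags


def proposed_violation(p, parents):
    """Mimics <sch:rule context="tei:l//tei:p"> with the proposed assert.
    Returns True when the assertion would fail, i.e. the message is issued."""
    anc = ancestor_tags(p, parents)
    if "l" not in anc:  # p is not matched by the rule context at all
        return False
    allowed = "floatingText" in anc or anc[0] in ("figure", "note")
    return not allowed


def check(xml):
    root = ET.fromstring(xml)
    parents = {child: par for par in root.iter() for child in par}
    return [proposed_violation(p, parents) for p in root.iter("p")]


for label, xml in SAMPLES.items():
    print(f"{label:22} {check(xml)}")
```

Under this rule, the epigraph paragraphs are no longer flagged. Both <head> placements get the same verdict, and a <p> inside an <l> is still reported. Whether some other rule should flag both <head> cases (note [2]) is a separate question.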

Notes

[1] The test, which is fired for each and every <p>, is (ancestor::tei:l or ancestor::tei:lg) and not(ancestor::tei:floatingText | parent::tei:figure | parent::tei:note); the error message is delivered if the test is true.

[2] Well, not exactly; my usual complaint is that they should both be abstract model violations.

[3] I know we (TEI Technical Council) have pretty much decided to require an <sch:rule context="…"> in every <constraintSpec scheme="schematron">, but I could not find a ticket for that.

ebeshero commented 8 months ago

@sydb While I am usually pleased to question the abstract model, in this case I find myself perplexed on two points:

  1. Why is it reasonable to set a presumably introductory prose epigraph inside an <lg> designated to contain stanzaic poetry? Is the epigraph somehow attached to a particular cluster of lines within a poem? Grace's encoding suggests otherwise, that an <lg> is used to contain an entire poem. I am perplexed about that decision, because surely another container element would be more versatile for such a mixture of content. But it seems a deliberate decision to use <lg> this way for introductory material, presumably containing the epigraph, some introductory lines, and, I imagine, some nested <lg> elements once you get into the stanzas. You seem perfectly happy with this.

  2. So is it time to reconsider the abstract model itself instead of elaborately undercutting it with more and more special-case Schematron rules based on specific contexts? Someone like me may come along in another ticket wanting to put a <note> in an <lg> because the note is attached to a specific stanza and happens to go on for paragraphs.

The abstract model itself may be the problem.

lb42 commented 8 months ago

I was also very puzzled by this use of <lg type="poem">... Why isn't it a <div>, or indeed a <text>? As to the alleged model violation: surely the model should know that some elements (note, quote, floatingText, possibly app) are just private little worlds of their own, and not fuss?

sydb commented 8 months ago

@ebeshero — In my current state of mind I am not sure I am qualified to answer question (1), if I ever am. But I suggest it is entirely irrelevant: the TEI Guidelines explicitly permit <epigraph> as a child of <lg>, and have done so since at least P2.

As for (2), while I think there are definitely problems with the abstract model, I do not think this is one of them. I think the abstract model says it is perfectly OK to have lg/epigraph/p, and our Schematron is just wrong. As noted above, my preferred solution would be to simplify the Schematron so it no longer checks lg//p as a possible violation, on the theory that any <p> descendant of an <lg> is legitimate unless it is also a descendant of something like a <head> or an <l> (so those are the things we need to worry about).

@lb42 — Yes, I think the model should know that, and (fortunately) in the cases of <note>, <quote>, and <floatingText> I think it handles things mostly correctly. But not for <epigraph> when inside <lg>.

Note that <app> is another story, as it is the source of all this horror. The content model of <rdg> (or <lem>) allows <p>, <ab>, <lg>, or even <div>. The point of all this problematic Schematron is to warn an encoder when he has put one of those big elements inside a smaller thing (like a <head>, <p>, <l>, or even <w>) by enclosing it in a <rdg> (or <lem>). I.e., to prevent

<p>This is a
  <app>
    <rdg><div><ab>bad</ab></div></rdg>
    <rdg><div><p>terrible</p></div></rdg>
  </app>
  idea.
</p>
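The intent described above can be illustrated with a minimal Python sketch. The goal is to find a big element (<p>, <ab>, <lg>, <div>) that has been slipped into a smaller one (<head>, <p>, <l>, <w>) by wrapping it in <rdg> or <lem>. The element sets come from the comment. The names BIG, SMALL, READINGS, smuggled and find_smuggled are hypothetical, and the TEI namespace is dropped. This is an illustration of the goal, not how the TEI Schematron is actually written.

```python
import xml.etree.ElementTree as ET

# Hypothetical element classes, taken from the examples in the comment above.
BIG = {"div", "p", "ab", "lg"}   # block-level things a reading may contain
SMALL = {"head", "p", "l", "w"}  # containers that should not receive them
READINGS = {"rdg", "lem"}

BAD = """<p>This is a
  <app>
    <rdg><div><ab>bad</ab></div></rdg>
    <rdg><div><p>terrible</p></div></rdg>
  </app>
  idea.
</p>"""


def smuggled(elem, parents):
    """True if elem sits in a <rdg>/<lem> that is itself inside a SMALL element."""
    inside_reading = False
    node = elem
    while node in parents:
        node = parents[node]
        if node.tag in READINGS:
            inside_reading = True
        elif inside_reading and node.tag in SMALL:
            return True
    return False


def find_smuggled(xml):
    """Tags of every BIG element smuggled into a SMALL one, in document order."""
    root = ET.fromstring(xml)
    parents = {child: par for par in root.iter() for child in par}
    return [e.tag for e in root.iter() if e.tag in BIG and smuggled(e, parents)]


print(find_smuggled(BAD))  # ['div', 'ab', 'div', 'p']
```

The outer <p> is not reported, because nothing encloses it. Every block element reached through a <rdg> is reported. The same structure inside a <div> (for example, div/app/rdg/div/p) would report nothing, since <div> is not in SMALL.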