TEIC / TEI

The Text Encoding Initiative Guidelines
https://www.tei-c.org

Invalid specification document for egXML #2380

Open dmj opened 1 year ago

dmj commented 1 year ago

I'm pretty certain that the specification document for egXML located at https://github.com/TEIC/TEI/blob/dev/P5/Source/Specs/egXML.xml is not valid. Two of the examples [1,2] read like this:

  <exemplum xml:lang="en">
    <egXML xmlns="http://www.tei-c.org/ns/Examples" xml:lang="und" valid="feasible" source="#UND">
      <egXML valid="feasible" source="#UND">
       <text>
       ...

As I understand it, using egXML as a child or descendant of egXML is not allowed.

The content model definition looks as follows:

    <alternate minOccurs="0" maxOccurs="unbounded">
      <textNode/>
      <anyElement/>
    </alternate>

The anyElement lacks an @except attribute and therefore falls back to the @defaultExceptions of schemaSpec, which defaults to http://www.tei-c.org/ns/1.0 teix:egXML.

Thus egXML is forbidden as a descendant of egXML.
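
To make the rule concrete, here is a minimal Python sketch that reports any egXML element with an egXML ancestor. It only illustrates the constraint described above and is not how the TEI schemas enforce it. The helper name `nested_egxml` and the trimmed `SAMPLE` document are hypothetical, made up for this example.

```python
import xml.etree.ElementTree as ET

# Namespace of the TEI examples module, as used in the snippet above.
TEIX_NS = "http://www.tei-c.org/ns/Examples"
EGXML = f"{{{TEIX_NS}}}egXML"


def nested_egxml(root):
    """Return every egXML element that has an egXML ancestor (hypothetical helper)."""
    found = []

    def walk(elem, inside_egxml):
        for child in elem:
            is_eg = child.tag == EGXML
            if is_eg and inside_egxml:
                found.append(child)
            # Once inside an egXML, every deeper element counts as "inside".
            walk(child, inside_egxml or is_eg)

    walk(root, root.tag == EGXML)
    return found


# Hypothetical, trimmed-down sample modelled on the snippet quoted above.
SAMPLE = """
<exemplum xmlns="http://www.tei-c.org/ns/1.0" xml:lang="en">
  <egXML xmlns="http://www.tei-c.org/ns/Examples" valid="feasible">
    <egXML valid="feasible">
      <text/>
    </egXML>
  </egXML>
</exemplum>
"""

hits = nested_egxml(ET.fromstring(SAMPLE))
print(len(hits))  # number of egXML elements nested inside another egXML
```

Because the inner egXML inherits the default namespace declared on the outer one, both elements have the same expanded name, which is why the check compares full `{namespace}local` tags.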

[1] https://github.com/TEIC/TEI/blob/9a968d3c856393dbb7c72ada2e32530a9e79143d/P5/Source/Specs/egXML.xml#L67
[2] https://github.com/TEIC/TEI/blob/9a968d3c856393dbb7c72ada2e32530a9e79143d/P5/Source/Specs/egXML.xml#L92

sydb commented 1 year ago

Well, yes and no. It is absolutely the case that TEI P5 is not valid against TEI P5 vanilla (i.e. tei_all) for the very reason you point out. But it has always been the case (since P2, I think) that the Guidelines have been validated against their own customization, not against vanilla. This was done, at least in part, to drive home the point that customization is a conformant, expected, and desirable thing to do.

The Guidelines are valid against the customization named “p5odds”. (I really don’t like that name, but at the time had no better suggestion, and to be honest, I still don’t.) It makes quite a few modifications to P5, most of them quite reasonable — clearly important for the source of the Guidelines, but not something that other projects should be restricted to. For example, p5odds adds a controlled vocabulary to @type of <idno>, and requires that if a specification has a <gloss> child, then it must have a <gloss> child with xml:lang="en".

The particular constraint in question, though, is the “no <egXML> in <egXML>” constraint which exists in tei_all, but is removed in p5odds. I wish I could explain to you why this constraint was removed in p5odds, rather than never existing in the Guidelines in the first place. I am afraid I cannot, and there is even a comment complaining about it in p5odds.odd:

This definition of overrides that of P5 which, for some reason I don't grok, does not allow a descendant. — Syd, 2018-06-13

IIRC at the time I was not able to garner any enthusiasm for removing the constraint in P5 itself. But I really hope that either (a) someone can explain to us (preferably here on this ticket) why it is a good idea to prevent <egXML> from occurring in <egXML>, or (b) other folks support the idea of allowing <egXML> in <egXML> in tei_all (again, preferably here on this ticket).

martindholmes commented 1 year ago

I think the simple answer is that it's a bit of a nightmare to process nested egXMLs, especially since if you allow them, then you have to figure out what it means for them to be three levels deep, or four, or five. You could of course have a constraint that allows them to be nested only one level deep, purely for the purpose of exemplifying egXML usage, but even in that case you have the problem that you have to somehow process the interior egXML as though it were part of the example. It's doable, but it's not intuitive or elegant. Generally it's a lot simpler to exemplify the use of egXML with a CDATA island or something like that.
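
As a rough illustration of the CDATA alternative, the Python sketch below parses an egXML whose content is a CDATA section. The sample markup is hypothetical, invented for this example. It shows that a parser sees the inner `<egXML>` only as character data, so no nested element ever reaches the processor.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the inner markup sits in a CDATA section, so it is
# plain text to the parser rather than a nested element.
CDATA_SAMPLE = """<egXML xmlns="http://www.tei-c.org/ns/Examples" valid="feasible"><![CDATA[
<egXML xmlns="http://www.tei-c.org/ns/Examples">
  <p>Some text</p>
</egXML>
]]></egXML>"""

outer = ET.fromstring(CDATA_SAMPLE)

# iter() yields the element itself plus all descendant elements.
element_count = sum(1 for _ in outer.iter())
print(element_count)            # the outer egXML is the only element
print("<egXML" in outer.text)   # the inner markup survives as text
```

The trade-off is that the content of the CDATA section is opaque to schema validation, so the exemplified markup is not checked at all.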

sydb commented 1 year ago

Sorry, not sure I understand the problem. (But I am quite tired, so I may be missing the obvious. :-) It never even occurred to me to worry about the level of nesting, as I guess I presume that the only <egXML> that is “processed” (as opposed to “becomes part of the example”) is the one that has no ancestor <egXML>.
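
The processing model assumed here can be sketched in a few lines of Python: collect only the egXML elements that have no egXML ancestor, and treat everything below them, nested egXML included, as example content. This is a sketch of that assumption, not of any existing TEI stylesheet. The function names and the `DOC` sample are hypothetical.

```python
import xml.etree.ElementTree as ET

TEIX_NS = "http://www.tei-c.org/ns/Examples"
EGXML = f"{{{TEIX_NS}}}egXML"


def outermost_egxml(root):
    """Collect egXML elements that have no egXML ancestor (hypothetical helper)."""
    result = []

    def walk(elem):
        if elem.tag == EGXML:
            result.append(elem)
            return  # do not descend: everything below is example content
        for child in elem:
            walk(child)

    walk(root)
    return result


def local_names(elem):
    """Local names of all descendants of elem, in document order."""
    return [d.tag.split("}")[-1] for d in elem.iter() if d is not elem]


# Hypothetical sample with one nested and one plain example.
DOC = """<div xmlns="http://www.tei-c.org/ns/1.0">
  <egXML xmlns="http://www.tei-c.org/ns/Examples">
    <egXML><p>inner</p></egXML>
  </egXML>
  <egXML xmlns="http://www.tei-c.org/ns/Examples"><p>plain</p></egXML>
</div>"""

tops = outermost_egxml(ET.fromstring(DOC))
print(len(tops))             # examples to be "processed"
print(local_names(tops[0]))  # what gets rendered as part of the first example
```

Under this model the depth of nesting never matters, because the walk stops at the first egXML it meets on each branch.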

duncdrum commented 1 year ago

I might be missing the obvious, but it seems to me that the need to document the use of <egXML> is rather specific to the Guidelines, and therefore a good match for a Guidelines-specific customization.

Are there really enough projects that need nested <egXML> to document their ODDs to justify making this part of tei_all?

dmj commented 1 year ago

All right, I learned something new.

Allowing egXML as a descendant of egXML in p5odds dates back to commit 137761825c. I don't quite understand the reasoning given in the commit.

Anyway: This is a not-a-bug, then.

sydb commented 1 year ago

Really good question, @duncdrum. My guess is that people using ODD to customize TEI have very little need to exemplify <egXML>. But people using ODD to write their own markup languages from scratch are a different story. (On the other hand, they would have to be writing a markup language similar enough to TEI that it had a <teix:egXML> element, so that may be an empty set. :-)

@dmj — I don’t see any reasoning given in the commit, but I may not be looking in the right place. (All I see is “make meta test more challenging”.)

martindholmes commented 3 months ago

In preparation for today's Council meeting @sydb asked that "someone would explain why it is a good idea to prevent <egXML> from occurring in <egXML>". I think my comment from 2022-12-13 still holds: it's virtually impossible to process such a thing into useful rendered output. Also, it's not clear what it could possibly mean, because the main mechanism of <egXML> is to place descendant items into an example namespace, so that they are clearly distinct for processing purposes from the original source elements they exemplify and can thus be processed in a different way. Given this, you cannot use a nested <egXML> to exemplify the use of <egXML>, because the mechanism fails to remove the nested <egXML> from its original namespace; in other words, it's just another instance of the real thing rather than an example of it. I rest my case.
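
The namespace point can be checked with a small Python sketch. Assuming a hypothetical two-level sample, the nested egXML ends up with exactly the same expanded name as the outer one, so a processor that dispatches on namespace and local name has nothing to tell them apart.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: an egXML directly inside another egXML.
SAMPLE = """<egXML xmlns="http://www.tei-c.org/ns/Examples">
  <egXML><p/></egXML>
</egXML>"""


def namespace_of(elem):
    """Extract the namespace URI from ElementTree's '{uri}local' tag form."""
    tag = elem.tag
    return tag[1:].split("}")[0] if tag.startswith("{") else ""


outer = ET.fromstring(SAMPLE)
inner = outer[0]  # first child element, i.e. the nested egXML

print(namespace_of(outer))
print(namespace_of(inner))
print(outer.tag == inner.tag)  # identical expanded names
```

Whether a processor should still treat the inner element as example content on the basis of its position, rather than its name, is the question the thread leaves open.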