Add contextually-variant content models to ODD

jamescummings commented 6 years ago

We should add co-occurrence constraints [edit: Contextually-variant content models, really] properly to the ODD language.

This should happen for content models, attribute value lists, classSpecs, dataTypes etc.

Perhaps with a 'context' attribute or some such thing. Doing this properly in ODD is more complicated than it might sound, and thinking through how our processing will deal with it is more complicated still.

(Based on discussion with @sydb in a breakout group at Council Face to Face Cologne, 2018-02-24)

peterstadler commented 6 years ago

Yes, solving the Durand conundrum part 2: replacing schematron with PureODD!

lb42 commented 6 years ago

Co-occurrence constraints are a specific form of additional constraint on content, which you might well implement by means of schematron. A typical co-occurrence constraint might say that if an attribute has some value, then some other attribute should not be permitted, for example. Or it might say that if this element is present in a content model, then that other element should not be. The 'context' attribute proposed (but not implemented) in pure ODD originally is rather different: it was intended to help with the more frequent problem of wanting elements to have different content models in different contexts: for example, I might want to require the <p> elements in my <text> to be segmented into <s>s, but not those in my <teiHeader>. This too can be implemented in schematron. And there are lots of other things that schematron can do which are neither contextual nor co-occurrence constraints. So what exactly is the proposal here? ODD already permits the inclusion of embedded schematron rules of any kind. Is the proposal to re-express some or all of the semantics of schematron in ODD? That would certainly be tidier, though presumably it would still have to be translated into something else (presumably schematron) to be of any practical use.
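For illustration, a co-occurrence constraint of the first kind (one attribute excluding another) can already be embedded in an ODD as Schematron. The sketch below is not taken from this discussion: the choice of <date>, @when and @notBefore, the ident value, and the message text are all assumptions made up for the example, and it assumes the ODD processor binds the tei: prefix to the TEI namespace.

```xml
<elementSpec ident="date" module="core" mode="change"
  xmlns="http://www.tei-c.org/ns/1.0">
  <!-- hypothetical rule: if @when is given, @notBefore is not permitted -->
  <constraintSpec ident="when-excludes-notBefore" scheme="schematron">
    <constraint>
      <sch:rule xmlns:sch="http://purl.oclc.org/dsdl/schematron"
        context="tei:date[@when]">
        <!-- sch:report fires when the test is true -->
        <sch:report test="@notBefore">
          A date with @when should not also carry @notBefore.
        </sch:report>
      </sch:rule>
    </constraint>
  </constraintSpec>
</elementSpec>
```

A rule like this reports the problem during validation; it does not change the attribute list the generated schema offers for the element.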

sydb commented 6 years ago

A few thoughts, which I can't really afford to go into in detail right now, but should jot down lest I forget them:

You might implement a co-occurrence constraint by means of Schematron, but that is not what @jamescummings and I had in mind. The point is not just to be told that <duck> is not allowed in teiHeader//p after the fact, but to not have duck in the list of pop-ups when you insert an element inside teiHeader//p in the first place.
Schematron can express co-occurrence constraints, and can do a lot more. I am not in favor of trying to re-write Schematron in ODD, myself; at least not at this time.
The @context attribute (as I imagine it) would be somewhat more limited than full-fledged co-occurrence constraints (themselves more limited than full Schematron), but could (as @lb42 points out) be designed to handle the most common TEI cases. That may well be the way to go.

Even implementing the @context attribute is going to be a lot of work. And worth noting that, because ODD still supports use of RELAX NG content models in macros, co-occurrence constraints can be hacked. (Bad idea, some would say.)

lb42 commented 6 years ago

OK, but don't call it "co-occurrence constraint" then. The requirement is for contextually-variant content models. Which is what the @context attribute was invented for. See Rahtz & Burnard 2013.

lb42 commented 6 years ago

Here's another use case (a real one). ELTeC wants to use a constrained <bibl> in the sourceDesc of its headers -- requiring certain members of model.biblLike to be present -- but it doesn't want <bibl> anywhere else to be constrained.
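As a rough sketch of how this use case has to be handled with Schematron today: the rule below assumes the header location meant is <sourceDesc>, and the required children (title and date) are purely hypothetical stand-ins, since the comment does not say which elements ELTeC actually requires. It also assumes the tei: prefix is bound to the TEI namespace by the processor.

```xml
<constraintSpec ident="sourceDesc-bibl-parts" scheme="schematron"
  xmlns="http://www.tei-c.org/ns/1.0">
  <constraint>
    <sch:rule xmlns:sch="http://purl.oclc.org/dsdl/schematron"
      context="tei:sourceDesc/tei:bibl">
      <!-- hypothetical requirement: a title and a date must be present here -->
      <sch:assert test="tei:title and tei:date">
        A bibl inside sourceDesc must contain a title and a date.
      </sch:assert>
    </sch:rule>
  </constraint>
</constraintSpec>
```

Because the context is limited to tei:sourceDesc/tei:bibl, <bibl> elsewhere stays unconstrained, but the content model that the schema itself offers for <bibl> is the same everywhere.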

sydb commented 6 years ago

(In reply to @lb42 of 14:54Z, i.e. ~15 mins ago.)

Point taken. But to be fair, contextually-variant other things would be useful, too. Do you mean your XML London talk?

martindholmes commented 6 years ago

I really think we're seeing the beginning of P6 here. The combination of a hierarchical class system and an extended Pure ODD that can encompass context-dependent content models is a big enough change in my view to make it a start-from-scratch project, and it would be much more straightforward to start with a relatively clean slate than try to graft all these things onto the existing system while retaining backward compatibility.

jamescummings commented 6 years ago

I think I might have a higher bar to what constitutes moving to P6 than @martindholmes.

@lb42: You are right that I was conflating co-occurrence constraints and contextually-variant content models. However, my thinking was that if we are solving this for content models then why would we not be using the same mechanism for attribute valLists etc. I agree with all your use cases that this is a good thing to be done and I think is part of our updating of ODD to be pure and complete.

I don't want to replace schematron (though I recognise one can do many of these things by schematron rules). It isn't about giving users warnings or errors that the thing they are doing is crossing some line in the sand. It is about them having or not having the choices available to them or restrictions already placed on them. That could be at the ODD level, for example the header requiring several elements where other contexts do not (context-sensitive content models), but equally could be on the document level (say an attribute not being available on an element in a specific location because of the value entered into another attribute elsewhere in the document). The latter is also a context-sensitive content model (if you accept that attributes and their values are part of an element's content).

An example might be that underneath <text> your <bibl> elements must have a @source attribute if and only if you have a //back/div/@type='bibliography'. Yes that can easily be done in schematron but that doesn't really document what your ODD is doing (just adds it as a constraint). Really what ODD should be doing is documenting this not through schematron but through ODD itself. We would want to record that element <bibl> is different depending on some co-occurrence that happens to be based on attribute values rather than elements. (i.e. rather than it happening to appear in the header). To me this is the same kind of constraint and should be modeled using the same forms of mechanism in ODD. It is about the documentation of our intent of the constraint in the meta-schema of ODD rather than the implementation of it. I'm more than happy for processing to turn this into schematron as a much more straightforward way to say it but the ODD language should be able to document constraints that change the content model of elements (which for me includes whether an attribute is required or not) without needing to resort to schematron (even if that is what is generated from it).
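For comparison, here is roughly what the Schematron version of that example might look like. This is only a sketch: the ident value and message text are invented, and it assumes the tei: prefix is bound to the TEI namespace by the processor.

```xml
<elementSpec ident="bibl" module="core" mode="change"
  xmlns="http://www.tei-c.org/ns/1.0">
  <constraintSpec ident="bibl-source-iff-bibliography" scheme="schematron">
    <constraint>
      <sch:rule xmlns:sch="http://purl.oclc.org/dsdl/schematron"
        context="tei:text//tei:bibl">
        <!-- "if and only if": both sides must be true, or both false -->
        <sch:assert test="boolean(@source) =
                          boolean(//tei:back/tei:div[@type = 'bibliography'])">
          bibl must have @source exactly when the back matter
          contains a div of type 'bibliography'.
        </sch:assert>
      </sch:rule>
    </constraint>
  </constraintSpec>
</elementSpec>
```

The rule enforces the condition, but nothing in the ODD records that @source is, in effect, required in one situation and absent in the other.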

raffazizzi commented 5 years ago

F2F subgroup is in favor of creating a system for co-occurrence constraints in ODD. Would @jamescummings produce a more fully fledged proposal?

martinascholger commented 5 years ago

Greenlighted for @jamescummings to develop a proposal, with further discussion.

EsGeh commented 4 years ago

There seems to be some overlap with problems we ran into when trying to compile Relax NG to (equivalent) ODD. Therefore I'd like to share my thoughts: Relax NG has a similar structure to a context-free grammar. Here different content can be specified for a single (XML) element for every occurrence on the right-hand side of rules.

ab =
  element ab {
    text
  }
chapterabstract =
  element ab {
    (text | markup | foreign | ref | bibref)*
  }

...
divX = element div { ab }
divY = element div { chapterabstract }
...

(here different content models are specified for ab in different rules)

Current ODD allows for exactly one single content model per element. In order to enforce semantically equivalent restrictions in ODD one would have to check the content manually using multiple schematron rules which, depending on the "context" (e.g. the parent xml element), check the content using XPath expressions.

<elementSpec ident="ab" mode="change">
  <constraintSpec ...>
    <constraint>
      <sch:rule xmlns:sch="http://purl.oclc.org/dsdl/schematron" context="..this is NOT a chapterabstract...">
        <sch:assert test="
          not(element())
        ">...</sch:assert>
      </sch:rule>
      <sch:rule xmlns:sch="http://purl.oclc.org/dsdl/schematron" context="..this IS a chapterabstract...">
        <sch:assert test="
          ???  check '(text | markup | foreign | ref | bibref)*'  ???
        ">...</sch:assert>
      </sch:rule>
    </constraint>
  </constraintSpec>
  ...
</elementSpec>

(Sidenote: This example shows several solvable but headache-prone problems:

context expressions: how to distinguish between a chapterabstract and a "normal" ab?
assert tests: how to write an XPath expr that checks for (text | markup | foreign | ref | bibref)* ? )
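One possible answer to the second question, as a hedged sketch: for a repeated choice of the form (text | a | b)*, it is enough to assert that no child element falls outside the list. The rule below assumes that the pattern names markup, foreign, ref and bibref each stand for a single element of the same name in the TEI namespace (the Relax NG snippet does not say), and the context expression with @type = 'Y' is a hypothetical stand-in for whatever identifies divY.

```xml
<sch:rule xmlns:sch="http://purl.oclc.org/dsdl/schematron"
  context="tei:div[@type = 'Y']/tei:ab">
  <!-- text nodes are always allowed; only child elements need checking -->
  <sch:assert test="not(*[not(self::tei:markup or self::tei:foreign
                            or self::tei:ref or self::tei:bibref)])">
    ab here may contain only text, markup, foreign, ref, and bibref.
  </sch:assert>
</sch:rule>
```

This trick only covers an unordered, repeatable choice. A content model with sequences or occurrence limits would need a considerably more involved test.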

Even though possible, the ODD+Schematron approach has major drawbacks:

the ODD content model is almost superfluous (since it is "overwritten" by schematron)
schematron rules can become very difficult to write (especially the constraints)
changing the data model might become a difficult task

My suggestion for improving ODD is to go more into the direction of context-free grammar like syntax. These are the advantages:

express context-free structures without the need for context-dependent rules (Schematron)
be more explicit about the hierarchical structure of valid documents
open the door for many nice features, like nice visualisation in the generated HTML documentation

I want to stress that rules like in the Relax NG example above are very common, but very difficult to express with current ODD + context dependent rules (be it Schematron or some other context-sensitive restriction language). They shouldn't be.

hcayless commented 4 years ago

@EsGeh It's not that ODD isn't a context-free grammar (it is), it's that its rules for what can go on the left and right sides of productions are more like DTD rules than RNG rules. One of the explicit design requirements of ODD was that it should be able to generate schemas in multiple flavors, and therefore it does not follow the full expressive capabilities of RNG. The point of this ticket is to think about ways to get beyond that. There is an organizational difficulty in what you suggest, which is that the point of ODD (well, one of the points of it) is that we can put element documentation together with element definitions. If we break that by allowing one element to (re)define another, then we have to figure out what to do about that documentation.

That's not to say this can't be done, just that it's not quite as simple as "just do what RNG does".

raffazizzi commented 4 years ago

@jamescummings while you're not officially assigned, I think we're still hoping for a proposal from you. Let us know if that's still part of your plans

raffazizzi commented 3 years ago

Bump/nudge for @jamescummings

jamescummings commented 3 years ago

After quite some time I've been having a think about this and mulling over how to fully document in the ODD the intentions of variations in contextually-variant content models. I'm using content-model here to refer not only to child elements and classes (e.g. <content>) but also available attributes and values (e.g. attList). The thoughts I had were:

1) If there was a @context attribute as suggested above then what would its data type be? To me it seems this would be teidata.xpath to give some applicable context? If that was the case, then <model> and <equiv> often available alongside <content> already share a @predicate attribute -- is there any reason not to re-use that? Is it significantly different in intention from its use on those elements? The attribute's gloss is "the condition under which the element bearing this attribute applies, given as an XPath predicate expression".

2) If providing this attribute (@context or @predicate) for <content> then the same mechanism should be used with the provision of attributes. I think that the best location for this may be on <attList> (rather than, say, individual <attDef>s) and that if the same attributes are re-used in different contexts then their definitions be repeated. The benefit of this is, of course, that other aspects of their provision further down the hierarchy can vary.

3) The <content> element is already repeatable and so this attribute would define the applicability of that content model to the current location in an instance document. This means that <bibl> in the context of ancestor::back could have a <content> that does not include, say, <citedRange> for whatever reason. With attributes, the @source attribute on <bibl> could be required in a context of ancestor::back but not exist for the <bibl> elements elsewhere.

4) If providing multiple definitions of content or attributes then it seems sensible that it should be a requirement that there is a <content> or whatever without this attribute to act as a fallback for any contexts not covered. Likewise, where contextually-variant content models aren't what is required (but contextually-variant warnings, etc.) then <constraintSpec> should be used.

5) Implementation and processing for this is non-trivial but akin to many of the tasks that already exist in ODD processing. Namely, in creating output schemas, for any element (or attribute or...) one needs to check whether there is a contextually-applicable version before falling back to the default. Ambiguity should be dealt with in the same way as it was with <model> to be consistent. How this is transformed into, say, RelaxNG is still open for discussion, but where a context is provided then elements/attributes/etc. should not exist or otherwise be modified in the specified contexts, but otherwise should be the default TEI.

6) This is not truly co-occurrence constraints on the instance document level which might have a restriction of content model based on the existence or not of other content/attributes elsewhere in the instance document. (So if there is an attribute with a particular value on this element in the header, then this element is not valid (or available?) in this other section of the document.) For that, it seems much easier to continue to recommend <constraint>s as we already have them, e.g. schematron, etc.
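To make points 1 to 4 concrete, here is a sketch of what such a specification might look like. It is entirely hypothetical: @predicate on <content> and <attList>, and repeated <attList> children, are the proposal under discussion and not valid ODD at the time of writing. The child elements listed are simplified placeholders rather than the real content model of <bibl>, and the predicate is written unprefixed, following the notation used in point 3.

```xml
<!-- HYPOTHETICAL syntax: @predicate is not defined on <content> or <attList> -->
<elementSpec ident="bibl" module="core" mode="change"
  xmlns="http://www.tei-c.org/ns/1.0">
  <!-- applies only to bibl inside back: citedRange is not offered -->
  <content predicate="ancestor::back">
    <alternate minOccurs="0" maxOccurs="unbounded">
      <textNode/>
      <elementRef key="author"/>
      <elementRef key="title"/>
      <elementRef key="date"/>
    </alternate>
  </content>
  <!-- fallback (point 4): no predicate, used in every other context -->
  <content>
    <alternate minOccurs="0" maxOccurs="unbounded">
      <textNode/>
      <elementRef key="author"/>
      <elementRef key="title"/>
      <elementRef key="date"/>
      <elementRef key="citedRange"/>
    </alternate>
  </content>
  <!-- inside back, @source is required -->
  <attList predicate="ancestor::back">
    <attDef ident="source" mode="change" usage="req"/>
  </attList>
  <!-- fallback: elsewhere, @source is not available at all -->
  <attList>
    <attDef ident="source" mode="delete"/>
  </attList>
</elementSpec>
```

A processor would pick the first applicable predicated version and otherwise use the unpredicated one, as described in point 5.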

ebeshero commented 3 years ago

Noting this ticket is relevant: https://github.com/TEIC/TEI/issues/2140

raffazizzi commented 2 years ago

After @jamescummings comments, this is squarely in "Needs Discussion" territory.

sydb commented 2 years ago

It was GO for a more fully-fleshed-out proposal for discussion, not for an actual implementation. 🙂

raffazizzi commented 1 year ago

F2F@Guelph thinks we should support this, using @predicate like it is already used by the processing model. We will wait for the first version of ATOP to be released before writing the XSLT implementation, however.

Make sure we have the attributes in place that we need to specify co-occurrence.

When this is done we will open tickets for:

revisit the entire TEI class system to use co-occurrence constraints and potentially move towards P6
write implementation in ATOP

HelenaSabel commented 1 year ago

When this work is being tackled, it is very likely that the issues of #1710 will come out. #1710 is blocked pending the resolution of this ticket.

sydb commented 1 year ago

@EsGeh — Wow, my apologies. It has been > 3 years since you posted, but I never saw it until now. Your premise (“ODD allows for exactly one single content model per element”) turns out not to be the case. Disregarding mode="change" for the moment, it is true that every <elementSpec> defines an element with 1 and only 1 name (“element type” in XML speak), and it is true that the <elementSpec> has 1 and only 1 identifier, expressed as an XML Name on the @ident attribute. But those two do not have to be the same. The name of the element may be specified separately in an <altIdent>. Thus the following snippet defines two versions of the <xmp:block> element, one that is used inside <div>, the other inside <profileDesc>.

        <elementSpec ident="chapterblock" mode="add" ns="http://www.example.edu/cocio">
          <altIdent>block</altIdent>
          <classes>
            <memberOf key="att.global"/>
            <memberOf key="model.divPart"/>
          </classes>
          <content>
            <alternate minOccurs="1" maxOccurs="unbounded">
              <textNode/>
              <elementRef key="code"/>
              <elementRef key="foreign"/>
              <elementRef key="bibl"/>
            </alternate>
          </content>
        </elementSpec>
        <elementSpec ident="otherblock" mode="add" ns="http://www.example.edu/cocio">
          <altIdent>block</altIdent>
          <classes>
            <memberOf key="att.global"/>
            <memberOf key="model.profileDescPart"/>
          </classes>
          <content>
            <textNode/>
          </content>
        </elementSpec>

In this case the elements are differentiated by their class membership. If another element (say <cell>) wanted to include only the "otherblock" version in its content model it would use <elementRef key="otherblock"/>. For a more complete example of this concept see my demo of a co-occurrence constraint via multiple declaration hack.
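A minimal sketch of that last point follows. The rest of the content model of <cell> is simplified to plain text for the example, which is an assumption and not the real TEI definition.

```xml
<elementSpec ident="cell" module="figures" mode="change"
  xmlns="http://www.tei-c.org/ns/1.0">
  <content>
    <alternate minOccurs="0" maxOccurs="unbounded">
      <textNode/>
      <!-- refers to the spec by its @ident, not by the element name "block" -->
      <elementRef key="otherblock"/>
    </alternate>
  </content>
</elementSpec>
```

In an instance document the child would still appear as a block element in the http://www.example.edu/cocio namespace, but only the text-only version would be valid inside <cell>.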

raffazizzi commented 1 year ago

@jamescummings does @sydb's example satisfy your original request?

raffazizzi commented 1 month ago

@jamescummings can you take a look at @sydb's example above?

TEIC / TEI

Add contextually-variant content models to ODD #1744