Open jamescummings opened 6 years ago
Yes, solving the Durand conundrum part 2: replacing schematron with PureODD!
Co-occurrence constraints are a specific form of additional constraint on content, which you might well implement by means of schematron. A typical co-occurrence constraint might say that if an attribute has some value, then some other attribute should not be permitted, for example. Or it might say that if this element is present in a content model, then that other element should not be. The 'context' attribute proposed (but not implemented) in pure ODD originally is rather different: it was intended to help with the more frequent problem of wanting elements to have different content models in different contexts: for example, I might want to require the <p> elements in my <text> to be segmented into <s>s, but not those in my <teiHeader>. This too can be implemented in schematron. And there are lots of other things that schematron can do which are neither contextual nor co-occurrence constraints. So what exactly is the proposal here? ODD already permits the inclusion of embedded schematron rules of any kind. Is the proposal to re-express some or all of the semantics of schematron in ODD? That would certainly be tidier, though presumably it would still have to be translated into something else (presumably schematron) to be of any practical use.
A few thoughts, which I can't really afford to go into in detail right now, but should jot down lest I forget them:
<duck> is not allowed in teiHeader//p after the fact, but to not have duck in the list of pop-ups when you insert an element inside teiHeader//p in the first place.
The @context attribute (as I imagine it) would be somewhat more limited than full-fledged co-occurrence constraints (themselves more limited than full Schematron), but could (as @lb42 points out) be designed to handle the most common TEI cases. That may well be the way to go.
Even implementing the @context attribute is going to be a lot of work. And worth noting that, because ODD still supports use of RELAX NG content models in macros, co-occurrence constraints can be hacked. (Bad idea, some would say.)
OK, but don't call it "co-occurrence constraint" then. The requirement is for contextually-variant content models. Which is what the @context attribute was invented for. See Rahtz & Burnard 2013.
Here's another use case (a real one). ELTeC wants to use a constrained <bibl> in the <sourceDesc> of its headers -- requiring certain members of model.biblLike to be present -- but it doesn't want <bibl> anywhere else to be constrained.
(In reply to @lb42 of 14:54Z, i.e. ~15 mins ago.)
Point taken. But to be fair, contextually-variant other things would be useful, too. Do you mean your XML London talk?
I really think we're seeing the beginning of P6 here. The combination of a hierarchical class system and an extended Pure ODD that can encompass context-dependent content models is a big enough change in my view to make it a start-from-scratch project, and it would be much more straightforward to start with a relatively clean slate than try to graft all these things onto the existing system while retaining backward compatibility.
I think I might have a higher bar to what constitutes moving to P6 than @martindholmes.
@lb42: You are right that I was conflating co-occurrence constraints and contextually-variant content models. However, my thinking was that if we are solving this for content models then why would we not be using the same mechanism for attribute valLists etc. I agree with all your use cases that this is a good thing to be done and I think is part of our updating of ODD to be pure and complete.
I don't want to replace schematron (though I recognise one can do many of these things by schematron rules). It isn't about giving users warnings or errors that the thing they are doing is crossing some line in the sand. It is about them having or not having the choices available to them or restrictions already placed on them. That could be on an ODD-level of
An example might be that underneath <text> your <bibl> elements must have a @source attribute if and only if you have a //back/div/@type='bibliography'. Yes that can easily be done in schematron but that doesn't really document what your ODD is doing (just adds it as a constraint). Really what ODD should be doing is documenting this not through schematron but through ODD itself. We would want to record that element <bibl> is different depending on some co-occurrence that happens to be based on attribute values rather than elements (i.e. rather than it happening to appear in the header). To me this is the same kind of constraint and should be modeled using the same forms of mechanism in ODD. It is about the documentation of our intent of the constraint in the meta-schema of ODD rather than the implementation of it. I'm more than happy for processing to turn this into schematron as a much more straightforward way to say it but the ODD language should be able to document constraints that change the content model of elements (which for me includes whether an attribute is required or not) without needing to resort to schematron (even if that is what is generated from it).
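For comparison, the Schematron route mentioned above might look roughly like the following. This is only a sketch, not anyone's actual customization: the @ident value and the message text are invented for illustration, and it assumes the tei: prefix is bound to the TEI namespace by the processing chain.

```xml
<elementSpec ident="bibl" mode="change">
  <!-- hypothetical ident; scheme names the constraint language used -->
  <constraintSpec ident="bibl-source-iff-bibliography" scheme="schematron">
    <constraint>
      <sch:rule xmlns:sch="http://purl.oclc.org/dsdl/schematron"
                context="tei:text//tei:bibl">
        <!-- passes only when "has @source" and "a bibliography div exists in back" are both true or both false -->
        <sch:assert test="boolean(@source) = boolean(//tei:back/tei:div[@type='bibliography'])">
          A bibl inside text must carry @source if, and only if, the back matter contains a div of type "bibliography".
        </sch:assert>
      </sch:rule>
    </constraint>
  </constraintSpec>
</elementSpec>
```

This enforces the rule at validation time, but the content model and attribute list of <bibl> in the ODD are unchanged, which is the documentation gap described above.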
F2F subgroup is in favor of creating a system for co-occurrence constraints in ODD. Would @jamescummings produce a more fully fledged proposal?
Greenlighted for @jamescummings to develop a proposal, with further discussion.
There seems some overlap with problems we ran into when trying to compile Relax NG to (equivalent) ODD. Therefore I'd like to share my thoughts: Relax NG has a similar structure to a context-free grammar. Here different content can be specified for a single (XML-) element for every occurrence on the right hand side of rules.
ab =
  element ab {
    text
  }
chapterabstract =
  element ab {
    (text | markup | foreign | ref | bibref)*
  }
...
divX = element div { ab }
divY = element div { chapterabstract }
...
(here different content models are specified for ab in different rules)
Current ODD allows for exactly one single content model per element. In order to enforce semantically equivalent restrictions in ODD one would have to check the content manually using multiple schematron rules which, depending on the "context" (e.g. the parent xml element), check the content using XPath expressions.
<elementSpec ident="ab" mode="change">
  <constraintSpec ...>
    <constraint>
      <sch:rule xmlns:sch="http://purl.oclc.org/dsdl/schematron" context="..this is NOT a chapterabstract...">
        <sch:assert test="not(element())">...</sch:assert>
      </sch:rule>
      <sch:rule xmlns:sch="http://purl.oclc.org/dsdl/schematron" context="..this IS a chapterabstract...">
        <sch:assert test="??? check '(text | markup | foreign | ref | bibref)*' ???">...</sch:assert>
      </sch:rule>
    </constraint>
  </constraintSpec>
  ...
</elementSpec>
(Sidenote: This example shows several solvable but headache-prone problems:
chapterabstract or "normal" ab?
(text | markup | foreign | ref | bibref)*?
)
Even though possible, the ODD+Schematron approach has major drawbacks:
My suggestion for improving ODD is to move toward a syntax more like that of a context-free grammar. These are the advantages:
I want to stress that rules like in the Relax NG example above are very common, but very difficult to express with current ODD + context dependent rules (be it Schematron or some other context-sensitive restriction language). They shouldn't be.
@EsGeh It's not that ODD isn't a context-free grammar (it is), it's that its rules for what can go on the left and right sides of productions are more like DTD rules than RNG rules. One of the explicit design requirements of ODD was that it should be able to generate schemas in multiple flavors, and therefore it does not follow the full expressive capabilities of RNG. The point of this ticket is to think about ways to get beyond that. There is an organizational difficulty in what you suggest, which is that the point of ODD (well, one of the points of it) is that we can put element documentation together with element definitions. If we break that by allowing one element to (re)define another, then we have to figure out what to do about that documentation.
That's not to say this can't be done, just that it's not quite as simple as "just do what RNG does".
@jamescummings while you're not officially assigned, I think we're still hoping for a proposal from you. Let us know if that's still part of your plans.
Bump/nudge for @jamescummings
After quite some time I've been having a think about this and mulling over how to fully document in the ODD the intentions of variations in contextually-variant content models. I'm using content-model here to refer not only to child elements and classes (e.g. <content>) but also available attributes and values (e.g. attList). The thoughts I had were:
1) If there was a @context attribute as suggested above then what would its data type be? To me it seems this would be teidata.xpath to give some applicable context? If that was the case, then <model> and <equiv> often available alongside <content> already share a @predicate attribute -- is there any reason not to re-use that? Is it significantly different in intention from its use on those elements? The attribute's gloss is "the condition under which the element bearing this attribute applies, given as an XPath predicate expression".
2) If providing this attribute (@context or @predicate) for <content> then the same mechanism should be used with the provision of attributes. I think that the best location for this may be on <attList> (rather than, say, individual <attDef>) and that if the same attributes are re-used in different contexts then their definition be repeated. The benefit of this is, of course, that other aspects of their provision further down the hierarchy can vary.
3) The <content> element is already repeatable and so this attribute would define the applicability of that content model to the current location in an instance document. This means that <bibl> in the context of ancestor::back could have a <content> that does not include, say, <citedRange> for whatever reason. With attributes, the @source attribute on <bibl> could be required in a context of ancestor::back but not exist for the <bibl> elements elsewhere.
4) If providing multiple definitions of content or attributes then it seems sensible that it should be a requirement that there is a <content> or whatever without this attribute to act as a fallback for any contexts not covered. Likewise, where contextually-variant content models aren't what is required (but contextually-variant warnings, etc.) then <constraintSpec> should be used.
5) Implementation and processing for this is non-trivial but akin to many of the tasks that already exist in ODD processing. Namely, in creating output schemas, for any element (or attribute or...) one needs to check whether there is a contextually-applicable version before falling back to the default. Ambiguity should be dealt with in the same way as it was with <model> to be consistent. How this is transformed into, say, RelaxNG is still open for discussion, but where a context is provided then elements/attributes/etc. should not exist or otherwise be modified in the specified contexts, but otherwise should be the default TEI.
6) This is not truly co-occurrence constraints on the instance document level which might have a restriction of content model based on the existence or not of other content/attributes elsewhere in the instance document. (So if there is an attribute with a particular value on this element in the header, then this element is not valid (or available?) in this other section of the document.) For that, it seems much easier to continue to recommend <constraint>s as we already have them, e.g. schematron, etc.
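To make points 2) to 4) concrete, here is a purely hypothetical sketch of what such a customization could look like. It is not valid against current ODD: @predicate on <content> and <attList> is exactly the thing being proposed, and the element lists are abbreviated for illustration rather than copied from the real <bibl> definition.

```xml
<!-- HYPOTHETICAL: @predicate is not currently defined on <content> or <attList> -->
<elementSpec ident="bibl" mode="change">
  <!-- Contextual model: applies only to <bibl> inside <back>; omits <citedRange> -->
  <content predicate="ancestor::back">
    <alternate minOccurs="0" maxOccurs="unbounded">
      <textNode/>
      <elementRef key="author"/>
      <elementRef key="title"/>
      <elementRef key="date"/>
    </alternate>
  </content>
  <!-- Fallback model (no predicate) for every other context; abbreviated here -->
  <content>
    <alternate minOccurs="0" maxOccurs="unbounded">
      <textNode/>
      <classRef key="model.biblPart"/>
    </alternate>
  </content>
  <!-- Contextual attributes: @source is required on <bibl> inside <back> -->
  <attList predicate="ancestor::back">
    <attDef ident="source" mode="change" usage="req"/>
  </attList>
  <!-- Fallback attributes: @source is not available elsewhere -->
  <attList>
    <attDef ident="source" mode="delete"/>
  </attList>
</elementSpec>
```

Under point 5), a processor would have to try the predicated <content> and <attList> first and fall back to the unpredicated ones, so the order and ambiguity rules borrowed from <model> would matter here.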
Noting this ticket is relevant: https://github.com/TEIC/TEI/issues/2140
After @jamescummings comments, this is squarely in "Needs Discussion" territory.
It was GO for a more fully-fleshed-out proposal for discussion, not for an actual implementation. 🙂
F2F@Guelph thinks we should support this, using @predicate like it is already used by the processing model. We will wait for the first version of ATOP to be released before writing the XSLT implementation, however.
Make sure we have the attributes in place that we need to specify co-occurrence.
When this is done we will open tickets for:
When this work is being tackled, it is very likely that the issues of #1710 will come out. #1710 is blocked pending the resolution of this ticket.
@EsGeh: Wow, my apologies. It has been > 3 years since you posted, but I never saw it until now. Your premise (“ODD allows for exactly one single content model per element”) turns out not to be the case. Disregarding mode="change" for the moment, it is true that every <elementSpec> defines an element with 1 and only 1 name (“element type” in XML speak), and it is true that the <elementSpec> has 1 and only 1 identifier, expressed as an XML Name on the @ident attribute. But those two do not have to be the same. The name of the element may be specified separately in an <altIdent>. Thus the following snippet defines two versions of the <xmp:block> element, one that is used inside <div>, the other inside <profileDesc>.
<elementSpec ident="chapterblock" mode="add" ns="http://www.example.edu/cocio">
  <altIdent>block</altIdent>
  <classes>
    <memberOf key="att.global"/>
    <memberOf key="model.divPart"/>
  </classes>
  <content>
    <alternate minOccurs="1" maxOccurs="unbounded">
      <textNode/>
      <elementRef key="code"/>
      <elementRef key="foreign"/>
      <elementRef key="bibl"/>
    </alternate>
  </content>
</elementSpec>
<elementSpec ident="otherblock" mode="add" ns="http://www.example.edu/cocio">
  <altIdent>block</altIdent>
  <classes>
    <memberOf key="att.global"/>
    <memberOf key="model.profileDescPart"/>
  </classes>
  <content>
    <textNode/>
  </content>
</elementSpec>
In this case the elements are differentiated by their class membership. If another element (say <cell>) wanted to include only the "otherblock" version in its content model it would use <elementRef key="otherblock"/>.
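As a rough sketch of that last point (the content model is deliberately simplified for illustration and is not the real default model of <cell>), the referring element might be customized like this:

```xml
<elementSpec ident="cell" mode="change">
  <content>
    <alternate minOccurs="0" maxOccurs="unbounded">
      <textNode/>
      <!-- refers to the spec by its @ident; per its <altIdent>, the element appears as "block" in instances -->
      <elementRef key="otherblock"/>
    </alternate>
  </content>
</elementSpec>
```

Because the reference goes through @ident rather than the element name, only the text-only version of the block element would be allowed inside <cell>.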
For a more complete example of this concept see my demo of a co-occurrence constraint via multiple declaration hack.
@jamescummings does @sydb's example satisfy your original request?
@jamescummings can you take a look at @sydb's example above?
We should add co-occurrence constraints [edit: Contextually-variant content models, really] properly to the ODD language.
This should happen for content models, attribute value lists, classSpecs, dataTypes etc.
Perhaps with a 'context' attribute or some such thing. This is more complicated than it might sound in doing it properly in ODD, but even more so in thinking about how our processing will deal with it.
(Based on discussion with @sydb in a breakout group at Council Face to Face Cologne, 2018-02-24)