TEIC / TEI

The Text Encoding Initiative Guidelines
https://www.tei-c.org

Need to clarify the relationship between `classSpec/@generate` and `classRef/@expand` #2369

Open martindholmes opened 1 year ago

martindholmes commented 1 year ago

The <classSpec> element has an attribute @generate which is defined as:

indicates which alternation and sequence instantiations of a model class may be referenced. By default, all variations are permitted.

The <classRef> element has an attribute @expand which is defined as:

indicates how references to this class within a content model should be interpreted.

Presumably there is some relationship between them, but the documentation is not clear on this. Is it an error to have a classRef/@expand which specifies "sequenceOptionalRepeatable" when the <classSpec> it points to has only @generate="sequenceOptional"? It would help if the documentation could make this relationship clear, and define what should happen if this scenario is encountered by a processor. Should a Schematron rule be added to constrain classRef/@expand based on the value of classSpec/@generate?

It would also help to have some explanation of how/why one might use classSpec/@generate in the first place. There are no instances of it anywhere in the TEI specs, and none in any of the ODDs we have gathered for the ATOP project. If no-one has ever used it, can we just eliminate it and simplify ODD processing?
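To make the question concrete, here is a minimal sketch of the kind of check such a Schematron rule might perform, written in Python rather than Schematron. It assumes that @generate holds a whitespace-separated list of the five expansion values and that an absent @generate permits all of them. The identifiers `model.demoPart` and `demo` are hypothetical, and treating a mismatch as a reportable problem is itself an assumption, since the documentation does not currently define the behaviour.

```python
import xml.etree.ElementTree as ET

TEI = {"tei": "http://www.tei-c.org/ns/1.0"}
ALL_MODES = {
    "alternation", "sequence", "sequenceOptional",
    "sequenceOptionalRepeatable", "sequenceRepeatable",
}


def check_expand_against_generate(odd_xml):
    """Return one message per classRef whose @expand is not permitted
    by the @generate of the classSpec it points to."""
    root = ET.fromstring(odd_xml)
    # Map class ident -> permitted modes (all five when @generate is absent).
    permitted = {}
    for spec in root.iterfind(".//tei:classSpec", TEI):
        generate = spec.get("generate")
        permitted[spec.get("ident")] = (
            set(generate.split()) if generate else set(ALL_MODES)
        )
    problems = []
    for ref in root.iterfind(".//tei:classRef[@expand]", TEI):
        key, expand = ref.get("key"), ref.get("expand")
        # Classes not declared in this document are skipped, not reported.
        if key in permitted and expand not in permitted[key]:
            problems.append(
                f"classRef to {key} uses expand='{expand}', "
                f"but the class only generates {sorted(permitted[key])}"
            )
    return problems


# Hypothetical ODD fragment reproducing the scenario described above.
SAMPLE = """
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0" ident="demo">
  <classSpec ident="model.demoPart" type="model" generate="sequenceOptional"/>
  <elementSpec ident="demo">
    <content>
      <classRef key="model.demoPart" expand="sequenceOptionalRepeatable"/>
    </content>
  </elementSpec>
</schemaSpec>
"""

for message in check_expand_against_generate(SAMPLE):
    print(message)
```

Whether such a mismatch should be an error, a warning, or silently tolerated is exactly what would need to be decided.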

martindholmes commented 1 year ago

@lb42 What do you remember about this? What was the imagined use-case for @generate?

lb42 commented 1 year ago

The motivation for classSpec/@generate is partly, even mainly, the avoidance of magic. Given the need to have classSpec generate multiple patterns/entity declarations, it seemed necessary to document, and hence potentially at least limit, what those multiple things should be. You can't, of course, put a class as such into a content model, so we needed to make clear how a classRef should be mapped: hence the need for @expand. What does it mean if @expand tries to produce something not allowed for by @generate? It might trigger a warning, I suppose, but it would presumably produce a partial or invalid content model. Has anyone ever used @generate with a non-default value? I doubt it very much! But it's a good thing to keep developers honest by requiring that arbitrary decisions like how a classReference in a content model can potentially be interpreted be explicitly documented I think.
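As a rough illustration of what "generate multiple patterns" means, the sketch below maps a list of class members onto RELAX NG compact expressions for each of the five values. It is not the Stylesheets implementation: the function names are hypothetical, and the `class_mode` pattern-naming scheme is an assumption made for the example.

```python
# Occurrence suffix added to each member, and separator placed between
# members, for each of the five expansion values.
MODES = {
    "alternation": ("", " | "),
    "sequence": ("", ", "),
    "sequenceOptional": ("?", ", "),
    "sequenceOptionalRepeatable": ("*", ", "),
    "sequenceRepeatable": ("+", ", "),
}


def expand_class(members, mode="alternation"):
    """Return the RELAX NG compact expression for one expansion mode."""
    suffix, separator = MODES[mode]
    return separator.join(member + suffix for member in members)


def generate_patterns(class_ident, members, generate=None):
    """Return {pattern name: expression}; all five modes unless
    `generate` (a whitespace-separated list) restricts them."""
    modes = generate.split() if generate else list(MODES)
    return {
        f"{class_ident}_{mode}": expand_class(members, mode)  # naming is assumed
        for mode in modes
    }


# Hypothetical class with three hypothetical members.
for name, expr in generate_patterns("model.demoPart", ["a", "b", "c"]).items():
    print(f"{name} = {expr}")
```

With `generate="sequenceOptional"` only one pattern would be produced, so a reference asking for any other expansion would have nothing to point at.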

sydb commented 1 year ago

> it's a good thing to keep developers honest by requiring that arbitrary decisions like how a classReference in a content model can potentially be interpreted be explicitly documented

Unless, of course, you want to say that once a class is defined, it can be referenced however the referrer wants. As it is, the definer of the class gets to limit how it might be referred to. But as we all seem to agree, that is a very rarely used feature.

sydb commented 1 year ago

Cf. Stylesheets #582.

I am almost convinced that we should just drop classSpec/@generate. (Well, deprecate it by saying “this attribute is ignored; in all cases all 5 are generated”.)

dmj commented 1 year ago

For completeness: If @generate is kept we need to discuss its relation to the model class hierarchy.

HelenaSabel commented 1 year ago

@joeytakeda found this example of @generate: https://listserv.brown.edu/cgi-bin/wa?A2=TEI-L;5d283e3b.0609

hcayless commented 1 year ago

After looking at this, I find I don't understand classRef/@expand either: it can specify a sequence, but where does the order of the sequence come from? Is it simply the order in which class members were encountered during processing? That seems random, if so. Anyway, I'm very confused...

lb42 commented 1 year ago

I asked sebastian the same question, many years ago.

hcayless commented 1 year ago

I'm quite worried by this. The whole thing seems to assume that classes behave like partial content models, but a class has no idea what order its members are supposed to be in, so how can it be useful to have an @expand value other than 'alternation', the default? Do we say anywhere that the order in which elements are defined dictates how their class membership is expressed? Can we override that in chained ODDs? Isn't this what macros are for? I'm now kind of in favor of killing both classSpec/@generate and classRef/@expand.

joeytakeda commented 1 year ago

I'm definitely in favour of getting rid of classSpec/@generate at the very least. However, reading the description of classSpec over again, I think "sequence" isn't necessarily an ordered sequence (if "sequence" is similar to <tei:sequence>, which can be ordered or unordered via @preserveOrder), so I don't know if order would ever be considered in the case of @generate and @expand?

hcayless commented 1 year ago

I don't think those two are the same 'sequence'. But it does point up what I think is an error in the documentation. Unless I'm very much mistaken, tei:sequences are (as you would hope and expect) ordered by default. If you set @preserveOrder to false, then in RelaxNG, an <interleave> will be generated; otherwise, it's ordered. The idea of an unordered sequence hurts me a little, but anyway, the definition given in https://tei-c.org/release/doc/tei-p5-doc/en/html/ref-sequence.html is backwards. It should say what happens when @preserveOrder is false. @preserveOrder="true" is the default (even though it's not a default value, and I don't think we should make it one).
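A minimal sketch of the behaviour described here, assuming a flat <sequence> whose children are all <elementRef>s: an absent or true @preserveOrder yields a comma-separated (ordered) group in RELAX NG compact syntax, and a false value yields an interleave. The function name and the element keys are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "{http://www.tei-c.org/ns/1.0}"


def sequence_to_rnc(sequence_xml):
    """Render a flat <sequence> of <elementRef>s as an RNC group."""
    seq = ET.fromstring(sequence_xml)
    keys = [ref.get("key") for ref in seq.findall(TEI_NS + "elementRef")]
    # Ordered unless @preserveOrder is explicitly false ("false" or "0").
    ordered = seq.get("preserveOrder", "true") not in ("false", "0")
    return (", " if ordered else " & ").join(keys)


TEMPLATE = """
<sequence xmlns="http://www.tei-c.org/ns/1.0"{attr}>
  <elementRef key="element-1"/>
  <elementRef key="element-2"/>
  <elementRef key="element-3"/>
</sequence>
"""

print(sequence_to_rnc(TEMPLATE.format(attr="")))
print(sequence_to_rnc(TEMPLATE.format(attr=' preserveOrder="false"')))
```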

hcayless commented 1 year ago

Given that there's not a good way to establish a class membership order in the first place, I don't think there's any sense in having mechanisms to exploit membership order. My vote is to deprecate both classSpec/@generate and classRef/@expand.

dmj commented 1 year ago

> I don't think those two are the same 'sequence'. But it does point up what I think is an error in the documentation. Unless I'm very much mistaken, tei:sequences are (as you would hope and expect) ordered by default. If you set @preserveOrder to false, then in RelaxNG, an <interleave> will be generated, otherwise, it's ordered.

There seems to be a regression wrt https://github.com/TEIC/Stylesheets/issues/241. See https://gist.github.com/dmj/cc0028192edc044d69d5a6b7269657eb -- The @preserveOrder does not have a discernible effect in Stylesheets version 4.4.0.

joeytakeda commented 1 year ago

@hcayless :

> Given that there's not a good way to establish a class membership order in the first place, I don't think there's any sense in having mechanisms to exploit membership order. My vote is to deprecate both classSpec/@generate and classRef/@expand.

That makes a lot of sense. I'm convinced, and thus +1 for deprecating classSpec/@generate and classRef/@expand (and allow me to add another point against classRef/@expand: how, if at all, would it ever make sense to have an @expand on a classRef that points to an attribute class [which is technically valid and passes through the Stylesheets without complaint, even though I doubt it does anything]?)

FWIW: As far as I can tell (per a //classRef[@expand] in the P5/Source/Specs), there are only 4 instances of classRef/@expand (pointing to three classes) in the Guidelines

| File | @model | @expand | Model referred to elsewhere? |
|---|---|---|---|
| altIdentifier.xml | model.placeNamePart | sequenceOptional | location.xml; objectIdentifier.xml; unitDef.xml (with @minOccurs="0") |
| msIdentifier.xml | model.placeNamePart | sequenceOptional | [Same as altIdentifier.xml above] |
| physDesc.xml | model.physDescPart | sequenceOptional | [None] |
| textDesc.xml | model.textDescPart | sequence | [None] |

(Where the first column gives the specification file in which the classRef is contained, and the last column lists other specifications that have a <classRef> pointing to the same model, but without an @expand.)
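For anyone wanting to repeat this survey outside an XML editor, the search can be approximated with Python's standard library, whose limited XPath support does cover the `[@expand]` predicate. This is only a sketch: the function name and the two in-memory sample documents are hypothetical stand-ins for the real specification files.

```python
import xml.etree.ElementTree as ET

TEI = {"tei": "http://www.tei-c.org/ns/1.0"}


def find_expanding_class_refs(sources):
    """`sources` maps a file name to its XML text.
    Returns (file name, @key, @expand) for every classRef with @expand."""
    hits = []
    for name, xml_text in sorted(sources.items()):
        root = ET.fromstring(xml_text)
        for ref in root.iterfind(".//tei:classRef[@expand]", TEI):
            hits.append((name, ref.get("key"), ref.get("expand")))
    return hits


# Hypothetical stand-ins for specification files.
SOURCES = {
    "a.xml": """<elementSpec xmlns="http://www.tei-c.org/ns/1.0" ident="a">
      <content><classRef key="model.demoPart" expand="sequenceOptional"/></content>
    </elementSpec>""",
    "b.xml": """<elementSpec xmlns="http://www.tei-c.org/ns/1.0" ident="b">
      <content><classRef key="model.demoPart"/></content>
    </elementSpec>""",
}

for hit in find_expanding_class_refs(SOURCES):
    print(hit)
```

For files on disk, `ET.parse(path).getroot()` could replace `ET.fromstring(xml_text)`, looping over the `*.xml` files in P5/Source/Specs.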


@dmj :

> There seems to be a regression wrt TEIC/Stylesheets#241. See https://gist.github.com/dmj/cc0028192edc044d69d5a6b7269657eb -- The @preserveOrder does not have a discernible effect in Stylesheets version 4.4.0.

Testing using the latest Stylesheets in dev, I can't replicate that issue from your test ODD in the gist. Here's what I get for the RNC:

namespace sch = "http://purl.oclc.org/dsdl/schematron"
default namespace tei = "http://www.tei-c.org/ns/1.0"
namespace teix = "http://www.tei-c.org/ns/Examples"
namespace xlink = "http://www.w3.org/1999/xlink"

# Schema generated from ODD source 2023-01-30T08:09:47Z. .
# TEI Edition: Version 4.6.0a. Last updated on
#         5th January 2023, revision 9074b9038
# TEI Edition Location: https://www.tei-c.org/Vault/P5/Version 4.6.0a./
#

#

sch:ns [ prefix = "tei" uri = "http://www.tei-c.org/ns/1.0" ]
outermost-element =

  ##
  element outermost-element {
    sequence, sequence-preserveOrder-true, sequence-preserveOrder-false
  }
sequence =

  ##
  element sequence { element-1, element-2, element-3 }
sequence-preserveOrder-true =

  ##
  element sequence-preserveOrder-true {
    element-1, element-2, element-3
  }
sequence-preserveOrder-false =

  ##
  element sequence-preserveOrder-false {
    element-1 & element-2 & element-3
  }
element-1 =

  ##
  element element-1 { empty }
element-2 =

  ##
  element element-2 { empty }
element-3 =

  ##
  element element-3 { empty }
start = outermost-element

dmj commented 1 year ago

> Testing using the latest Stylesheets in dev, I can't replicate that issue from your test ODD in the gist. Here's what I get for the RNC: ...

Right! I saw your fix was from July 2022, after the release of 4.4.0 (which ships with oXygen 25).

sydb commented 1 year ago

Wow, very useful analysis, @joeytakeda, thank you.

I have checked all 4 of those class references that use @expand, and in all 4 cases the expansion is in the order the elements are defined in the Guidelines.

So while I disagree quite strongly with the notion that there is no order, or there is no way to specify the order, I think most of us are at least unhappy with, if not repulsed by, the idea that the order of <elementSpec>s matters. Element specifications are declarative, and thus their order (in a given file) should have no importance. (Like the templates in an XSLT program.)
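To show why this feels wrong, here is a small sketch of an expansion that takes its order from the document order of the <elementSpec>s. It is not the Stylesheets code; the function names, the class `model.demoPart`, and the element names are all hypothetical. Swapping two declarations, which ought to be a no-op, changes the resulting sequence.

```python
import xml.etree.ElementTree as ET

TEI = {"tei": "http://www.tei-c.org/ns/1.0"}


def members_in_document_order(odd_xml, class_ident):
    """List the elements claiming membership of `class_ident`,
    in the order their <elementSpec>s appear in the document."""
    root = ET.fromstring(odd_xml)
    members = []
    for spec in root.iterfind(".//tei:elementSpec", TEI):
        keys = [m.get("key") for m in spec.iterfind("tei:classes/tei:memberOf", TEI)]
        if class_ident in keys:
            members.append(spec.get("ident"))
    return members


def make_odd(idents):
    """Build a hypothetical ODD declaring `idents` in the given order."""
    specs = "".join(
        f'<elementSpec ident="{ident}">'
        '<classes><memberOf key="model.demoPart"/></classes>'
        "</elementSpec>"
        for ident in idents
    )
    return f'<schemaSpec xmlns="http://www.tei-c.org/ns/1.0" ident="demo">{specs}</schemaSpec>'


# Same declarations, different order, different "sequence".
print(", ".join(members_in_document_order(make_odd(["first", "second"]), "model.demoPart")))
print(", ".join(members_in_document_order(make_odd(["second", "first"]), "model.demoPart")))
```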

My current positions (which are not carved in stone) follow.

@generate of <classSpec>

Death by (rapid) deprecation. It should simply be dropped, as it serves no (useful) purpose.

@expand of <classRef>

The question on what to do with this attribute needs to be taken one step at a time. (1) What to do about the various "sequence" values, since the current system (generate a sequence in the document order of the <elementSpec>s that are being referred to) is so undesirable. (2) If the solution to the previous question is to get rid of the various "sequence" values, then do we need to keep the attribute at all?

I am going to address question (2) first. Without the various "sequence" values, the only value remaining is "alternation", which is the default, so the “just drop the attribute” idea is very reasonable. But I’d like to propose an alternative. What if we allowed interleave? Yes, I know, use of interleave breaks DTDs (and maybe XSDs, does anyone know?). But as long as we do not use interleave in the Guidelines themselves is there any reason not to allow users (who likely do not care about DTDs) to use interleave? I submit that in most cases where a modern user wants to require that each element in a set be present she does not care about the order, at least not much.

As for question (1), I see several possible solutions.

  1. Leave it as is. Much as it rubs us the wrong way, no one has really complained about it for well over a decade.
  2. References to elements occur in some other defined order, e.g. alphabetical.
  3. Give the user some mechanism for specifying an order (other than re-arranging <elementSpec>s). This is a very enticing idea, but it is likely to be really hard. E.g., have to consider what happens when an element is added to a class via customization; or perhaps harder: what happens when an element that is in a class is replaced via customization? Also, how does a user customize the order?
  4. Drop the various "sequence" values, they are not worth the trouble. In those few cases where a user wants to enforce order of a set of elements that is already in a class, the user has to either refer to the elements directly (<elementRef>) or create a macro for the purpose. Either way, the information is duplicated, which is bad, but the problem of “what order” is dodged.

ebeshero commented 9 months ago

Council F2F Subgroup: We recommend testing to see whether we lose anything important with the removal of classSpec/@generate and classRef/@expand.

sydb commented 9 months ago

My somewhat more precise recollection of this morning’s subgroup decision with added details follows.

  1. Deprecate classSpec/@generate; during deprecation change current Stylesheets so that this attr is ignored (and all patterns are generated in all cases); ATOP processing will just ignore this attribute
  2. Investigate other mechanisms for expansion of class membership, e.g. using <classRef> as child of <sequence> or <alternate>.

lb42 commented 9 months ago

Presumably in the absence of the "sequence" options the recommended way of achieving "a sequence of model.foo elements" would be to define a macro. That means specifying the elements concerned explicitly of course, but if you don't want to use the order of their declaration there aren't many options. (There was, I vaguely recall, a suggestion that sequence should be interpreted as "alphabetical order". Nobody liked that much.)

martindholmes commented 3 months ago

ATOP group notes for its own future information: If <classRef> is allowed in <content>, then handling for that will have to be added to the transpile stage of the ATOP process; it can't be done earlier.

raffazizzi commented 2 months ago

VF2F decides to deprecate @generate and @expand. The rationale for deprecating @expand is that it shouldn't be possible to impose an order on an unordered structure (class members are not ordered). expand="alternation" will be supported by allowing classRef inside alternate. For imposing order with sequence, one will need to use elementRefs as shown in this example in the Guidelines.
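A rough sketch of what migrating an existing `classRef/@expand` might look like under this decision. The mapping of the former "sequence" values onto @minOccurs/@maxOccurs of each <elementRef> is an assumption, not something the decision spells out. The function name, class, and member names are hypothetical, the member order must now be supplied explicitly by the caller, and the TEI namespace is omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Assumed occurrence attributes per member for each former "sequence" value.
OCCURS = {
    "sequence": {},
    "sequenceOptional": {"minOccurs": "0"},
    "sequenceOptionalRepeatable": {"minOccurs": "0", "maxOccurs": "unbounded"},
    "sequenceRepeatable": {"minOccurs": "1", "maxOccurs": "unbounded"},
}


def replace_expand(key, expand, members):
    """Return replacement markup for <classRef key=... expand=...>.
    `members` is the explicit, caller-chosen order of the elements."""
    if expand == "alternation":
        # The class reference survives, wrapped in <alternate>.
        wrapper = ET.Element("alternate")
        ET.SubElement(wrapper, "classRef", {"key": key})
    else:
        # Order is stated explicitly with one <elementRef> per member.
        wrapper = ET.Element("sequence")
        for ident in members:
            ET.SubElement(wrapper, "elementRef", {"key": ident, **OCCURS[expand]})
    return ET.tostring(wrapper, encoding="unicode")


print(replace_expand("model.demoPart", "alternation", []))
print(replace_expand("model.demoPart", "sequenceOptional", ["a", "b"]))
```

The cost, as noted earlier in the thread, is that the member list is duplicated in the content model and has to be kept in step with the class by hand.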