TEIC / TEI

The Text Encoding Initiative Guidelines
https://www.tei-c.org

more on `<altIdent>` #2285

Closed sydb closed 7 months ago

sydb commented 2 years ago

introduction

If you thought #2049 was more than enough fixes for <altIdent>, think again. Now that we are on the road to improving its content and where it can go in a TEI document, it is time to start thinking through its semantics and required constraints more carefully. (This, BTW, comes up acutely in the context of the atop project.)

After #2049 has been completed, <altIdent> will only be allowed as a child of <attDef>, <classSpec>, <constraintSpec>, <dataSpec>, <elementSpec>, <macroSpec>, or <valItem>. In all cases it may appear 0 or more times, and may or may not have an @xml:lang.

We (or at least, I) do not know why <altIdent> is repeatable. As far as I can tell, there is no occurrence of a 2nd <altIdent> child of any element anywhere in the Guidelines (including examples) or in the Exemplars/ directory. (I searched for //*[ tei:altIdent[2] | teix:altIdent[2] ] in all files within the P5/ hierarchy with file extensions ".xml", ".odd", or ".tei".)
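For anyone who wants to repeat that search without an XPath processor, here is a minimal sketch in Python using only the standard library. The function name and the sample ODD fragment are made up for illustration; only the two namespaces and the "second `altIdent` sibling" test come from the search described above.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
TEIX = "http://www.tei-c.org/ns/Examples"
# Checked per namespace, mirroring tei:altIdent[2] | teix:altIdent[2].
ALT_TAGS = (f"{{{TEI}}}altIdent", f"{{{TEIX}}}altIdent")

def parents_with_repeated_altident(xml_text):
    """Return (local name, @ident, count) for each element that has
    two or more altIdent children in the same namespace."""
    hits = []
    for parent in ET.fromstring(xml_text).iter():
        for tag in ALT_TAGS:
            count = sum(1 for child in parent if child.tag == tag)
            if count >= 2:
                hits.append((parent.tag.split("}")[-1], parent.get("ident"), count))
    return hits

# Hypothetical ODD fragment, for illustration only.
SAMPLE = """<schemaSpec xmlns="http://www.tei-c.org/ns/1.0" ident="demo">
  <elementSpec ident="p" mode="change">
    <altIdent xml:lang="es">párrafo</altIdent>
    <altIdent xml:lang="de">Absatz</altIdent>
  </elementSpec>
  <elementSpec ident="div" mode="change">
    <altIdent>section</altIdent>
  </elementSpec>
</schemaSpec>"""

print(parents_with_repeated_altident(SAMPLE))
```

To cover a whole directory tree, the same function could be applied to the text of each ".xml", ".odd", and ".tei" file in turn.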

So it seems to me we need a discussion about what a 2nd (or 3rd or 4th) <altIdent> might mean. Heck, I don’t even know what the 1st one means for anything other than an <elementSpec>, an <attDef>, or a <valItem>!

singleton

In those three constructs (<elementSpec>, <attDef>, <valItem>) it means “this is the name or value of this thing that you (the processor) should put in the schema; you (the human) should, as usual, use the @ident of the parent element to refer to it from within the ODD(-chain), but do not expect it in the output schema(s).” I have tested, and at least the “use this name in the schema” part works for all 3 of them.

But what does it mean for <dataSpec>? The name of a datatype never makes it into the RELAX NG schema, so the fact that you can call it something different in the ODD file is irrelevant, isn’t it? I have not tested, but it seems to me the same is true for <constraintSpec>.

As for <classSpec> and <macroSpec>, their identifiers do make it into the RELAX NG, but prefixed by the value of the @prefix attribute of the <schemaSpec> (or <moduleRef>?) in force. So I guess having a different name to use in the ODD could be useful, but I have never tried it (at least, not that I remember), and do not know that it works with the current Stylesheets. Addendum: I just tested with <macroSpec>. As far as I could tell, it (a child <altIdent>) had no effect — you cannot refer to the macro by the <altIdent> value. (Well, you can refer to it, but the result is as if you referred to a macro that does not exist: the reference is silently ignored.)

multiple

Both the vanilla Guidelines (i.e., tei_all) and p5odds permit multiple <altIdent> siblings. The tei_customization schema does not, because it knows that 2 or more cause an error from the Stylesheets. (Namely “A sequence of more than one item is not allowed as the first argument of fn:normalize-space()”.)

At the moment I cannot think of any reason we might even want multiple <altIdent> siblings unless they were being used to allow different element & attribute names and attribute values for the “same” schemas in different languages. That is, if each set of <altIdent> siblings were differentiated by @xml:lang, and the processor picked the “right” one.

possible solutions

Possible solutions include, but are probably not limited to, the following.

  1. Change TEI so that only 1 <altIdent> is permitted, and remove it from <dataSpec> & <constraintSpec>, and perhaps from <classSpec> & <macroSpec>
  2. Leave TEI alone, but have ODD processors (i.e., the Stylesheets and ATOP) issue warnings whenever a 2nd sibling <altIdent> is found or even a 1st <altIdent> in <dataSpec>, <constraintSpec>, <classSpec>, or <macroSpec>.
  3. Improve the discussion in the Guidelines to explain the “multiple <altIdent>s for multiple languages” idea, and then make sure ODD processors can actually do that.
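
To make solution 2 concrete, here is a rough sketch of the kind of check an ODD processor could run. It is not how the Stylesheets or ATOP do (or would) implement it: the function name, the warning wording, and the sample fragment are all invented for illustration. Only the two conditions (a 2nd sibling, or a parent from the questionable list) come from the list above.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
ALT = f"{{{TEI}}}altIdent"
# Parents for which an <altIdent> child appears to have no useful effect.
QUESTIONABLE = {"dataSpec", "constraintSpec", "classSpec", "macroSpec"}

def altident_warnings(xml_text):
    """Collect warning strings; wording is illustrative, not from any real tool."""
    warnings = []
    for parent in ET.fromstring(xml_text).iter():
        alts = [child for child in parent if child.tag == ALT]
        if not alts:
            continue
        name = parent.tag.split("}")[-1]
        ident = parent.get("ident")
        if name in QUESTIONABLE:
            warnings.append(f"{name} '{ident}': <altIdent> is unlikely to have any effect here")
        if len(alts) > 1:
            warnings.append(f"{name} '{ident}': {len(alts)} sibling <altIdent> elements found")
    return warnings

# Hypothetical ODD fragment, for illustration only.
SAMPLE = """<schemaSpec xmlns="http://www.tei-c.org/ns/1.0" ident="demo">
  <macroSpec ident="macro.paraContent" mode="change">
    <altIdent>paraStuff</altIdent>
  </macroSpec>
  <elementSpec ident="p" mode="change">
    <altIdent xml:lang="es">párrafo</altIdent>
    <altIdent xml:lang="en">para</altIdent>
  </elementSpec>
</schemaSpec>"""

for message in altident_warnings(SAMPLE):
    print("WARNING:", message)
```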

I am not particularly fond of solution 3 because it seems like a lot of work to support a feature that almost no one except @alex-bia has requested, and even his real use case can be handled by the current system. (He only really cares about having “Spanish-like” attribute values and names for elements & attributes, so just <altIdent xml:lang="es">párrafo</altIdent> would do the job; he does not need a 2nd <altIdent>.)

Solution 2 has the advantage that it allows someone else to come along and write an ODD processor that does the multi-language bit from solution 3, but does not mean we have to do that work.

Solution 1 has the advantage of being sensible, simple, clear, clean, and relatively easy to implement. It also means any ODD writer, not just one who is using tei_customization, gets a validation error immediately.

dmj commented 2 years ago

Thanks for summing this up! In the prose of the Guidelines I also found this remark:

By default, the altIdent of a component is identical to the value of its ident attribute.

From my understanding this means that we always have at least one implicit altIdent. This clears up https://github.com/TEIC/Stylesheets/issues/237 I think.

I would opt for solution 2: If an *Spec contains multiple altIdent elements and the ODD to RNG processor cannot decide which one to use, it should fail.

martindholmes commented 1 year ago

Council F2F 2022-09-13 had a long discussion:

For the ODD stage where RNG is being generated, there should only be a single altIdent at most. However, earlier in the processing chain, it is reasonable to have different altIdents for different languages and for other purposes such as cultural sensitivity; it makes sense, for instance, to have Arabic altIdents for convenience when working RTL, with another LTR language altIdent alongside it. So we think that the requirement should be that if you have multiple altIdents, they must all be distinguished in some way (such as @targetLang) so that the processor can select the appropriate altIdent before generating the PLODD (to use ATOP terminology) and then have no further decision to make when generating the RNG. It's not yet clear what the appropriate attributes should be and how the selection should be specified as an input to the ODD processing.
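
As a rough illustration of the selection step only, the sketch below picks a single name per specification before any RNG would be generated. It is built on assumptions: it uses @xml:lang as the distinguishing attribute (the actual attribute is still undecided, as noted above), it ignores @xml:lang inherited from ancestors, and the function name, the `lang` parameter, and the sample fragment are hypothetical. It falls back to @ident when there is no altIdent, and fails rather than guessing when the choice is ambiguous.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
ALT = f"{{{TEI}}}altIdent"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def schema_name(spec, lang=None):
    """Pick the one name to use in the output schema for `spec`.

    Assumption: sibling altIdents are distinguished by xml:lang set
    directly on each altIdent (inherited xml:lang is not considered).
    """
    alts = [child for child in spec if child.tag == ALT]
    if not alts:
        return spec.get("ident")      # default: same as @ident
    if len(alts) > 1:
        alts = [a for a in alts if a.get(XML_LANG) == lang]
        if len(alts) != 1:
            raise ValueError(
                f"cannot select an altIdent for '{spec.get('ident')}' with lang={lang!r}"
            )
    # Collapse whitespace, roughly what normalize-space() would do.
    return " ".join((alts[0].text or "").split())

# Hypothetical specification, for illustration only.
SAMPLE = """<elementSpec xmlns="http://www.tei-c.org/ns/1.0" ident="p" mode="change">
  <altIdent xml:lang="es">párrafo</altIdent>
  <altIdent xml:lang="en">paragraph</altIdent>
</elementSpec>"""

spec = ET.fromstring(SAMPLE)
print(schema_name(spec, lang="es"))
```

Calling `schema_name(spec)` without a language on this sample raises `ValueError`, since neither sibling can be preferred.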

Council tasks the ATOP group with coming up with a detailed proposal clarifying what should be allowed and how and when it should be processed, and to run this by the Internationalization group before coming back to Council for final approval.

sydb commented 1 year ago

GREEN for @sydb (perhaps with @ebeshero’s help) to come up with a suggested list of elements from which <altIdent> can be removed from the content model. (I.e., those *Spec elements that do not need it for co-occurrence constraint reasons or for XML-construct renaming reasons.)

sydb commented 1 year ago

The list of places <altIdent> is allowed, with annotations

At the moment, I cannot come up with a reason why one would ever need an <altIdent> as a child of <classSpec>, <constraintSpec>, <dataSpec>, or <macroSpec>.

sydb commented 1 year ago

Looking in every file in my main directory (which contains TEI repos, WWP repos, consulting work, etc., including the ODDs people sent in to ATOP), ignoring most backups, I find the parents of <altIdent> are:

    221 elementSpec
    146 attDef
     63 moduleSpec
     44 valItem
     31 egXML

All cases of moduleSpec/altIdent are either old <altIdent type="FPI">, for which we now use <ident type="FPI">, or are glosses of the module name which should probably have been encoded in <gloss>, instead. (They are for the CMC schema and the CBML schema.)

So given these findings and the thought experiment above that results in only <elementSpec>, <attDef>, and <valItem> needing a child <altIdent>, I think we should go ahead and deprecate, then remove, <altIdent> as a child of <classSpec>, <constraintSpec>, <dataSpec>, and <macroSpec>.

sydb commented 1 year ago

F2F Council decides that P5 should continue to allow <altIdent> in all places it is currently allowed (<attDef>, <classSpec>, <constraintSpec>, <dataSpec>, <elementSpec>, <macroSpec>, and <valItem>), but that p5odds and tei_customization should limit it to <attDef>, <elementSpec>, and <valItem>.

SB is GREEN to implement this and close ticket w/ PR