TEIC / TEI

The Text Encoding Initiative Guidelines
https://www.tei-c.org

per-document defaults (of attribute values) #2090

Open sydb opened 3 years ago

sydb commented 3 years ago

This is a general proposal for a mechanism required to solve a particular problem. In our recent work on ruby annotations, at least one reviewer has requested a mechanism for establishing on a per-document basis where the ruby annotations are written with respect to the base text. (Typically something like “left” or “right” for vertically written texts, or “above” or “below” for horizontally written texts.) This could, of course, be expressed by specifying the @place attribute on every instance of the <rt> element. But since (in some cases) they would all be the same in a given document, using a default would make the encoding less bloated.

Some will argue that there already exist methods for declaring default attribute values using a schema language, including PureODD. But it is quite reasonable to believe that there are collections of documents which otherwise conform to the same set of constraints (in other words, are of the same document type), for which having multiple customizations or schemas just for this one difference would seem silly.

And surely this is not the only case for which expressing default attribute values on a per-document (as opposed to per-document-type) basis is helpful. For example, one might want to express that all <add> elements should be considered to have a @hand of "#LukeSkywalker" unless otherwise indicated, or that all <div1> elements are of @type="chapter" unless otherwise specified.

So we (@martindholmes, @npcole, and I) are thinking that a generic “default attributes in this document” mechanism is appropriate. The TEI already had a mechanism for expressing the default attribute value of the @rend attribute. We think that something similar here would likely work well.

Here is a fanciful example of such encoding intended as a starting point for discussion rather than a hard-and-fast recommendation.

<tagsDecl>
  <defaultAttrs xml:id="LukesAdditions">
    <defaultAttr name="hand" value="#LukeSkywalker"/>
    <defaultAttr name="evidence" value="internal"/>
    <defaultAttr name="instant" value="false"/>
  </defaultAttrs>
  <namespace name="http://www.tei-c.org/ns/1.0">
    <tagUsage gi="add" default="#LukesAdditions"/>
  </namespace>
</tagsDecl>
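To make the pointer mechanism concrete, here is a minimal Python sketch (standard library xml.etree.ElementTree only) of how a processor might follow each <tagUsage>'s @default pointer to the <defaultAttrs> it names and then fill in missing attributes. It assumes the proposed <defaultAttrs> and <defaultAttr> elements, and @default on <tagUsage>, are in the TEI namespace; none of these exist in the current Guidelines, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def t(local: str) -> str:
    """Clark-notation name of an element in the TEI namespace."""
    return f"{{{TEI}}}{local}"


def defaults_by_element(tags_decl: ET.Element) -> dict:
    """Map element names (Clark notation) to {attribute: default value}.

    Follows the hypothetical tagUsage/@default pointer to a <defaultAttrs> by xml:id.
    """
    blocks = {}
    for block in tags_decl.iter(t("defaultAttrs")):
        block_id = block.get(XML_ID)
        if block_id is not None:
            blocks[block_id] = {d.get("name"): d.get("value")
                                for d in block.findall(t("defaultAttr"))}
    table = {}
    for ns in tags_decl.findall(t("namespace")):
        uri = ns.get("name", "")
        for usage in ns.findall(t("tagUsage")):
            pointer = usage.get("default", "")
            # Only same-document pointers of the form "#id" are handled here.
            if pointer.startswith("#") and pointer[1:] in blocks:
                gi = usage.get("gi")
                key = f"{{{uri}}}{gi}" if uri else gi
                table.setdefault(key, {}).update(blocks[pointer[1:]])
    return table


def apply_defaults(root: ET.Element, table: dict) -> None:
    """Fill in defaulted attributes; attributes already present are left alone."""
    for el in root.iter():
        for name, value in table.get(el.tag, {}).items():
            if el.get(name) is None:
                el.set(name, value)
```

Leaving explicitly specified attributes untouched is what makes these values defaults rather than overrides.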

If an attribute were in a non-TEI namespace (as opposed to no namespace, sometimes called the null namespace, which is where all unprefixed XML attributes, including TEI attributes, are by default), the corresponding <defaultAttr> elements could be wrapped in a <namespace> element. (@martindholmes even thinks the no-namespace ones should be wrapped in a <namespace name="">, so that the namespace is always expressed explicitly, and thus more clearly.)
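A processor could read such namespace wrappers roughly as follows. This Python sketch assumes the proposed <defaultAttrs>, <defaultAttr> and a nested <namespace> are all in the TEI namespace, and treats both an unwrapped <defaultAttr> and one inside <namespace name=""> as no-namespace attributes. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"


def t(local: str) -> str:
    """Clark-notation name of an element in the TEI namespace."""
    return f"{{{TEI}}}{local}"


def attr_key(uri: str, local: str) -> str:
    """ElementTree attribute key: '{uri}local', or plain 'local' for no namespace."""
    return f"{{{uri}}}{local}" if uri else local


def read_default_attrs(block: ET.Element) -> dict:
    """Collect {attribute key: value} from a <defaultAttrs>, honoring <namespace> wrappers."""
    defaults = {}
    for child in block:
        if child.tag == t("defaultAttr"):
            # Unwrapped: the attribute is in no namespace.
            defaults[child.get("name")] = child.get("value")
        elif child.tag == t("namespace"):
            # name="" also means no namespace, per the explicit-wrapping suggestion.
            uri = child.get("name", "")
            for d in child.findall(t("defaultAttr")):
                defaults[attr_key(uri, d.get("name"))] = d.get("value")
    return defaults
```

Because ElementTree stores namespaced attributes under "{uri}local" keys, passing one of these keys to Element.set() yields an attribute that is serialized in the right namespace.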

The example above uses the old indirect “tagUsage/@gi plus pointer” mechanism for associating default attribute values with an element type. We could instead, or in addition, use a direct mechanism based on either XPath or CSS selectors to choose the elements to which default attributes apply.

<tagsDecl>
  <defaultAttrs selector="body add">
    <defaultAttr name="hand" value="#LukeSkywalker"/>
    <defaultAttr name="evidence" value="internal"/>
    <defaultAttr name="instant" value="false"/>
  </defaultAttrs>
  <defaultAttrs selector="back add">
    <defaultAttr name="hand" value="#Yoda"/>
    <defaultAttr name="evidence" value="external"/>
    <defaultAttr name="instant" value="false"/>
  </defaultAttrs>
</tagsDecl>

or

<tagsDecl>
  <defaultAttrs select="body//add">
    <defaultAttr name="hand" value="#LukeSkywalker"/>
    <defaultAttr name="evidence" value="internal"/>
    <defaultAttr name="instant" value="false"/>
  </defaultAttrs>
  <defaultAttrs select="back//add">
    <defaultAttr name="hand" value="#Yoda"/>
    <defaultAttr name="evidence" value="external"/>
    <defaultAttr name="instant" value="false"/>
  </defaultAttrs>
</tagsDecl>
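For the XPath variant, here is a rough Python sketch of applying each @select. It leans on ElementTree's limited XPath subset and assumes two things the proposal leaves open: that the expression is evaluated as if prefixed with ".//" from the root, and that unprefixed names mean TEI elements (the '' namespace mapping, supported since Python 3.8, does this, much as xpath-default-namespace would in XSLT). The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"


def t(local: str) -> str:
    """Clark-notation name of an element in the TEI namespace."""
    return f"{{{TEI}}}{local}"


def apply_selected_defaults(root: ET.Element, tags_decl: ET.Element) -> None:
    """Apply each <defaultAttrs select="..."> to the elements its path selects."""
    for block in tags_decl.findall(t("defaultAttrs")):
        path = block.get("select")
        if not path:
            continue
        defaults = {d.get("name"): d.get("value")
                    for d in block.findall(t("defaultAttr"))}
        # '' maps unprefixed names in the path into the TEI namespace.
        for el in root.findall(".//" + path, {"": TEI}):
            for name, value in defaults.items():
                if el.get(name) is None:  # explicitly specified values win
                    el.set(name, value)
```

ElementTree only understands a small part of XPath, which happens to narrow what a user-supplied expression can do; that is a side effect of this particular library, not a substitute for the caution the teidata.xpath warning asks for.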

Note, of course, that as per the warning on teidata.xpath, processing a user-supplied XPath can be dangerous. (On the other hand, processing a CSS selector strikes me as very difficult.)

We think it important that any such mechanism be designed so that XSLT could be used to read in a TEI document that makes use of this mechanism, and write out a version of that document that has had the <defaultAttrs> (or whatever) elements removed, and the defaulted attributes actually specified on each of the appropriate elements. We think any such processor will probably have to perform XInclude processing as well. (We also think it may be a good idea for TEI-C to publish such a stylesheet.)
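The round trip described here (resolve XIncludes, write the defaulted attributes onto the elements, drop the declarations) might look like this in Python rather than XSLT. It is only a sketch: it handles the @select form alone, assumes the same ".//" and TEI-default-namespace conventions as above, and relies on the standard library's ElementInclude, which covers simple xi:include cases only. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

TEI = "http://www.tei-c.org/ns/1.0"


def t(local: str) -> str:
    """Clark-notation name of an element in the TEI namespace."""
    return f"{{{TEI}}}{local}"


def expand_defaults(root: ET.Element) -> ET.Element:
    """Write defaulted attributes onto elements and drop the <defaultAttrs> declarations."""
    # Resolve xi:include first so that defaults also reach included content.
    ElementInclude.include(root)
    # ElementTree keeps no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    for block in list(root.iter(t("defaultAttrs"))):
        path = block.get("select")
        if path:
            defaults = {d.get("name"): d.get("value")
                        for d in block.findall(t("defaultAttr"))}
            for el in root.findall(".//" + path, {"": TEI}):
                for name, value in defaults.items():
                    if el.get(name) is None:  # explicitly specified values win
                        el.set(name, value)
        # The declaration has now been applied, so remove it from the output.
        parent_of[block].remove(block)
    return root
```

The output should be indistinguishable from a document whose encoder had typed every defaulted attribute by hand, which is what lets downstream tools ignore the mechanism entirely.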

Our initial thought is that this mechanism may be expressed in any TEI header, and that if there is a conflict, the <defaultAttrs> of the closest <teiHeader> (that of the nearest enclosing <TEI>) wins. For example, in the following structure

<TEI>
  <teiHeader>
    <!-- corpus-level default attrs here -->
  </teiHeader>
  <TEI>
    <teiHeader>
      <!-- default attrs for <standOff> here -->
    </teiHeader>
    <standOff><!-- ... --></standOff>
  </TEI>
  <TEI>
    <teiHeader>
      <!-- default attrs for transcriptions of ONE here -->
    </teiHeader>
    <sourceDoc><!-- ... --></sourceDoc>
    <text><!-- ... --></text>
  </TEI>
  <TEI>
    <teiHeader>
      <!-- default attrs for transcriptions of TWO here -->
    </teiHeader>
    <sourceDoc><!-- ... --></sourceDoc>
    <text>
      <!-- ... -->
      <fw/>
      <!-- ... -->
    </text>
  </TEI>
</TEI>

the attributes specified on the <fw> shown would take precedence over any defaults expressed in the document; the default attrs for transcriptions of TWO would come next, taking precedence over the default attrs for the corpus. The default attrs for <standOff> and for transcriptions of ONE would not apply to it at all.
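One way to get exactly this precedence is to apply headers innermost first, with each header only filling attributes that are still missing. The Python sketch below does that; it assumes each @select is evaluated relative to the <TEI> whose header declares it (so a sibling <TEI>'s header never reaches another's content), which the proposal does not yet specify. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"


def t(local: str) -> str:
    """Clark-notation name of an element in the TEI namespace."""
    return f"{{{TEI}}}{local}"


def apply_nearest_first(tei: ET.Element) -> None:
    """Apply header defaults so that the nearest enclosing <TEI> wins over outer ones."""
    # Nested <TEI> children first: their headers fill attributes before any outer header.
    for child in tei.findall(t("TEI")):
        apply_nearest_first(child)
    header = tei.find(t("teiHeader"))
    if header is None:
        return
    for block in header.iter(t("defaultAttrs")):
        path = block.get("select")
        if not path:
            continue
        defaults = {d.get("name"): d.get("value")
                    for d in block.findall(t("defaultAttr"))}
        # Selection is scoped to this <TEI> and everything nested inside it.
        for el in tei.findall(".//" + path, {"": TEI}):
            for name, value in defaults.items():
                if el.get(name) is None:  # explicit values and nearer defaults win
                    el.set(name, value)
```

Calling apply_nearest_first on the outermost <TEI> of the structure above gives the <fw> in TWO its own attributes first, then TWO's defaults, then the corpus defaults, while the <standOff> and ONE headers never touch it.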

We are not sure it is worth Council’s time to specify what happens when a default is expressed for one or more of the attributes of the default-attribute mechanism itself (e.g., @name or @value on <defaultAttr>).

Note that I have previously argued that all the possible values of an attribute whose value comes from a controlled vocabulary should, or at least could, be enumerated in the <teiHeader>. That is, I proposed a mechanism like that provided by teidata.enumerated, but with the enumerations listed in the <teiHeader> rather than in the ODD file. This mechanism could easily be extended to express default values as well.

duncdrum commented 3 years ago

Yes please, teidata.xpath seems the better choice. As for validation, how would that work with mandatory attributes? It sounds like a whole lot of rules would need to be rewritten to check for the presence of the attribute on the elements that require it, unless there is a default attribute definition in the header. While still an awesome idea, this could get messy.
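The rule described here (an element passes if it carries the attribute itself or a header default supplies it) can be sketched in Python as follows. It is illustrative only: it handles the @select form, assumes the same ".//" and TEI-default-namespace conventions as the earlier sketches, and does not model the nearest-header scoping. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"


def t(local: str) -> str:
    """Clark-notation name of an element in the TEI namespace."""
    return f"{{{TEI}}}{local}"


def missing_required(root: ET.Element, gi: str, attr: str) -> list:
    """Return <gi> elements lacking attr both explicitly and via a header default."""
    covered = set()
    for block in root.iter(t("defaultAttrs")):
        path = block.get("select")
        names = {d.get("name") for d in block.findall(t("defaultAttr"))}
        if path and attr in names:
            # Elements selected by a block that supplies attr count as satisfied.
            covered.update(root.findall(".//" + path, {"": TEI}))
    return [el for el in root.iter(t(gi))
            if el.get(attr) is None and el not in covered]
```

Each mandatory-attribute rule would need a check of this shape, which is the messiness being pointed out; validating a copy that has already been expanded is the main way to avoid rewriting the rules themselves.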