TEIC / TEI

The Text Encoding Initiative Guidelines
https://www.tei-c.org

msIdentifier should be changed to allow only an idno or msName #2345

Open larkvi opened 1 year ago

larkvi commented 1 year ago

Per the msDesc SIG meeting at TEI2022, there was general agreement that the most minimal identification of a manuscript might be only an idno or msName. This may be the case with virtually reconstructed manuscripts, for example, where there is no single manuscript being identified, only a TEI file pointing to several other manuscripts. It may also be the case when we are referring to a lost manuscript, or to a manuscript which exists only as a siglum within a reconstructed editorial hierarchy. Per #2258, it may be necessary to also include a repository or a location, for whatever reason that was considered to be minimally identifying in the past (I would personally argue that without an idno or name it is not minimally identifying in any reasonable way).

sydb commented 1 year ago

The current content model of <msIdentifier> is

(
  placeName?,
  bloc?,
  country?,
  region?,
  settlement?,
  district?,
  geogName?,
  institution?,
  repository?,
  collection*,
  idno*
),
( msName | objectName | altIdentifier )*

The current additional constraint (rewritten a bit for readability) is that any <msIdentifier> for which the following XPath is true is considered an error.

  not( parent::msPart )
  and
  ( child::*[1][self::idno|self::altIdentifier]  or  normalize-space(.) eq '')

(Although, as I pointed out in #2258, the message issued does not match the XPath. The message implies only that one of the model.placeNamePart elements (i.e., those that occur before <collection> in the model) must be present.)

That is, the closed schema does not require that there be any children of <msIdentifier> at all; and the open schema requires only that an <msIdentifier> that is a child of <bibl>, <msDesc>, or <msFrag> have some textual content and have either a) one of the elements that are mentioned before <idno> in the content model, or b) an <msName> or <objectName> (and, in case (b), that an <altIdentifier> not occur first, before the <msName> or <objectName>).

I am going to summarily ignore the “there must be textual content” restriction in the examples below, because (as I implied in #2258) I think it is silly. (A “there must be textual content OR a @ref or @key attribute” restriction might make a lot of sense, though.)

So, according to the current content model, with the above-mentioned simplification and without the content restriction, the following two <msIdentifier>s are invalid unless they are children of <msPart>.

<msIdentifier n="INVALID01">
  <altIdentifier><!-- ... --></altIdentifier>
  <msName/>
</msIdentifier>
<msIdentifier n="INVALID02">
  <idno/>
  <objectName/>
</msIdentifier>

I suspect that the original task force designing this content model thought that a “location” of some sort should be present; otherwise, how would one find the physical manuscript? That said, the argument that the manuscript may be a virtual reconstruction that does not have a single location seems sound at first blush.

The following <msIdentifier>s are valid. I do not see how VALID01 could identify a manuscript, although I suppose it could identify an object (a sloop, to be precise); VALID02 and VALID03 seem outright nuts to me.

<msIdentifier n="VALID01">
  <geogName>Hudson River</geogName>
  <objectName>Clearwater</objectName>
</msIdentifier>
<msIdentifier n="VALID02">
  <geogName>Hudson River</geogName>
</msIdentifier>
<msIdentifier n="VALID03">
  <bloc>NATO</bloc>
</msIdentifier>
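
To double-check these verdicts, here is a minimal Python sketch of the constraint as expressed by the XPath above. It is not the Schematron that ships with the Guidelines. The names `violates_constraint`, `local_name`, and `parent_name` are made up for illustration (ElementTree elements do not know their parent, so the caller has to pass it in), and the `<examples>` wrapper exists only so that the five snippets parse as one document.

```python
import xml.etree.ElementTree as ET


def local_name(element):
    """Drop a '{namespace}' prefix, if any, from an element's tag."""
    return element.tag.rsplit("}", 1)[-1]


def violates_constraint(ms_identifier, parent_name=None):
    """Return True when the XPath quoted in the thread would flag the element.

    parent_name is an illustrative argument: the caller supplies the local
    name of the parent element, since ElementTree does not track parents.
    """
    # not(parent::msPart)
    if parent_name == "msPart":
        return False
    children = list(ms_identifier)
    # child::*[1][self::idno | self::altIdentifier]
    first_is_identifier = bool(children) and local_name(children[0]) in (
        "idno",
        "altIdentifier",
    )
    # normalize-space(.) eq ''
    has_no_text = "".join(ms_identifier.itertext()).strip() == ""
    return first_is_identifier or has_no_text


# The five examples from this comment, wrapped in a throwaway root element.
EXAMPLES = """
<examples>
  <msIdentifier n="INVALID01">
    <altIdentifier><!-- ... --></altIdentifier>
    <msName/>
  </msIdentifier>
  <msIdentifier n="INVALID02">
    <idno/>
    <objectName/>
  </msIdentifier>
  <msIdentifier n="VALID01">
    <geogName>Hudson River</geogName>
    <objectName>Clearwater</objectName>
  </msIdentifier>
  <msIdentifier n="VALID02">
    <geogName>Hudson River</geogName>
  </msIdentifier>
  <msIdentifier n="VALID03">
    <bloc>NATO</bloc>
  </msIdentifier>
</examples>
"""

root = ET.fromstring(EXAMPLES)
# Pretend each example sits inside an <msDesc>, i.e. not inside an <msPart>.
verdicts = {
    ms.get("n"): violates_constraint(ms, parent_name="msDesc")
    for ms in root.iter("msIdentifier")
}
for label, flagged in verdicts.items():
    print(label, "flagged" if flagged else "accepted")
```

Under these assumptions the two INVALID examples are flagged and the three VALID ones are accepted. Passing `parent_name="msPart"` switches the check off, which mirrors the `not( parent::msPart )` clause.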

My head is spinning.

jamescummings commented 1 year ago

> I suspect that the original task force designing this content model thought that a “location” of some sort should be present; otherwise, how would one find the physical manuscript? That said, the argument that the manuscript may be a virtual reconstruction that does not have a single location seems sound at first blush.

Yes. Currently one can have something without a location. It just needs an msName (so if that is really the request, then that is already possible). I suspect it can't be only an idno because you'd need to say what institution or collection this idno refers to. An msName might incorporate enough information about the repository.

This is currently valid:

<msIdentifier n="VALID">
  <msName>Tynemouth Castle MS Number 1</msName>
</msIdentifier>

But this is not:

<msIdentifier n="INVALID">
  <idno>Tynemouth Castle MS Number 1</idno>
</msIdentifier>

But the Schematron warning is indeed outdated and incorrectly phrased.
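
For comparison, the two Tynemouth examples can be run against the current rule and against one possible reading of what this issue asks for. This is only a sketch: `current_rule_ok` paraphrases the XPath quoted earlier in the thread (leaving out the `<msPart>` exception), while `proposed_rule_ok` is a hypothetical relaxation (accept any `<msIdentifier>` with a non-empty `<idno>` or `<msName>` child) and not agreed wording.

```python
import xml.etree.ElementTree as ET


def local_name(element):
    """Drop a '{namespace}' prefix, if any, from an element's tag."""
    return element.tag.rsplit("}", 1)[-1]


def text_of(element):
    """All descendant text, with leading and trailing whitespace removed."""
    return "".join(element.itertext()).strip()


def current_rule_ok(ms_identifier):
    """Paraphrase of the current constraint, ignoring the <msPart> exception."""
    children = list(ms_identifier)
    if children and local_name(children[0]) in ("idno", "altIdentifier"):
        return False
    return text_of(ms_identifier) != ""


def proposed_rule_ok(ms_identifier):
    """Hypothetical relaxed rule: a non-empty <idno> or <msName> is enough."""
    return any(
        local_name(child) in ("idno", "msName") and text_of(child) != ""
        for child in ms_identifier
    )


with_name = ET.fromstring(
    '<msIdentifier n="VALID">'
    "<msName>Tynemouth Castle MS Number 1</msName>"
    "</msIdentifier>"
)
with_idno = ET.fromstring(
    '<msIdentifier n="INVALID">'
    "<idno>Tynemouth Castle MS Number 1</idno>"
    "</msIdentifier>"
)

for ms in (with_name, with_idno):
    print(ms.get("n"), current_rule_ok(ms), proposed_rule_ok(ms))
```

With this reading, the `<msName>`-only identifier passes both checks, and the `<idno>`-only identifier fails the current check but would pass the relaxed one.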