TEIC / TEI

The Text Encoding Initiative Guidelines
https://www.tei-c.org

Add a `charters` module #2376

Open larkvi opened 1 year ago

larkvi commented 1 year ago

Charters Encoding

Charters and other historical legal documents have many features that are already provided for in the msdescription module. They also have a number of letter-like properties that are not modelled well by the existing elements for correspondence, and features that push the limits of how the TEI can model elements that prove authenticity. Finally, there is a long tradition of specialized academic study, the historical sub-field of diplomatic(s), with a strong concern for the transmission of these documents and with its own historical practices and preferences for how charters are described. For these reasons, charters need descriptive elements over and above those presently available in TEI P5.

To date, the largest repository of charters in XML format is the Monasterium.net archive. Monasterium.net uses the Charters Encoding Initiative (CEI; GVogeler/cei), a fork of TEI P4 developed by Georg Vogeler (@GVogeler) et al., which contains both specialized charter elements and duplicates of all the P4 elements. Under Georg's direction, I undertook a modernization of the CEI to make it compatible with TEI P5 (GVogeler/CEI2TEI), producing an ODD that reduces the large number of novel CEI elements to a manageable number. CEI2TEI is now four years old, and there are no significant competitors for modelling charters. Monasterium.net is due to be migrated to TEI P5 as part of Georg's ERC project "From Distant to Digital Diplomatics" (@didip-eu). In advance of that migration, we would like to propose the integration of charter-specific encoding into TEI P5, in order to have the largest source of TEI-encoded charters also be a compliant example of encoding.

The Elements

I have extracted the new (or changed) elements here, with their descriptions from the ODD and my comments on them. These elements were created on the basis of my analysis of how the CEI is actually used in the Monasterium database, focusing on:

  1. what is not already covered by the TEI
  2. what is uniquely required for charters encoding
  3. what was actually used by charters encoders

Accordingly, the following ten elements condense a much larger list of CEI elements by assigning conceptual categories to elements and their specific instances to @type values. This is most notable with the diploPart element, which represents 22 types of clauses that were formerly separate elements. By making them typed lists in this way, I believe that the module will be simpler to use, and diploPart will be searchable even if different encoders use different naming conventions for the clauses. Similarly, legalActor represents 11 different types of actors and may be extended to suit other legal contexts (perhaps with the addition of more modern actors such as bailiffs, servers, etc.). Many of these types are covered by the Vocabulaire international de la diplomatique (1992), which is also available as a TEI taxonomy.

diploDesc

diploDesc is simply a desc-type element for diplomatists' concerns. The ODD describes it as: "Contains a diplomatic description and analysis of a document, including bibliographic references to studies; formal criticism of content and textual/legal form (as opposed to the physical form, in physDesc); and discussions of transmission and authenticity." I am not aware of a more appropriate place to put this information in the form in which it appears, and the corresponding CEI element is one of the most used, making this element, or an equivalent that the Council deems fitting, one of the most important for data migration.

authen

I have written in the jTEI about the desirability of a seal-equivalent element with a larger semantic coverage, representing the wide cross-cultural range of features and practices with equivalent or near-equivalent significance. With regard to the ongoing discussion within the manuscript description SIG about how ontological divisions of manuscripts (or items boxed with or associated with manuscripts) should be modelled, it would also be desirable to separate the physical elements that represent authentication (the physical seal) from the authenticating value of those elements (the validity and force of law associated with the seal). These levels of significance are important to separate in charters that mix authentic and inauthentic elements, such as the Privilegium Maius (a forged document with repurposed valid seals, establishing the "Archduchy" of Austria).

The ODD describes it as: "Description of an element that is used to authenticate a document. Specifically, elements that would be intended, from a juridical perspective, to be authenticating. May contain a forged element, in which case the forgery should be noted and certainty marked."

The closed list of type values is based on the highest level of a SKOS typology of authenticating elements (CEI2TEI/Authentication/authen.skos.ttl), which I made in order to use as subtype pointers, providing more detailed types than I cared to put in a typed list. In its present form, it is not specific enough to cover all instances, such as modern UK royal charters, where the combination of seal colour, thread material, and seal form carries specific values. I am not sure that this solution would fit the base TEI in any case, and I know of no large projects using these specific subtypes that would be inconvenienced by their removal.

authDesc

As seal is semantically changed and generalized to authen, so sealDesc is semantically changed and generalized to authDesc, to represent the larger range of materials being discussed.

legalActor

The ODD describes it as: "Persons or organizations party to or otherwise mentioned in an act or contract." legalActor separates the role that a party plays in a document from the other aspects of that person, which would be covered in person, since legal persons will appear in different documents with different roles. The closed list for @type in the ODD includes: issuer, recipient, beneficiary, witness, notary, promisor, promisee, intervenor, other, sigillant, third. This list would likely cover most parties to modern legal documents as well, but that is outside the project's expertise.
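
A minimal Python sketch of why separating role from person helps: one person record is referenced from two charters under different legalActor types, and a query collects every role in which that person appears. The markup is hypothetical (namespaces omitted; persName/orgName carrying @ref to a shared person record is an assumption, and the charters are invented). Only the closed list of @type values comes from the ODD as described above.

```python
import xml.etree.ElementTree as ET

# Closed list of legalActor/@type values, as given for the CEI2TEI ODD.
LEGAL_ACTOR_TYPES = {
    "issuer", "recipient", "beneficiary", "witness", "notary", "promisor",
    "promisee", "intervenor", "other", "sigillant", "third",
}

# Two hypothetical charters (namespaces omitted). @ref points at one shared
# person record, so the person is described once while the role changes.
CHARTER_1 = """<text>
  <legalActor type="issuer"><persName ref="#abbot-x">Abbot X</persName></legalActor>
  <legalActor type="recipient"><orgName ref="#convent-y">Convent Y</orgName></legalActor>
</text>"""
CHARTER_2 = """<text>
  <legalActor type="witness"><persName ref="#abbot-x">Abbot X</persName></legalActor>
</text>"""


def roles_of(person_ref, charters):
    """Return every legalActor role in which person_ref appears across charters."""
    roles = set()
    for xml_text in charters:
        for actor in ET.fromstring(xml_text).iter("legalActor"):
            role = actor.get("type")
            if role not in LEGAL_ACTOR_TYPES:
                raise ValueError(f"legalActor/@type not in closed list: {role!r}")
            # Any element inside the legalActor with a matching @ref counts.
            if any(el.get("ref") == person_ref for el in actor.iter()):
                roles.add(role)
    return roles


print(sorted(roles_of("#abbot-x", [CHARTER_1, CHARTER_2])))  # ['issuer', 'witness']
```

Because the role lives on legalActor rather than on the person record, extending the list (bailiffs, servers, etc.) means adding a value to the set, not a new element.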

diploPart

Charters are highly formulaic documents, with set clauses used as part of a fixed formula for a given type of document in a specific issuer's office. They are highly interesting for the understanding of historical practices, but they are also described slightly differently by different systems. diploPart represents these elements so that they can be compared across documents. The only project I am aware of at present that marks up these materials in a consistent way is the DEEDS Project (Michael Gervers, University of Toronto), the output of which has been ingested into Monasterium. Other projects, such as AlinaOs/Structurally_Annotated_Medieval_Charters, use the annotation from the CEI.

The ODD describes it as: "An element identifying the various conventional elements of documentary instruments. This refers to the intellectual content of the text; generic solutions should be used if the needs of an individual project require divisions of set phrases or other segments of text. The list contains, and may be extended with, clauses whose meanings overlap, to meet the specific legal context of the act. Using the existing categories, where possible, increases the cross-comparative functionality of the total corpus, and is encouraged."

The closed list in the CEI2TEI ODD represents the major systems of organizing these clauses, making no judgement as to which one is correct (where multiple names are attached to one consistent element, as in the case of arenga/exordium/preamble/proem, I have chosen one and noted the others in the desc). Any project attempting to compare documents would need to account for which system is used.
The traditional names used in the field of diplomatic(s) are used, even when there are semantically-identical English equivalents: protocol, contextus, eschatocol, apprecatio, arenga, benedictio, clausulae, corroboratio, datatio, dispositio, inscriptio, intercessio, intitulatio, invocatio, narratio, notificatio, rogatio, publicatio, salutatio, sanctio, subscriptio, other.
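
As a sketch of how typed diploPart elements could be compared across encoders, the Python below maps the synonyms named above (exordium, preamble, proem) onto arenga, rejects values outside the closed list, and counts clauses. Nesting clauses inside protocol and contextus, and the bracketed placeholder text, are assumptions for illustration; a real project would also need its own synonym table for whichever naming system its encoders followed.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# The 22 diploPart/@type values listed for the CEI2TEI ODD.
DIPLO_PART_TYPES = {
    "protocol", "contextus", "eschatocol", "apprecatio", "arenga", "benedictio",
    "clausulae", "corroboratio", "datatio", "dispositio", "inscriptio",
    "intercessio", "intitulatio", "invocatio", "narratio", "notificatio",
    "rogatio", "publicatio", "salutatio", "sanctio", "subscriptio", "other",
}

# arenga/exordium/preamble/proem name one clause; "arenga" is the listed value.
# Any further synonyms would have to be supplied by each project.
SYNONYMS = {"exordium": "arenga", "preamble": "arenga", "proem": "arenga"}


def normalize_type(raw):
    """Map an encoder's clause name onto the closed list, or raise ValueError."""
    name = raw.strip().lower()
    name = SYNONYMS.get(name, name)
    if name not in DIPLO_PART_TYPES:
        raise ValueError(f"unknown diploPart type: {raw!r}")
    return name


def clause_profile(xml_text):
    """Count every diploPart (nested ones included) by its normalized @type."""
    root = ET.fromstring(xml_text)
    return Counter(normalize_type(part.get("type", "")) for part in root.iter("diploPart"))


# Hypothetical charter; the encoder used "preamble" rather than "arenga".
CHARTER = """<body>
  <diploPart type="protocol">
    <diploPart type="invocatio">[invocation]</diploPart>
    <diploPart type="intitulatio">[title of issuer]</diploPart>
  </diploPart>
  <diploPart type="contextus">
    <diploPart type="preamble">[preamble]</diploPart>
    <diploPart type="dispositio">[disposition]</diploPart>
  </diploPart>
</body>"""

print(clause_profile(CHARTER)["arenga"])  # 1
```

Normalizing at query time, rather than rewriting the source markup, leaves each encoder's original naming intact while still making the corpus searchable by one value per clause.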

copyStatus

Since charters exist in multiple copies, possibly with differences at different stages of drafting, issuance, promulgation, and repromulgation, a core interest of charter markup is to relate these different types of copies to each other, going beyond what is possible with tei:filiation. The ODD describes it as: "The status of the document as an original or a copy. Equivalent to the former CEI traditioForm. Can occur multiple times in different msItems, to represent, for example, a copy of a document which is in an original of a promulgation of the first document." Current values: original, draft, copy, forged (I am not sure this fits ontologically, but the question is complex), unknown, other. Promulgation and repromulgation copies are typed as copy, but generally involve additional legalActors.
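
A minimal sketch of the nested case the ODD description mentions: an original repromulgation (outer msItem) containing a copy of an earlier act (inner msItem). It assumes copyStatus is a direct child of msItem with its value in @type; the proposal fixes neither point, and the @n labels and sample are hypothetical.

```python
import xml.etree.ElementTree as ET

COPY_STATUS_VALUES = {"original", "draft", "copy", "forged", "unknown", "other"}

# Hypothetical sample: copyStatus placement and @type usage are assumptions;
# @n only labels the items for the printout.
SAMPLE = """<msContents>
  <msItem n="repromulgation">
    <copyStatus type="original"/>
    <msItem n="earlier-act">
      <copyStatus type="copy"/>
    </msItem>
  </msItem>
</msContents>"""


def copy_chain(element, path=()):
    """Yield (msItem path, copyStatus value or None) for each msItem, outermost first."""
    for item in element.findall("msItem"):
        here = path + (item.get("n"),)
        status = item.find("copyStatus")  # direct child only, not nested items
        value = status.get("type") if status is not None else None
        if value is not None and value not in COPY_STATUS_VALUES:
            raise ValueError(f"copyStatus value not in list: {value!r}")
        yield here, value
        yield from copy_chain(item, here)


for path, value in copy_chain(ET.fromstring(SAMPLE)):
    print(" > ".join(path), value)
# repromulgation original
# repromulgation > earlier-act copy
```

Reporting a missing copyStatus as None rather than as unknown keeps "not encoded" distinct from an encoder's explicit judgement that the status is unknown.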

issued

issued records the date when an act was legally issued, rather than the date when the document itself was made. It is needed for searching by the date of the act rather than the date of a specific copy, and vice versa. The ODD describes it as: "Official date and/or place of issue for the act."
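
To illustrate why the date of the act and the date of a copy need separate fields, here is a hypothetical Python sketch: issued/@when stands for the act's date, and the existing TEI origDate/@when for the date of the physical document. The listDoc/doc wrappers, the use of @when on issued, and the dates are invented for illustration, and the place of issue is not modelled.

```python
import xml.etree.ElementTree as ET

# Hypothetical records: A is the original, B a later copy of the same act.
# The wrapper elements and dates are invented for this sketch.
SAMPLE = """<listDoc>
  <doc n="A"><issued when="1250-03-01"/><origDate when="1250-03-01"/></doc>
  <doc n="B"><issued when="1250-03-01"/><origDate when="1402-11-20"/></doc>
</listDoc>"""


def docs_from_year(xml_text, year, field="issued"):
    """Return @n of every doc whose <field>/@when falls in the given year."""
    hits = []
    for doc in ET.fromstring(xml_text).iter("doc"):
        date = doc.find(field)
        when = date.get("when", "") if date is not None else ""
        # Only four-digit ISO years are handled in this sketch.
        if when[:4].isdigit() and int(when[:4]) == year:
            hits.append(doc.get("n"))
    return hits


print(docs_from_year(SAMPLE, 1250))                    # ['A', 'B'] (date of the act)
print(docs_from_year(SAMPLE, 1250, field="origDate"))  # ['A'] (date of this copy)
```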

legend

legend is a text-bearing element in heraldry or on a seal, and is very commonly recorded in text describing seals or documents. The ODD describes it as: "Text written on a seal or as part of an illustration, such as in the heraldry element." I have made the case elsewhere that seal should be reconceived so that we can describe the contents of the faces of a seal as easily as the pages of a leaf, and others have suggested the need for full physDesc elements. This also ties into the fact that seals are conceptually separate objects, with their own cataloguing, that are attached to documents as part of production and need their own expanded level of description, over and above what is currently available. legend could be part of that reconsideration or be obviated by the new model.

plica

Extremely charters-specific: some archives and charter traditions record information about a fold in the charter.

Changes to note

This will probably have to remain a project-specific change, given the wide use of note, but given the interests of charter markup, a simple typology of note @type values is included in the ODD, which I will mention here: production (draft notes), ownership (dorsal notes on charters, marks of sale, etc.), personal (marks made by end users for their own benefit), impersonal (marks made by end users for the benefit of others), structural (indices, etc.), other.

Other elements of the ODD

The ODD contains minor changes in types meant to make data validate in a structure preferred by the CEI conversion, but since the Technical Council members are more expert than I am, I would leave it to them to judge whether these aspects are properly implemented.

sydb commented 1 year ago

The big question will, of course, be whether or not to fold this into the Guidelines themselves (accepting the ticket) or not. My first instinct is yes, we should (but I may prove wrong on that), despite the fact that the main argument OP presents in favor (“in order to have the largest source of TEI-encoded charters also be a compliant example of encoding”) is at best lame, and at worst wrong. (In some ways the TEI community wants large datasets of well encoded documents out there in the world to be using well-designed customizations like this. It helps prove the point that TEI is intended to be customized, and provides an example of doing so.)

My second thought is that we should not start trying to create a new chapter until CMC is done. 😐

larkvi commented 1 year ago

@sydb There seems to be a desire in the digital diplomatics community, from what I have heard at conferences, for there to be an "official" standard. It is also the case that people approaching TEI for the first time are likely to take the TEI-C website guidelines as the gospel about what is to be done, and we are aware of other projects, made by people not aware of the CEI, that have used basic TEI encoding for their charters, with all of the limitations that go along with that.

Also, I don't quite follow the general reasoning, since the reason that we have almost every module in the TEI is because it served project needs at some point which drove the TEI into new areas, and I don't know why that should be true of Edwardian correspondence and medieval codices, but not charters and related materials.

GVogeler commented 1 year ago

Before this discussion moves into mainly inclusion/exclusion arguments and/or practical things, I would like to point out that:

  1. the proposal might be split up into amendments of current models (legend as an extension of sealDesc or, as an inscription, of the object model; copyStatus as an extension of filiation; legalActor as an addition to other responsibilities) and
  2. a section on "authenticity" in the teiHeader chapter of the Guidelines (authen, authDesc, issued) that goes beyond the diplomatics community (considering all officially issued and signed documents).