Orthographic description is inconsistent

Anisava writes:

I am a minimalist when it comes to the description. In my opinion, the less data we input, the better. I understand the need to differentiate between MSS with regular juses and jers and others, but for the late manuscripts it is impossible to give an exact, detailed definition, because the texts come from different sources and the orthography depends on those sources. I prefer a very short summary description. It may not be convenient for indices, but it is more useful for late MSS.

She is responding to Andrej's earlier message:

You are quite right that this situation is complicated. We have the following description in AMAdd39628BBL.xml:

<scribeLang>
    <orthography>
        <p>Old Church Slavonic.</p>
        <p>One-<emph>jus</emph> (<foreign xml:lang="cu">ѧ</foreign>) and
            one-<emph>jer</emph> (<foreign xml:lang="cu">ь</foreign>); sporadical usage of the so-called "middle jus".
            Confusion of <foreign xml:lang="cu">и</foreign> and <foreign xml:lang="cu">ы</foreign>, regular initial and post-vocalic
            <foreign xml:lang="cu">ѥ</foreign> and <foreign xml:lang="cu">ꙗ</foreign> (see <ref type="bibl" target="bib:Vakareliyska2008">Vakareliyska 2008</ref>. </p>
    </orthography>
    <lexis>In the synaxarion the Slavonic names of the months are used
        beside the Greek ones.</lexis>
</scribeLang>

First attempt:

<scribeLang>
    <summary>Old Church Slavonic</summary>
    <orthography>
        <p>One-<emph>jus</emph> (<foreign xml:lang="cu">ѧ</foreign>) and
            one-<emph>jer</emph> (<foreign xml:lang="cu">ь</foreign>); sporadical usage of the so-called "middle jus".
            Confusion of <foreign xml:lang="cu">и</foreign> and <foreign xml:lang="cu">ы</foreign>, regular initial and post-vocalic
            <foreign xml:lang="cu">ѥ</foreign> and <foreign xml:lang="cu">ꙗ</foreign> (see <ref type="bibl" target="bib:Vakareliyska2008">Vakareliyska 2008</ref>. </p>
    </orthography>
    <lexis>In the synaxarion the Slavonic names of the months are used
        beside the Greek ones.</lexis>
</scribeLang>

It is quite obvious that Old Church Slavonic in this case is just a summary. The other question is how to relate this summary to the other descriptions (cf. below). According to our Guidelines, the rest of the description should be divided into several paragraphs. If we use a <langNote> element, the description will be something like:

<scribeLang>
    <summary>Old Church Slavonic</summary>
    <langNote type="jer" subtype="front">One-jer</langNote>
    <langNote type="jus" subtype="nonEtymReg">One-jus (<foreign xml:lang="cu">ѧ</foreign>). Sporadical usage of the so-called "middle jus".</langNote>
    <langNote type="jotVowel">Regular initial and post-vocalic <foreign xml:lang="cu">ѥ</foreign> and <foreign xml:lang="cu">ꙗ</foreign></langNote>
    <langNote type="otherLetters">Confusion of <foreign xml:lang="cu">и</foreign> and <foreign xml:lang="cu">ы</foreign></langNote>
    <langNote type="lexis">In the synaxarion the Slavonic names of the months are used beside the Greek ones.</langNote>
    <ref type="bibl" target="bib:Vakareliyska2008">Vakareliyska 2008</ref>
</scribeLang>

The problem is that in most cases we have for the description of orthography/language something like:

Without juses, with two jers, irregular; West Bulgarian dialect features

So, where should this statement go? It is currently encoded as follows (AM82NIK.xml):

<scribeLang>
    <orthography>
        <p xmlns="http://www.tei-c.org/ns/1.0">Without juses, with two jers, irregular; West Bulgarian dialect
            features</p>
    </orthography>
</scribeLang>

First variant:

<scribeLang>
    <summary>Without juses, with two jers, irregular; West Bulgarian dialect features</summary>
</scribeLang>

This variant doesn't fit well with Old Church Slavonic above, unless we replace Old Church Slavonic in the description of AMAdd39628BBL.xml with One-jus, One-jer orthography in the <summary>. Second variant: we do not use <summary>, but something like

<langNote type="general">Without juses, with two jers, irregular; West Bulgarian dialect features</langNote>

Then <summary> in this context, if we need it, would be just free prose. Or we can do this:

<scribeLang>
    <summary>West Bulgarian dialect features</summary>
    <langNote type="general">Without juses, with two jers, irregular</langNote>
</scribeLang>

Then <summary> would be consistent with Old Church Slavonic and would refer only to language, not to orthography. The most complicated approach would be something like:

<scribeLang>
    <summary>West Bulgarian dialect features</summary>
    <langNote type="jus" subtype="nonJus">Without juses</langNote>
    <langNote type="jer" subtype="nonEtymReg">With two jers, irregular</langNote>
</scribeLang>

Then, when you have just Without juses, with two jers, irregular, it will be encoded simply as:

<langNote type="jus" subtype="nonJus">Without juses</langNote>
<langNote type="jer" subtype="nonEtymReg">With two jers, irregular</langNote>

without <summary>

Without juses, with two jers, irregular; Resavian orthography (school in most of the descriptions):

<scribeLang>
    <summary>Resavian orthography</summary> 
    <langNote type="jus" subtype="nonJus">Without juses</langNote>
    <langNote type="jer" subtype="nonEtymReg">With two jers, irregular</langNote>
</scribeLang>

I like the last one. Hm. How to proceed?

@atoboy How about if we allow <langNote> to be inline, but optional? That means that we could write either:

<scribeLang>
    <summary>Resavian orthography</summary> 
    <p>Without juses, with two jers, irregular; West Bulgarian dialect features</p>
</scribeLang>

<scribeLang>
    <summary>Resavian orthography</summary> 
    <p><langNote type="jus" subtype="nonJus">Without juses</langNote>, 
      <langNote type="jer" subtype="nonEtymReg">with two jers, irregular</langNote>;
      West Bulgarian dialect features</p>
</scribeLang>

The textual content of these two descriptions is identical, and they would be rendered the same way in the codicological-description view. The second version, with the inline <langNote> elements, allows us to look for manuscripts according to their conventions for writing jers, juses, and other features. But where we don't have that information available, or where the situation is too complicated for that type of markup to be useful (such as the later manuscripts that Anisava mentions), it won't be required.

@djbpitt The problem, as you can see, is that we have several variants. I don't want to lose the information about language and orthography, but we need to classify the possible cases somehow.

We have something like: Without juses, with two jers, inconsistently used. I think it could be encoded simply as:

<scribeLang>
    <summary xmlns="http://www.tei-c.org/ns/1.0">Without juses, with two jers,
        inconsistently used</summary>
</scribeLang>

Consider this example:

<scribeLang>
    <summary xmlns="http://www.tei-c.org/ns/1.0">Old Church Slavonic</summary>
    <langNote>One-jer (<foreign xmlns="http://www.tei-c.org/ns/1.0" xml:lang="cu">ь</foreign>),
        non-jus orthography. There is very clear tendency to separate words.</langNote>
</scribeLang>

I would change this to something like:

<scribeLang>
    <summary xmlns="http://www.tei-c.org/ns/1.0">One-jer (<foreign xmlns="http://www.tei-c.org/ns/1.0" xml:lang="cu">ь</foreign>), non-jus orthography. There is very clear tendency to separate words.</summary>
</scribeLang>

The reason is that Old Church Slavonic in this context does not give us much information, unlike Resavian orthography above, or Church Slavonic for the later MSS.

The other option is not to use <summary> at all when you have just "Without juses, with two jers, inconsistently used". Then it will be encoded simply as

<scribeLang>
    <p xmlns="http://www.tei-c.org/ns/1.0">One-jer (<foreign xmlns="http://www.tei-c.org/ns/1.0" xml:lang="cu">ь</foreign>), non-jus orthography. There is very clear tendency to separate words.</p>
</scribeLang>

This means unstructured information. If we would like to include <langNote> as part of <p>, then maybe we should change the declaration of <p>, or include <langNote> as a phrase element?

So, I would rather suggest the following variants:

Unstructured information of the type Without juses, with two jers, inconsistently used: when you have just this information, use just <p> (or maybe <summary>?) as a child of <scribeLang>.
You have information like Resavian orthography. Without juses, with two jers, irregular; West Bulgarian dialect features. Encode this as:

<scribeLang>
    <summary>Resavian orthography</summary> 
    <p>Without juses, with two jers, irregular; West Bulgarian dialect features</p>
</scribeLang>

Or maybe West Bulgarian features in this case should go to <summary>?

You have some general information like Resavian orthography, or Church Slavonic and then you have a list of linguistic features, like in our descriptions of Codex Marianus or Codex Suprasliensis. Then use the model:

<scribeLang>
    <summary></summary> 
    <langNote></langNote>
    <langNote></langNote>
...
</scribeLang>

with some attributes for <langNote>. To be consistent, maybe we should always use <summary>, and then we could have two variants: an unstructured <p> or a structured <langNote> description?

@atoboy You suggest above that:

If we would like to include <langNote> as part of <p>, then maybe we should change the declaration of <p>, or include <langNote> as a phrase element?

I agree that we should allow <langNote> as a child of <p>, but only when the <p> is a child of <scribeLang>. If we change the definition of <p> or add <langNote> as a phrase element, it would be allowed inside any <p> element anywhere, and I don’t think we want to do that. This issue is a challenge with the TEI ODD method: the TEI makes it easy to add another content item to the model for <p>, but it makes it difficult to say that <p> elements in different contexts have a lot of the same content, but also idiosyncratic content not shared with <p> elements elsewhere. I think, nonetheless, that that’s what we want to say.

My recommendation, then, is that <summary> should always be required and <p> should be optional and repeatable, so that we could include whatever prose description we want of the orthography. But that <p> (although not <p> outside the <scribeLang> context) would allow <langNote> children alongside all of the other content that may occur inside <p> anywhere.
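
To make that recommendation concrete, here is a sketch of three shapes that would all be valid under it. Namespace declarations are omitted, as in the examples above. The attribute values are the ones already proposed in this thread, and splitting the prose into two paragraphs is only my illustration of a repeated paragraph, not something taken from an existing description.

```xml
<!-- 1. <summary> alone: the minimal description -->
<scribeLang>
    <summary>Resavian orthography</summary>
</scribeLang>

<!-- 2. <summary> plus plain prose, with the paragraph repeated -->
<scribeLang>
    <summary>Resavian orthography</summary>
    <p>Without juses, with two jers, irregular</p>
    <p>West Bulgarian dialect features</p>
</scribeLang>

<!-- 3. the same prose, with optional inline <langNote> markup -->
<scribeLang>
    <summary>Resavian orthography</summary>
    <p><langNote type="jus" subtype="nonJus">Without juses</langNote>,
        <langNote type="jer" subtype="nonEtymReg">with two jers, irregular</langNote></p>
    <p>West Bulgarian dialect features</p>
</scribeLang>
```

Shapes 2 and 3 have the same text content, so they would render identically; only shape 3 is searchable by feature.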

@djbpitt

Obviously, I'm doing something wrong (ODD file):

<elementSpec ident="scribeLang" ns="http://www.ilit.bas.bg/repertorium/ns/3.0" mode="add">
            <desc>contains a description of the orthography and language of the scribe</desc>
            <classes mode="change">
              <memberOf key="att.global"/>
              <memberOf key="att.typed"/>
            </classes>
            <content>
              <alternate>
                <classRef key="model.pLike" minOccurs="0" maxOccurs="unbounded"/>
                <sequence>
                  <elementRef key="summary" minOccurs="0" maxOccurs="1"/>
                  <elementRef key="langNote" minOccurs="0" maxOccurs="unbounded"/>
                  <classRef key="model.pLike" include="langNote" maxOccurs="unbounded"/>
                </sequence>
              </alternate>
            </content>
...
</elementSpec>

There is no <langNote> inside <p> in this case.

@atoboy You have a lot more ODD experience than I do, so I don't have a lot of confidence in my ability to help, but I'll try:

I think the way to allow a <p> child of <scribeLang> to contain all content items of a regular <p> plus zero or more <langNote> elements, all in any order, is to create a new <macroSpec> and use that as the content of <scribeLang>. See https://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-macroSpec.html. If that pointer doesn't help, please let me know and I'll see whether I can figure it out in any more detail.

I agreed to have a

into if it is possible. The problem is that when the contents come from several sources, the orthography depends on those sources and may differ from text to text. For example, in one text we have juses, or only the big jus occasionally, but in another there are no juses. How to proceed?

@miltenova

We have two levels of language description. One is part of the element <scribe>. The idea here is to describe the common features of the scribe's language/orthography. The other level is part of the <msItemStruct>: the element <textLang>. Here the function of this element is to describe the language of the text, especially the relation of the language to the history of the text. For example, the language of a particular text may differ from that of the other texts in the manuscript in spelling, grammar, or structure. The same applies to the <decoDesc> element: the decoration of a particular text could differ from the overall style of the manuscript. This could give us a clue to a possibly different origin of the text.
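
As a rough sketch of how the two levels could answer the question about mixed sources: the scribe-level description records what is common to the manuscript, and the text-level one records where a single item departs from it. The element names are the ones mentioned above; the wording of both descriptions is hypothetical, made up for illustration from the case described in the question, and namespaces and the surrounding elements are omitted.

```xml
<!-- Manuscript level: features common to the scribe's practice -->
<scribe>
    <scribeLang>
        <summary>Church Slavonic</summary>
        <p>Without juses, with two jers, irregular</p>
    </scribeLang>
</scribe>

<!-- Text level: one item whose orthography follows a different source -->
<msItemStruct>
    <textLang>Big jus used occasionally, unlike the other texts in the manuscript</textLang>
</msItemStruct>
```

With this split, a late manuscript compiled from several sources can keep a very short <scribeLang> and note the exceptions only on the items where they occur.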

djbpitt / repertorium

Orthographic description is inconsistent #8