encoding of letterheads

TEIC / TEI

The Text Encoding Initiative Guidelines

https://www.tei-c.org

encoding of letterheads #2457

Open sabineseifert opened 1 year ago

sabineseifert commented 1 year ago

Amongst encoders of correspondence, there is a need for clear recommendations on the encoding of letterheads, which the Guidelines currently lack.

A discussion at a workshop on encoding correspondence resulted in the article "Pre-printed parts: Letterheads and forms", in which several possibilities are discussed (with lots of examples) and to which I will refer below. It also takes into account the discussion of letterheads on the TEI-L mailing list from 2017.

So what encoding of letterheads should be recommended? Usually they contain the name and address of the sender, maybe a company emblem, a family crest, images, or decorative symbols. They can appear at the top as well as at the bottom of the first page or of every page.

Using <fw> https://encoding-correspondence.bbaw.de/v1/pre-printed-parts.html#c-2-1
- Is that suitable for letterheads and 'letterfooters', or is it an overextension of the definition, i.e. a misuse?
- Otherwise, <fw> is quite flexible: names, places, and addresses can be marked up with <address>, <name>, <placeName> etc., and <figure> can be included to capture images that are part of the letterhead.
- <fw> is an inline element; is that problematic? In their layout, letterheads are usually set apart from the rest of the text.
Using <head> https://encoding-correspondence.bbaw.de/v1/pre-printed-parts.html#c-2-2
- printed letterheads can be positioned at the top of the letter, at the bottom, on the left etc.
- <head> cannot capture letterheads at other places than the top of the page
Using <div>, <seg> or <ab> https://encoding-correspondence.bbaw.de/v1/pre-printed-parts.html#c-2-3
- e.g. with attributes type="letterhead", type="letterfooter", type="pre-printed"
- <div> and <ab> are not allowed before/after <opener>/<closer>
- <seg> is an inline element and cannot be used outside <div>
- problem with recurring letterheads and/or letterfooters on each page
Using <figure> https://encoding-correspondence.bbaw.de/v1/pre-printed-parts.html#c-2-4
- tag abuse?
- there is not always an image in a letterhead/letterfooter
Capturing information in <teiHeader> with <layoutDesc> https://encoding-correspondence.bbaw.de/v1/pre-printed-parts.html#c-2-5
- encoding examples and linking between header and body
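
For orientation, the <fw> option from the list above could be sketched roughly as follows. This is a minimal illustration only: the @type and @place values, the organisation name, and the address are all assumptions made up for this sketch and are not taken from the article or the Guidelines.

```xml
<!-- Hypothetical sketch: @type/@place values and all names are invented for illustration -->
<fw type="letterhead" place="top">
  <!-- An emblem that is part of the letterhead, described rather than linked -->
  <figure>
    <figDesc>Company emblem</figDesc>
  </figure>
  <orgName>Example &amp; Co.</orgName>
  <address>
    <addrLine>1 Example Street</addrLine>
    <addrLine><placeName>Exampletown</placeName></addrLine>
  </address>
</fw>
```

Whether this counts as an acceptable use of <fw> is exactly the open question raised in the first bullet.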

Recommendations, encoding examples, and some prose description in the Guidelines would be great!

sydb commented 1 year ago

A thought occurs to me. Sometimes the stuff that is pre-printed is a heading (<head>), a pagination apparatus (<fw>), a chunk of stuff (<ab> or <div>), or something else. But the point is, regardless of how it fits into the hierarchy of the encoded document, the source of the writing is not the author of the letter, but rather the publisher of the stationery.

The Guidelines already have (at least) two mechanisms for recording that the source of one part of a document is different than the rest:

- @decls, an attribute that points to the metadata element (in this case <sourceDesc>?) that applies to the element on which it appears.
- @source, a global attribute intended (when not used on an ODD element) to refer to a bibliographic citation for the external source from which some aspect of the element is drawn; originally intended for quotations and the like.

I do not think @decls is general-purpose enough to handle this. (It is not allowed on <fw> nor <opener>, for example. TEI Council could change that, but at least for now I do not think it is viable.)

But the global @source would probably do. Something like the following partially fictional example. (The letter is real. In truth the part I am attributing to the HWSOL & Co. printers was also written by me, but might just as well have been pre-printed.) [Image: Caden_Brown_01_redacted]

<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml" schematypens="http://purl.oclc.org/dsdl/schematron"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>letter to Caden Brown, 2014-08-27</title>
        <author>Syd Bauman</author>
      </titleStmt>
      <publicationStmt>
        <p>published on TEI issue #2457.</p>
      </publicationStmt>
      <sourceDesc>
        <bibl xml:id="src_pre-printed">
          <orgName>HWSOL &amp; Co.</orgName>
          <address>
            <addrLine>13 Brown Street</addrLine>
            <addrLine>Providence, RI 02912</addrLine>
          </address>
          <idno type="telephone">+1 401-863-3671</idno>
        </bibl>
        <bibl xml:id="src_authored">
          <title>Caden_Brown_01.odt</title>
          <author>Syd Bauman</author>
          <date when="2014-08-27"/>
        </bibl>
      </sourceDesc>
    </fileDesc>
    <profileDesc>
      <correspDesc>
        <correspAction type="written" subtype="started">
          <date when="2014-08-26"/>
        </correspAction>
        <correspAction type="written" subtype="finished">
          <date when="2014-08-27"/>
        </correspAction>
        <correspAction type="sent">
          <date when="2014-08-28"/>
        </correspAction>
        <correspAction type="received">
          <date when="2014-08-29" cert="medium"/>
        </correspAction>
      </correspDesc>
    </profileDesc>
  </teiHeader>
  <text>
    <body source="#src_authored">
      <opener>
        <figure source="#src_pre-printed">
          <graphic url="https://en.wikipedia.org/wiki/Rod_of_Asclepius#/media/File:Star_of_life2.svg"/>
          <figDesc>A star of life, ~½ inch in diameter, within which the Rod of Asclepius has a
            light blue staff and a white serpent with a very slight smile.</figDesc>
          <!-- In case you, the reader of this TEI ticket, are curious, the description matches the
               star of life that is actually on my letter, but the graphic is of a different star
               of life. -->
        </figure>
        <figure source="#src_pre-printed">
          <ab style="align: center; font-size: large; vertical-align: text-top;">Syd Bauman</ab>
          <figDesc>A tool line, ~2 pt wide, ~7" long, running from the star of life on the left to the right margin</figDesc>
          <ab style="align: right; font-size: x-small; vertical-align: text-bottom;">
            <gap reason="redacted" extent="precise street address"/>
            / Rehoboth MA  02769-<gap reason="redacted" quantity="4" unit="char"/>
          </ab>
        </figure>
        <salute source="#src_authored">Dear Caden &amp; family —</salute>
      </opener>
      <p>Thank you for taking care of our bunnies!
        <name type="rabbit">Curious</name>,
        <name type="rabbit">Maple</name>, and
        <name type="rabbit">Lola</name> appreciate your
      <lb/>attention. Hope you have a great rest of summer.</p>
      <closer>
        <salute>Sincerely,</salute>
        <signed>Syd Bauman</signed>
        <seg type="credit">
          <seg type="dictated_by">SB</seg>:<seg type="typed_by">sb</seg>
        </seg>
        <!-- Hey, corresp-SIGgers — how is the little “SB:sb” at the bottom *supposed* to be encoded? -->
      </closer>
    </body>
  </text>
</TEI>

laurentromary commented 1 year ago

The case is conceptually similar to that of questionnaires (e.g. in sociology), where there is a combination of fixed and variable content. There have been several discussions on this topic over the years, and it would be great to have clear guidelines on this kind of document.

sabineseifert commented 11 months ago

Thank you for your encoding example, @sydb! It would be great to have one recommended way of encoding letterheads. The problem with <figure> is that it is tag abuse when using it for letterheads containing just words and not any images. E.g. example 3: https://encoding-correspondence.bbaw.de/v1/pre-printed-parts.html#19.

lb42 commented 11 months ago

I agree that figure usually implies the presence of a graphic element, but you might also use it to encode, say, a table with no graphic content. So using it for preprinted forms doesn't seem problematic. Using fw seems to better convey the semantics of a preprinted letterhead, however. The problem of how to handle non-printed additions to the letterhead is much the same in either case.

sydb commented 11 months ago

Well, yes, @sabineseifert, but my point was not that pre-printed letterheads should be encoded with <figure>, but rather that whatever is used to encode the pre-printed part, it be distinguished from the rest by use of the @source attribute. (In fact it would probably be better to encode the address as an <address>, not an <ab>, and to use 2 separate <opener> elements. But those are just details.)

For every node (particularly text nodes) inside the <body>, the path ancestor::*[@source][1]/@source returns either "#src_authored" or "#src_pre-printed".
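
A minimal XSLT 1.0 sketch of that lookup, for anyone who wants to try it. It assumes the tei prefix is bound to the TEI namespace and that every text node of interest has at least one ancestor carrying @source (true of the example above, because <body> has one). The tab-separated text output is just a choice made for this sketch.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0">
  <xsl:output method="text"/>

  <!-- Visit only the non-whitespace text nodes inside <body>; the header is skipped -->
  <xsl:template match="/">
    <xsl:apply-templates select="//tei:body//text()[normalize-space()]"/>
  </xsl:template>

  <!-- For each text node, print the @source of its nearest ancestor that has one,
       then a tab, then the whitespace-normalized text -->
  <xsl:template match="text()">
    <xsl:value-of select="ancestor::*[@source][1]/@source"/>
    <xsl:text>&#9;</xsl:text>
    <xsl:value-of select="normalize-space(.)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Because ancestor:: is a reverse axis, the [1] predicate picks the closest ancestor with @source, so a value on <figure> or <salute> overrides the one on <body>.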

sabineseifert commented 10 months ago

@peterstadler and I had a longer discussion about this: We think it reasonable to consider all three things together:

letterheads as special case of pre-printed parts (this ticket),
(smaller) pre-printed parts (ticket #2458), and
bigger structures of pre-printed parts like forms, questionnaires etc. (with blank spaces to fill out, with checkboxes etc.)

(and not to have different solutions for each).

We had the idea of a new wrapper element for all three cases that defines and contains what is pre-printed. Everything within this wrapper element that deviates from this pre-printed stuff (and only that) needs to be clarified as handwritten with e.g. @hand.

It seems important to keep the distinction between the non-unique pre-printed parts from some authority/institution (which can occur several times across several unique objects) and the unique handwritten parts that belong to the text of the author, i.e. the main hand of the document (in FRBR terms, manifestation and item). It is probably not easy or convenient to capture this distinction with @source / <bibl> (e.g. in @sydb's example above).

The idea is to have a little working group meeting of 3-4 people and to give it a try for all three aspects: letterheads, pre-printed parts, and forms/questionnaires. Does anybody want to join us? :-)

ebeshero commented 10 months ago

@sabineseifert I would love to join your group on this topic! My students and I worked on a project to encode a handwritten response to a printed survey form, in which we needed, for semantic reasons, to connect the printed form questions with responses, while definitively separating the print survey from the hand writer. One area of interest was the extent to which the writer was rebelling or pushing back against the rigid constraints of the survey form, as she overflowed the printed boxed areas frequently. One of her extremely long responses was excerpted from the survey and published separately as an essay--without attention to the original context of the form that provoked it! I had an idea to try to come up with a metric to measure how much the completed survey exceeded the limits imposed by the form!

Anyway I am super interested in this--please count me in!

ebeshero commented 8 months ago

Council subgroup of @sabineseifert, @ebeshero, and @joeytakeda with @peterstadler discussed some of the issues here as potentially broader than encoding letterheads, looking at examples from our projects. We are seeking a broader solution for pre-printed documents with the option of containing "post-filled" content. We wonder if we can create a general solution for preprinted documents that contain "fields" calling for a response, like for address fields, or printed lines on a survey form or form ledger.

The TEI currently does not have elements specifically for this purpose. We considered these options:

Feature Structures (not a good solution for most cases because it's best for organizing aggregated data)
<ab> to hold printed form "questions", and setting <add hand="writer"> inside for showing the writer filling out a form area. This is maybe not the best solution because <add> as defined by the TEI is really/usually for insertions and corrections, and not for what we'd think of as the "main" or "base" text.
<fw>:
- We thought it could work for traditional <fw> situations with printed running heads as well as letterheads.
- And perhaps it could be expanded to include elements that contain meaningful form field and response information.
- HOWEVER, there's a problem: As @sabineseifert points out, <fw> nests inside <p> and <ab>, and it's not supposed to precede an <opener> or <closer> in correspondence encoding. We can get around this by wrapping <opener> and <closer> in a <div> element, but perhaps it's not ideal.
Perhaps we need a new element that is intended for a wider variety of uses than <fw>, whose content is not intended to be marginal but really main/central text.
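
The <div> workaround mentioned in the <fw> item above might be sketched like this. It is an illustration only: the @type and @place values and all text content are invented here, and the fragment has not been validated against tei_all.

```xml
<body>
  <!-- Sketch only: the letter is wrapped in <div> so that <fw> can sit
       beside <opener> and <closer>; all values and text are invented -->
  <div type="letter">
    <fw type="letterhead" place="top">Example &amp; Co., 1 Example Street</fw>
    <opener>
      <salute>Dear Sir,</salute>
    </opener>
    <p>Text of the letter.</p>
    <closer>
      <signed>A. Writer</signed>
    </closer>
    <fw type="letterfooter" place="bottom">Telephone: Example 123</fw>
  </div>
</body>
```

The extra <div> carries no meaning of its own here, which is part of why the workaround may not be ideal.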

Here is a suggested structure for a form requesting an address, something like what we've seen printed on the outside of a postcard, but one that could ALSO be readily adapted for ANY pre-printed and post-filled form material. The emphasis here is on capturing the semantic distinction between pre-printed material and post-filled content with clearly distinguished elements in a simple structure, rather than encoding this in a surface/zone style encoding.

We are not 100% certain, but we are wondering if we can create a simple solution that will work for letterheads as well as pre-printed forms in #2458 .

We are not sure whether we need to introduce new TEI elements to handle this, but the more we poke at the semantics of <fw>, <ab>, and <add>, the more we think maybe we had better introduce something new for the different semantics of

<formField>
    <fq><!--form question --> Street Address
        <fr hand="writer"><!--form response -->1234 Baseline Avenue</fr>
    </fq>
    <fq><!--form question --> City
        <fr hand="writer"><!--form response -->Atlantis </fr>
    </fq>
</formField>

This same solution can work for a survey document / questionnaire like this:

<formField>
    <fq><!--form question --> Educational institutions attended?
        <fr hand="writer"><!--form response --><orgName>TEI University</orgName>,
            <orgName>Humanities Digital Institute</orgName>
        </fr>
    </fq>
    <fq><!--form question --> Major field of study:
        <fr hand="writer"><!--form response -->ODD processing </fr>
    </fq>
</formField>

As we have seen with some pre-printed content on postcards, letterhead can contain form field pre-printed content like this. (Let's add our examples here.)

Thoughts?

joeytakeda commented 8 months ago

Thanks for this summary, @ebeshero. However, in chewing on this over the last ~24 hours, I'm reconsidering whether it's necessary for there to be a single solution for both pre-printed letterheads and forms, especially since forms (aka inputs aka "blanks") may come in all varieties of places (e.g. in a contract: "Where, I, ___, agree to..."). I think they are certainly related phenomena, but I don't know if they're precisely the same thing...

Re: Pre-printed materials (this ticket)

For preprinted materials (such as letterheads), I think <fw> is appropriate given that <fw> is meant to capture "other material repeated from page to page, which falls outside the stream of the text" (11.6). However, I think @sabineseifert 's initial concern re: <fw> being an "inline element" could be addressed by redefining <fw> such that it is something like <note> (e.g. macro.specialPara), which is similarly defined as "any additional comment found in a text, marked in some way as being out of the main textual stream" (3.9.1).

Re: Smaller preprinted parts and forms (#2458)

I'm not totally convinced that <fw> is the right element for that, since, in many cases, the forms and inputs aren't "outside the stream of the text": in fact, in many cases, they serve as the entirety of the text itself. Of course, there are many cases where the blanks are indeed outside of the stream of text (e.g. when it's in a letterhead), but I don't think that represents the same phenomenon.

That said, I would say that smaller pre-printed parts (as discussed in #2458) are part of the broader phenomenon of forms and inputs, which I think could come in a variety of forms (blank spaces, input lists [radio, checkbox]) and may have a wide set of responses.

There's more to consider here, but I'm wondering if we could marshal some existing elements (namely, <space>, <list>, and <label>) for some of this?

Considering the following segment from the Custodian Case File for Joji Takeda[^1]: [Image: Screenshot 2024-01-17 at 3 20 01 PM]. Bracketing (for now) the question of denoting hand/type shifts, I could imagine an encoding scheme that:

Wraps each "form" question in a wrapper (let's call it <input> for now, since <form> is already taken)
Uses <label> for the label of the prompt
Uses <space> with a nested <add> to denote the "filling in" of a form
Describes lists of options using <list> and <item> and the @select attribute for identifying selections

<div>
    <head>INFORMATION FROM R.C.M.P.</head>
    <lb/>
    <dateline>
        <input>
            <label>Date</label>
            <space>
                <add><date when="1943-02-19">Feb. 19/43</date></add>
            </space>
        </input>
    </dateline>
    <lb/><input>
        <label>Our File No.</label>
        <add place="above">✓</add>
        <space extent="quarter width"><add>3111</add></space>
    </input>
    <lb/><input>
        <label>Full Name</label>
        <space extent="full page">
            <add><persName>TAKEDA, Joji</persName></add>
        </space>
        <note place="middle below">(Surname in Block Letters)</note>
    </input>
    <lb/><input>
        <label>Registration No.</label>
        <space>
            <add>12840</add>
        </space>
    </input>
    <input>
        <list type="check" select="#m">
            <item xml:id="m"><add place="above">✓</add>Male</item>
            <item xml:id="f">Female</item>
        </list>
        <note place="middle below">(check)</note>
    </input>
    <input>
        <label>Age</label>
        <space>
            <add>
                <date when="1923-04-24">Apr. 24, 1923</date>
            </add>
        </space>
    </input>
</div>

In terms of denoting shifts in hands/type: I think a <typeShift> (per #2458) makes sense, but would be laborious in a document like this. I'm curious whether it would be useful to go the other way and make <handNote> a member of att.scoping so that one could do something like: <handNote xml:id="custodian" match="add"/> to specify all of the <add>s, by default, are from that hand?

[^1]: From Landscapes of Injustice: https://loi.uvic.ca/archive/C-9333_3111.html?ref=take190

ebeshero commented 8 months ago

@joeytakeda Thanks for explicating with this interesting example. Your use of <add> is similar to the encoding my students and I developed for our Anna Julia Cooper survey project:


<div2 type="question" n="65">
    <ab rend="4-L">65. Have you a <q>racial philosophy</q> that can be briefly stated?
        <add hand="AJC">My <q>racial philosophy</q> is not far removed
            <lb/>from my general philosophy of life: that the greatest happiness comes from
            altruistic service—&amp; this <lb/>is in reach of all of whatever race &amp; condition. The <q>Service</q> here meant
            it is not a pious idea of <hi rend="underline">being used;</hi>
            <lb/>any sort of exploitation whether active or passive
            is to my mind hateful. Nor is the <q>Happiness</q> a mere bit.<note resp="#ebb">This continues on the last page.</note></add>
    </ab>
</div2>

Here though, on the use of <add> I thought our subgroup agreed that it was not ideal because this element is typically used for small insertions or corrections, and not for long passages of what we consider main text.

I also think that <label> is not generally for a prompt that raises a question to be answered.

Indeed a form seems a sort of "call and response" genre. I have sometimes wanted to reach for tagging for dialogue to encode a pre-printed question and handwritten response! But no--a form is not spoken, but printed/written in two or more hands.

We want encoding for prompt and response that elevates both to equivalent significance as main text, I think. I recall that was an important finding of our conversation, yes?

joeytakeda commented 8 months ago

> Here though, on the use of <add> I thought our subgroup agreed that it was not ideal because this element is typically used for small insertions or corrections, and not for long passages of what we consider main text.
>
> I also think that <label> is not generally for a prompt that raises a question to be answered.
>
> Indeed a form seems a sort of "call and response" genre. I have sometimes wanted to reach for tagging for dialogue to encode a pre-printed question and handwritten response! But no--a form is not spoken, but printed/written in two or more hands.
>
> We want encoding for prompt and response that elevates both to equivalent significance as main text, I think. I recall that was an important finding of our conversation, yes?

OTOH, the more I think about <add>, the more I think it may be OK; thinking the other way, the addition of a marginal note (for instance) is sort of like a response to a question that the original author didn't ask. So perhaps <add> doesn't necessarily need to be about a kind of layering atop a text; we could instead think about <add> as a kind of dialogic interaction (rather than overwriting).

But on the other hand, point totally taken about long passages (since <add> can't contain paragraphs, for instance)...

Really my main concern here is that sometimes blanks appear in a really structured way (e.g. like in a form), but then blanks may appear in other places without the kind of question/response structure. I think not all forms or inputs are necessarily calls and responses; sometimes, inputs don't have labels or an explicit question—they just exist as part of the document waiting to have content included—e.g. this letter that I saw making its rounds on social media the other day:

In my mind, it's important not only to outline that there is a response, but also to qualify that there is some sort of space itself.[^1] In other words, we should be able to encode both the existence of the blank and, separately, the existence of the response, which is why I'm favouring an approach that treats the question, the "input" itself (e.g. the _____, the radiobox, the checklist, etc), and the response to that question separately.

One other way to think of this might be as a kind of <gap> or <ellipsis>, etc (e.g. those things defined in https://www.tei-c.org/release/doc/tei-p5-doc/en/html/CO.html#COEDADD); but it's neither a <gap> (since it's not something missing from the document) nor, I'd argue, an <ellipsis> (since it's not something purposefully omitted from the document, but rather something that, to borrow from @ebeshero's description, invites a response). So maybe instead of <space> (which I admit is probably a bit of tag abuse), we may want to have another element: <blank>?

I can imagine a few possibilities:

A new element (say <value>) that is nested in <blank> that would contain the written (or typed) in response
Or a kind of Janus element, <input>, which would contain a blank and a value:

<input>
   <blank extent="3 pages"/>
   <value extent="1 page">
    <p>Since the dawn of time, ...</p>
   </value>
</input>

The nice thing with this structure, too, is that it could allow for something like a <list> with perhaps an easier selection mechanism:

<input>
   <list type="checkbox">
      <item xml:id="keats" rend="checked">Keats</item>
      <item xml:id="shelley" rend="unchecked">Shelley</item>
      <item xml:id="byron" rend="unchecked">Byron</item>
   </list>
   <value select="#keats"/>
</input>

I think this could then be paired with whatever kind of labelling mechanism you need for describing your text:

<list>
    <label>Date</label>
    <item>
        <input>
            <blank/>
            <value>
                <date when="1943-02-19">Feb. 19/43</date>
            </value>
        </input>
    </item>
    <label>Our File No.</label>
    <item>
        <input>
            <blank/>
            <value>
                <num>3111</num>
            </value>
        </input>
    </item>
</list>

Or, if we thought of the form as a table:

<table>
    <row>
        <cell role="label">Date</cell>
        <cell>
            <input>
                <blank/>
                <value>
                    <date when="1943-02-19">Feb. 19/43</date>
                </value>
            </input>
        </cell>
    </row>
</table>

But it also wouldn't necessarily require that there be a label if such doesn't exist in the source document:

<opener>
<salute>Dear <input><blank/><value hand="#SteveMartin"><persName>Jenny</persName></value></input>,</salute>
</opener>

(We could also consider, following HTML's form structure, allowing for either a nested <label> for the label, or recommend using @corresp for pointing to the structure that provides a label for that input?)
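
To make those two labelling options concrete, here is a rough sketch of each. As before, <input>, <blank>, and <value> are only the proposed elements from this comment and do not exist in TEI; allowing <label> inside <input> and @corresp on <input> are assumptions, and the xml:id value is invented for illustration.

```xml
<!-- Option A (assumed): the label is nested inside the proposed <input> -->
<input>
  <label>Date</label>
  <blank/>
  <value><date when="1943-02-19">Feb. 19/43</date></value>
</input>

<!-- Option B (assumed): the label lives elsewhere and the proposed <input>
     points to it with @corresp; the xml:id is hypothetical -->
<label xml:id="label_date">Date</label>
<input corresp="#label_date">
  <blank/>
  <value><date when="1943-02-19">Feb. 19/43</date></value>
</input>
```

Option B would let a label sit in a different structural position (say, a table cell or a heading) from the blank it describes, at the cost of needing an xml:id on every label.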

This is all a bit looser than what we were initially thinking, but I think that level of flexibility is probably necessary, given the wide variety of blanks and responses we might find.

What do you think?

[^1]: I'm thinking here, for instance, of Lisa Gitelman's excellent chapter in Paper Knowledge: "A Brief History of _____" (https://www.dukeupress.edu/paper-knowledge)