TEIC / TEI

The Text Encoding Initiative Guidelines
https://www.tei-c.org

add @register to <layout> #1687

Closed duncdrum closed 6 years ago

duncdrum commented 7 years ago

Currently <layout> supports @columns, among other things, but not @registers. Registers occur in East Asian books where two or more text streams are presented on the same page, which is broken into vertical segments. See here for one example.

These are not vertically arranged columns, as mentioned here, which would break the same text stream into multiple segments. Hence I would like to suggest adding register as a possible attribute of <layout>, following the datatype and function of columns.

Something along the lines of:

<elementSpec ident="layout" mode="change">
  <attList>
    <attDef ident="register" usage="rec" mode="add">
      <gloss>number of registers per page</gloss>
      <desc>If a single number is given, all pages have this number of registers. If multiple numbers are given, the number of registers per page varies between the values supplied.</desc>
      <datatype maxOccurs="unbounded">
        <dataRef key="teidata.count"/>
      </datatype>
    </attDef>
  </attList>
</elementSpec>
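
Under the customization sketched above (a proposal only, not part of the released TEI schema), a manuscript description might record the layout roughly as follows. The surrounding <layoutDesc> wrapper and the attribute values are illustrative assumptions:

```xml
<layoutDesc>
  <!-- hypothetical: every page has two registers -->
  <layout register="2"/>
  <!-- hypothetical: the number of registers per page varies between two and three -->
  <layout register="2 3"/>
</layoutDesc>
```
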
duncdrum commented 7 years ago

Any initial thoughts on this? It's kind of eerily quiet :ghost:

martindholmes commented 7 years ago

I think the use-case is clear and straightforward, and I see no problem with it. I think we'd need to have a good clear description of the difference between registers and other types of page-division, in order to avoid confusing people not familiar with this layout feature. We also need to investigate whether there's anything in other manuscript or print traditions which might be covered by the same attribute.

duncdrum commented 7 years ago

how about:

@register    specifies the number of registers per page (common in East Asian xylographs). Each register contains an independent text stream

I'm happy with:

<content>
 <macroRef key="macro.specialPara"/>
</content>

But the Guidelines might include a suggestion for how <pb/>, <lb/>, … are handled. Otherwise we get:

<pb n='1'/>
To be or not to be
<lb/>
A long time ago in a galaxy far, 
<pb n="2"/>
that is the question.
<lb/>
far away ...
martindholmes commented 7 years ago

I may be misunderstanding, but it seems to me that for actual transcription, you'd either need to use a zone for each register, or we would need to invent a new break element, <rb/> or "register break".

But that makes me wonder: would it be true to say that these are actually horizontal columns? In other words, they function in the same way as vertical columns in other types of document, but they just happen to be arranged horizontally? I guess from the example above, they're not; the two text streams flow across multiple pages, right?
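
A minimal sketch of the zone-per-register idea, assuming a documentary transcription with <sourceDoc>. The @type and @n values are hypothetical labels, not established TEI conventions, and the sample text is taken from the example earlier in this thread:

```xml
<sourceDoc>
  <surface n="1">
    <!-- hypothetical labels: one zone per register on the page -->
    <zone type="register" n="upper">
      <line>To be or not to be</line>
    </zone>
    <zone type="register" n="lower">
      <line>A long time ago in a galaxy far,</line>
    </zone>
  </surface>
  <surface n="2">
    <zone type="register" n="upper">
      <line>that is the question.</line>
    </zone>
    <zone type="register" n="lower">
      <line>far away ...</line>
    </zone>
  </surface>
</sourceDoc>
```

Each text stream can then be read by following the zones with the same label from surface to surface.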

duncdrum commented 7 years ago

Nope horizontal columns exist, but registers are not it. Hence my desire to add something to <layout> that can describe them accurately. Horizontal columns would look like this in (vertical) xml:

<pb n='1'/>
to be or not to be
<cb/>
that is the question.
A long time ago
<pb n="2"/>
in a galaxy far, 
<cb/>
far away. …

I tend to use <div type="upper-register"> as wrapper. <rb/> would be consistent since there is <cb/> but <lb type="rb"/> is valid and works.

Since we can sensibly encode the vertical texts of registered layouts, it's more a question of how beginner-friendly the Guidelines aim to be, by pointing out that using vanilla <pb/>, <lb/> etc. without @type and wrappers is not the smartest way to encode vertical texts.
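
As a rough sketch of the wrapper approach, the two streams from the earlier example could be kept apart like this. The value "lower-register" and the use of <p> are assumptions made for illustration; only "upper-register" is mentioned above:

```xml
<div type="upper-register">
  <p>
    <pb n="1"/>To be or not to be
    <pb n="2"/>that is the question.
  </p>
</div>
<!-- "lower-register" is a hypothetical counterpart to "upper-register" -->
<div type="lower-register">
  <p>
    <pb n="1"/>A long time ago in a galaxy far,
    <pb n="2"/>far away ...
  </p>
</div>
```

Each stream stays continuous across the page break, at the cost of repeating the <pb/> for every register.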

martindholmes commented 7 years ago

I'm thinking this really should be integrated into this bit of the Guidelines:

http://www.tei-c.org/release/doc/tei-p5-doc/en/html/WD.html#WDWM

or at least mentioned there. It's rather as if there are two page-breaks, one in each register. There are medieval European manuscripts that do this sort of thing with multiple commentaries running alongside the main text flow; I wonder how people normally handle that?

lb42 commented 7 years ago

Can you give a clearer definition of what you mean by "register" and perhaps suggest a synonym for it or cite an authority for that usage? It seems to be a very specific usage of the word "register", which is used quite often in the Guidelines in an entirely different sense, and it would be nice to avoid confusing people if possible. Is there perhaps a specifically east asian technical term we could use instead?

duncdrum commented 7 years ago

@lb42 Top of my head: He Yuming Home and the World 2013, p 7.

What's not clear about my description?

Registers are vertically arranged and marked text boxes with individual text streams per page.

If you ignore text flow, it looks like horizontal columns in a vertically arranged script.

I see two occurrences of register in the guidelines

To me it seems quite clear that in the context of <layout>, @register refers to neither. It's not like the Guidelines point out that @column doesn't talk about marble and pillars, or marching troops. I wouldn't want to add a new word to the English dictionary just for that.

Since East Asian isn't a language here is the Chinese: @X節版 (X being a Chinese numeral). Korean and Japanese scholars would probably recognise this given the kind of sources we are talking about. @二節版 (2), and @三節版(3) would cover 90% of cases.

The problem is that only Chinese scholars can be expected to recognise this in latinised form. So, transliterating that into X-jieban makes no sense to anybody but the two of us.

I haven't encountered just 節版 (jieban) without a numeral, but in the context of <layout 節版="2"/> it might work. Not sure what the Council thinks of this in light of i18n of TEI. The latinised @jieban as a TEI creation would be different from @register, and is my second-best option: less confusing to some, but meaningless to native speakers of Japanese, Korean, Vietnamese, English….

Just @ban would not work, it can mean edition, page, printing block, depending on context.

duncdrum commented 7 years ago

@martindholmes I'm happy to write a paragraph, but the section you mention talks about text direction on a single page. That is also the difference from the medieval examples: in the majority of cases, their script orientation matches that of an XML file (horizontal lines, TBLR). So far there are no general guidelines on how the sources' text direction affects the use of TEI tags (like recto-verso numbering on pb, how to arrange divs, etc.). If this is something the Council wants, I'd be happy to contribute, but it seems a different and much bigger problem than my request in the OP.

lb42 commented 7 years ago

Thanks for being patient with me Duncan! But I'm still not sure about this use of the word. Of course if every sinologist in the world knows straightaway what a "register" is, my concerns are more nugatory. Can you explain why this term has been adopted? I am also wondering how (or whether) I'd distinguish one of these things from the sort of feature that characterise early biblical or talmudic manuscripts, where you have a block of text vaguely in the middle of the page, surrounded by one or more flows of commentary. (Things like the Glossa Ordinaria for example). If that block isn't a "register", why isn't it? And if it is, why don't medievalists call it one? (maybe they do, in which case I should shut up at once)

martindholmes commented 7 years ago

節版 (or 二節版) doesn't appear in the Japanese jisho.org, but I think you're right that the combination of the three kanji would convey a fairly good sense of what it means (although we should ask a Japanese scholar). I quite like the idea of @jieban; we already have the gaiji module, making use of a Japanese term, and the use of a Chinese term should help to make it clear to people working on (say) European manuscripts that this is probably something they don't need to worry about. I think Lou's point about medieval MSS with flowing commentaries is a good one, but those are typically MSS rather than printed texts; I'm not aware of anything in the Western print tradition that's analogous to this (two independent flowing texts sharing the same page-sequence). Perhaps we should post on the list to see if such things do happen.

duncdrum commented 7 years ago

@lb42 always happy to respond, it was too quiet in here 😯. The important point to remember is that the difference between columns and registers has a significant impact on the encoding of a text, whatever it is called. TEI currently has no unambiguous way of making that distinction. The term "register" is, however, widely used among people working in English on East Asian prints.

According to Oxford Dictionary of English (2010) Oxford University Press:

register |ˈrɛdʒɪstə|
...
• Printing the exact correspondence of the position of printed matter on the two sides of a leaf.

Given that traditional xylographic pages are folded half-leaves with printing on one side, translating 二節版 as "dual-register" accurately captures the layout of the printing block imv. The page still has a front and back, despite the leaf being printed on only one side.

This also explains why the term is not (?) used for manuscripts. In addition, since these examples encode horizontal scripts in horizontal XML, they might have less need to distinguish (see below). After all, if it looks like a column… Lastly, these commentary examples all have a line-by-line relation between the text blocks. Registers are very often completely separate. A (fictional) chapter on Western literature might have Hamlet in the top register and Faust in the bottom. There is no relation between the individual lines. A handwritten annotation would alternate between Denmark and Leipzig for every character/glyph.

@martindholmes I still prefer the established technical term in English over a made-up one. I'll inquire with some native speakers whether they have ever encountered just "jieban". Also, triple registers are common for plays...

Whatever the term, I don't think that I'm alone in needing one.

jamescummings commented 7 years ago

Should we point TEI-L to this issue to garner more input into what people call these (if not register)?

pcaton commented 7 years ago

The OED definition makes 'register' sound more like a state or quality than a thing, especially those prepositional phrases like "in register". I got the impression that the phenomenon you want the label for is more thing like, in the sense that a column is a thing (but I might be wrong in that).

duncdrum commented 7 years ago

@pcaton good point, I think that when we say:

<layout columns="2"/> 

we treat it as a state, i.e. the text is presented in two columns (per page), and not that there are only two columns A and B for the whole book.

@registers would work in the same fashion.

martindholmes commented 7 years ago

Does the same text always have the same number of registers on each page, or do you have subsections where (for instance) there are two registers, and others where there are three, in the same text?

duncdrum commented 7 years ago

Front/back matter being in a single register, with all of the body in either two or three, is common in my experience. I wouldn't want to completely rule out more exotic pairings, for example in pirate copies, but I have never seen them.

martindholmes commented 7 years ago

Just out of interest, to satisfy my puzzlement: is one register normally commenting on the other, or are they synced in some way, or can they be two entirely separate texts that happen to be sharing the same page-sequence?

duncdrum commented 7 years ago

My understanding is that the convention originated with plays where stage instructions, musical notation and text were set in three registers. But the practice took hold in a variety of commercial prints, where they became completely independent text streams. The exact nature of the juxtaposition being the stuff ppl like me like to write about 😉

jamescummings commented 6 years ago

How does this compare to 'parts' (or whatever the term is) used in a full score, where a conductor sees all the corresponding parts. (I'm just wondering if there is a general phenomenon which is then represented in a variety of specific instantiations like east asian books, scores, etc. But maybe they are far too different type of thing.)

It might be good to get the perspective of the East Asian/Japanese SIG?

duncdrum commented 6 years ago

@kiyonoriNagasaki did you see the examples in the first post? One contains a Chinese primary source in two registers. The second link describes Japanese handling of columns in vertical script layouts (where they are more common than in modern Chinese) to show the difference between the two.
There is a print series by 酒井忠夫 based on popular prints from the Tokyo Bunko rare books collection, 中国日用類書集成; pretty much all of them feature this layout, see here. Unfortunately there are no digital facsimile links on the bunko page.

@jamescummings transcultural musical notation practices present their own problems. My examples feature books with texts, where no notation, score, or stage direction features are involved.

duncdrum commented 6 years ago

Google image search works better than library catalogues, doh. 3 columns (continuous flow of the numbering according to reading direction): t9rc005-01

2 registers (not columns, heading on bottom are marked by and continue across fold, stuff on top independent of that ): g0010215_0004

kiyonoriNagasaki commented 6 years ago

I understand what you want. Thank you for putting the images. Distinguishing both would be necessary for some cases.

kiyonoriNagasaki commented 6 years ago

The attributes "columns", "ruledLines", and "writtenLines" for <layout> seem to describe the style of the page, but "register" seems to require an understanding of the text, that is, whether the divided texts are continuous or parallel. For example, it might be necessary to distinguish it from upper notes (頭注). Of course, this would not be difficult in many cases, but I would like to ask all of you whether it should be included in the attributes of <layout>, that is, should we extend the coverage of the current attributes of <layout>? And then, does Western literature also include similar phenomena? If so, it might be better to add an element to describe the stream style of the text rather than extending the meaning of the attributes of <layout>.

duncdrum commented 6 years ago

I like the idea of @stream however that raises some interesting questions about what a column is in the context of tei.

@registers

I think that columns are part of TEI's layout vocabulary, because of the history of book production. In that sense both registers and columns work in an analogous fashion. Just as vellum was marked with inked strings, or compositors set the type galley, woodblocks were precarved with a single-, dual-, triple-, layout to be filled-in later in a separate step.

<layout columns="2"/> <!-- ?Plantin's Polyglot Bible? -->
<layout columns="3"/> <!-- Japanese above -->
<layout register="2"/> <!-- Chinese above -->

This would be minimal with respect to the current specs, unambiguous, and in line with terminology used by scholars. Even <layout register="2" columns="2"/> while pure fantasy is quite clear, and makes as much real-world sense as <layout columns="42"/>. The downside, for TEI's global coverage it needs to disambiguate between columns and registers. Consequently, allowing for <rb/> would make sense.

@streams

Streams could more readily cover manuscripts, and are more flexible. However I have not yet encountered the term in any academic discussion of layout. Furthermore, what should we include in its description and why; running heads, marginal notes, footnotes, … ? Is the above mentioned talmudic commentary its own stream only when it sits in a separate block, or also when it appears as interlinear commentary? How should one encode the above examples? Would @stream be mandatory, or can it exist independent of @columns, do we assume latin defaults when it is not specified?

<layout streams="1" columns="2"/> <!-- ?Plantin's Polyglot Bible? -->
<layout streams="6" columns="2"/> <!-- ?Plantin's Polyglot Bible 1 stream per language? -->
<layout streams="1" columns="3"/> <!-- Japanese above -->
<layout streams="2" columns="1"/> <!-- ?Chinese above? -->
<layout streams="5"/> <!-- ??? -->

pantin-polyglot

duncdrum commented 6 years ago

Since no manuscript people have commented yet, I went digging and found this bit from 2003 about "logical flow" in manuscripts:

In the case of capturing the documentary aspects of the page, the encoder may then use anchors, pointers and links to indicate the logical flow of the text. Conversely, encoders capturing the logical flow may use notes to explain where the text appeared originally. In either case, the Guidelines should more thoroughly document these philosophies of encoding, providing examples and alternative methodologies.

I think that <flow> or @stream would be an interesting additions to the guidelines in the long run, but I don't think they should prevent @register in the interim. <layout> seems the natural place to capture however scribes, carvers, or printers divided page surfaces into chunks.

duncdrum commented 6 years ago

Is there a verdict from the Council on this? <layout> describes how text is laid out on the page, including information about any ruling, pricking, or other evidence of page-preparation techniques. I still think that registers fall squarely within this definition, but cannot be accurately captured right now.

martindholmes commented 6 years ago

Council meetings start today...

duncdrum commented 6 years ago

Just to be clear calendars and weights are fun, but the thing I really care about at the moment is registers.

duncdrum commented 6 years ago

So any news on registers?

ebeshero commented 6 years ago

Council thinks @streams makes sense and is more flexible (and can imagine lots of good use cases), but do those involved on this ticket agree that it covers all cases you'd associate with @registers? Here is our proposed definition for the optional attribute @streams:
@streams indicates the number of streams per page, each of which contains an independent text stream. (Add a remark: In some fields this might be called registers.)

duncdrum commented 6 years ago

@ebeshero Happy to see the green-light on this, thanks all for the discussion:

I think that @streams can disambiguate vertical columns from registers, like this:

<layout streams="1" columns="3"/> <!-- Japanese (color) above -->
<layout streams="2" columns="2"/> <!-- Chinese (b/w) above -->

datatype should be 1–2 occurrences of teidata.count separated by whitespace since the number of streams often varies within one book.
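
As a sketch only, such an attribute could be written in ODD along the lines of the @register proposal in the first post, with the description taken from the Council's wording and the datatype limited to one or two values. This is an illustration, not the specification as released:

```xml
<elementSpec ident="layout" mode="change">
  <attList>
    <!-- sketch: optional attribute, one or two whitespace-separated counts -->
    <attDef ident="streams" usage="opt" mode="add">
      <desc>indicates the number of streams per page, each of which contains an independent text stream.</desc>
      <datatype minOccurs="1" maxOccurs="2">
        <dataRef key="teidata.count"/>
      </datatype>
    </attDef>
  </attList>
</elementSpec>
```

With this datatype, streams="2" would mean two streams on every page, and streams="1 3" would mean between one and three.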

Since @streams are supposed to cover more ground than @registers, I would change the remark. registers = streams implies that the suggested encoding would be:

<layout columns="3"/> <!-- Japanese (color) above -->
<layout streams="2"/> <!-- Chinese (b/w) above -->

which is still ambiguous in light of this increased flexibility.

Instead, I would suggest adding a remark to @column to clarify that columns are independent of page orientation or reading direction ('standing' as in the bible page above, or 'lying' as in the CJK pages above), and to @streams that they naturally depend on script orientation. One could add that by omitting streams the number is assumed to be 1 and the script orientation of the source to be identical to that of the xml file.

Lastly, following @martindholmes' comments, the example in the Guidelines should include a suggested encoding of the ending of a stream, e.g.:

<layout streams="3" columns="3"/>
…
<div type="page">
一二三<cb type="top-stream"/>
一二三<cb type="mid-stream"/>
一二三<pb type="bottom-stream"/> <!-- @type here just for demo purposes -->
</div>

I'm still not sure how council thinks one should encode Plantijn's polyglot bible page, but I gladly leave that to people working on those type of materials.

jamescummings commented 6 years ago

Hi @duncdrum: adding the streams attribute. Could you modify your example in the previous comment so that it is valid? (Text by itself is not allowed as a child of <div>.) I thought about adding <ab> elements around these, but was not sure whether I'd be misunderstanding.

duncdrum commented 6 years ago

@jamescummings 🎉 Has @columns always had maxOccurs="2"? Can there really never be books alternating between more than two layouts? maxOccurs="unbounded" would make more sense imv, both for @streams and @columns.

<div type="page">
  <ab><pb/>
  一二三<cb type="top-stream"/>
  一二三<cb type="mid-stream"/>
  一二三<cb type="bottom-stream"/> <!-- cb here for demo purposes -->
  </ab>
</div>
jamescummings commented 6 years ago

Hi @duncdrum,

A single layout element gives the details for a single run of layout structure. So if you had a book with 3 columns for 20 pages, then 2 columns for 50 pages, then a single column for 150 pages, you'd use 3 layout elements (optionally with a locus element to say where the ranges start and stop). The numbers in the columns attribute are the number of columns: either a single number '5' to say there are 5 columns, or two numbers '3 10' saying the number of columns ranges from 3 to 10. To be honest, I would never use the multiple-value method; for every change of column number I would have a new layout element. I'd assume the whitespace-separated set of column numbers to give a range is best for those not cataloguing in detail, or retrospectively converting from legacy catalogues or databases that may merely contain a string "3-5 cols" or something.
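
A minimal sketch of that description, assuming the book is paginated 1 to 220. The page numbers and the prose inside each <layout> are invented for illustration:

```xml
<layoutDesc>
  <!-- one <layout> per run of layout structure; page ranges are hypothetical -->
  <layout columns="3">
    <p><locus from="1" to="20"/> Three columns.</p>
  </layout>
  <layout columns="2">
    <p><locus from="21" to="70"/> Two columns.</p>
  </layout>
  <layout columns="1">
    <p><locus from="71" to="220"/> A single column.</p>
  </layout>
</layoutDesc>
```

The two-value form would instead compress a legacy entry such as "3-5 cols" into a single element, <layout columns="3 5"/>, without saying where each layout applies.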

Thanks for clarifying the example, I'll add it in.

duncdrum commented 6 years ago

@jamescummings ah this makes more sense, anything that helps me avoid multiple-value attribute is much appreciated.

jamescummings commented 6 years ago

Since everything looks fine at http://jenkins.tei-c.org/job/TEIP5-dev/lastSuccessfulBuild/artifact/P5/release/doc/tei-p5-doc/en/html/ref-layout.html I'll close this for now.