TEIC / TEI

The Text Encoding Initiative Guidelines
https://www.tei-c.org

A new floating embedded text element #64

Closed TEITechnicalCouncil closed 9 years ago

TEITechnicalCouncil commented 19 years ago

I would like to propose a new TEI element for encoding floating embedded texts and text fragments.

  1. It should have a content model identical or very similar to <text> (users might customize to eliminate the <front>/<body>/<back> layer if they wished to embed only simpler pieces with a more div-like content model).

  2. It should be a floating, not a tessellating element; it should be permitted to appear interspersed between paragraph-level objects.

  3. It should be permitted both within and between paragraphs; it should not be required to be nested within a <p> element (this is one of the more counterintuitive aspects of the P4 <text> element which discourages its use for this sort of thing). However, it should also be permitted within <p> and <quote> and similar elements. Text fragments of this sort appear in both kinds of contexts and need to be accommodated.

  4. It should carry a type= attribute to allow users to categorize the text according to some convenient typology; this feature of <div> is one reason why users often choose <div> instead of <text> for otherwise <text>-like objects.

It could be called <xtext> or <inText> or <embeddedText> or <floatText> or some such thing.
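A minimal sketch of what the four requirements above might look like in markup. The element name <xtext> is only a placeholder taken from the list of candidate names, the type values are illustrative, and the sample wording is invented; none of this is existing TEI.

```xml
<!-- Hypothetical: <xtext> is a placeholder name for the proposed element -->
<div type="chapter">
  <p>She found the note on the mantelpiece.</p>
  <!-- Between paragraphs, as a direct child of <div> -->
  <xtext type="letter">
    <body>
      <p>Dear Anne, I have gone to London.</p>
    </body>
  </xtext>
  <p>She read it twice, then noticed a second slip
    <!-- Within a paragraph -->
    <xtext type="receipt">
      <body>
        <p>Received of Mr. Hale, two pounds.</p>
      </body>
    </xtext>
    tucked inside the envelope.</p>
</div>
```

The same element appears in both positions, which is the point of requirement 3: the encoder should not have to wrap the embedded text in a <p> merely to make it valid.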

Original comment by: @juliaflanders

TEITechnicalCouncil commented 9 years ago

This issue was originally assigned to SF user: louburnard Current user is: lb42

TEITechnicalCouncil commented 18 years ago

The current <text> element is a member of model.inter, and should thus be allowable both between and within <p>s. There seems to be a long-standing bug in the way this is implemented in the present version of P5, which is why you have the impression that it is required to be nested within a <p>: this should not be the case (and was not, for example, in P3).

Assuming we fix this, would simply adding a type attribute to the existing <text> element satisfy your request? If so, please could you suggest some ways in which you might use such an attribute. Is the difference between your proposed new element and the existing one simply that the new one must always be included within an instance of the old one? If so, what is the difference between a "text" in this sense and the "text"s which make up <group>s in the current model? How would you advise someone which element to use?

Original comment by: @lb42

TEITechnicalCouncil commented 18 years ago

Looking at P3, it appears to me that the list of elements within which <text> is permitted is identical with P4, with the exception of <desc>, <dictscrap>, <entryfree>, <fDescr>, <fsDescr> (permitted contexts in P4 but not P3), and <set> (permitted context in P3 but not P4), none of which seems relevant here. P5 adds <del>, <desc>, <filiation>, <orig>, <restore>, <rhyme>, and omits a few others, but nothing that is to the point here. So my impression of <text>'s permitted locations is apparently accurate from 1994 through the present. "Allowable between <p>s" should mean "allowable as a child of <div>", but I don't see that <text> has been allowed as a child of <div> at any point, including now.

Adding a type attribute to <text> would help; the most typical use I've seen has been to provide a generic taxonomy along the same lines as type on <div>. The same kinds of values (with some obvious omissions and additions) apply in both cases. Sample values might include "novel | play | poem | letter | invoice | receipt | manifest | transcript" and that sort of thing; the reason I'd prefer not to encode these as keywords in the header is that they'd be harder to link to specific <text>s. If genre can be expressed at the <div> level it seems reasonable to be able to express it similarly at the <text> level, particularly in the case of embedded texts.
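As a rough illustration of that parallel, the fragment below puts a genre value on <div> and the same kind of value on an embedded <text>. It assumes <text> has gained the requested type attribute, nests <text> inside <p> as described earlier in this thread, and uses invented sample wording.

```xml
<body>
  <!-- Genre expressed at the <div> level -->
  <div type="letter">
    <p>Sir, I enclose the account you asked for.</p>
    <p>The account reads as follows:
      <!-- Hypothetical: type on <text> is the attribute being requested -->
      <text type="invoice">
        <body>
          <p>To twelve yards of muslin: 18s.</p>
        </body>
      </text>
    </p>
  </div>
</body>
```

Because the value sits on the element itself, it stays attached to that particular embedded text, which is the linking problem that header keywords would raise.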

The real difference between the proposed element (which for ease of reference I'll call <xtext> here) and the existing <text> element is that <text> is usually understood as an independent unit--something that was published separately or in some other way has ontological separateness as a work. The idea behind <xtext> is that the embedded textual objects don't necessarily possess that separateness, and in any case they're not being represented as independent items. They may be quoted material, they may be just plopped into the narrative (for instance, a letter represented within a novel but not read aloud within the narrative), they may be presented as a form of exhibit (for instance, in 19th-c crime drama, where various kinds of documents are interspersed with narrative commentary as a kind of glue; or in Renaissance pamphlets ditto). The current word for such things is "transclusions" which is dreadfully trendy but does describe the phenomenon I mean.

So on the side of <text> we have:
-- separately published items (e.g. a novel, a poem, a play)
-- sets of items which are explicitly grouped together for publication (e.g. a collection of novels, plays, etc.)

and on the side of <xtext> we have:
-- entire textual objects which are quoted within a larger textual context (e.g. a one-act play or a short essay quoted within a novel or letter)
-- real or fictional textual objects which are represented within a larger textual context (e.g. a letter within a novel, a piece of documentary evidence from a witness in a trial proceeding)

A few clarifications in answer to your specific questions:

The fact that <xtext> may only occur inside <text> does not cause it to resemble the <text> within <group> at all; the <text>s that make up <group> are tessellating elements (within <group>, only <text> is permitted). The idea behind <xtext> is that it is explicitly an embedded element: an inclusion like a lump of stone within a conglomerate or a piece of fruit in a pudding. It is not part of a group, but rather something that appears within some larger stream (either by quotation or by inclusion of another sort, in the manner of a document submitted as evidence in a trial). It is not being considered as a published item, although it may be one; it is not a "primary" document within the textual scheme. It is situated and dependent on its textual context.

If both elements were provided, I would advise people to use <text> to represent the boundaries of the primary documents they are dealing with, whether those are individual items or groups of items, and to use <xtext> for all cases where a text-like object (i.e. something with a text-like nature and/or internal structure) is embedded within <text>. It would be useful to discuss the difference between embeddedness (linked with the idea of "floating") and tessellation; <xtext> is not just another way of doing what <div> does, which is to represent subdivisions of a document--i.e. the logical boundaries of the document's sectioning. It is rather a way of bringing in things from outside and representing their boundaries and internal structure. For instance, in a collected letters volume, I think either <text> or <div> would be appropriate for the individual letters, but not <xtext>. The individual letters are either independent entities or they are subdivisions of the whole, but they are not embedded.
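The contrast between tessellating and embedded texts can be sketched in one fragment. This is only an illustration: <xtext> remains a placeholder name, the type value is an assumption, and the sample wording is invented.

```xml
<text>
  <group>
    <!-- Tessellating: the letters of a collected volume sit side by side in <group> -->
    <text>
      <body>
        <p>First letter of the collection.</p>
      </body>
    </text>
    <text>
      <body>
        <p>Second letter, whose writer copies out a bill:</p>
        <!-- Embedded: hypothetical <xtext> floats inside the surrounding letter -->
        <xtext type="invoice">
          <body>
            <p>To carriage of goods: 5s.</p>
          </body>
        </xtext>
        <p>The letter then resumes.</p>
      </body>
    </text>
  </group>
</text>
```

The two letters are primary documents of the volume, so they stay <text>; only the bill, which depends on the letter around it, would take the new element.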

Original comment by: @juliaflanders

TEITechnicalCouncil commented 18 years ago

Thank you for the long and helpful comments and apologies for the delay in responding. I am going to suggest that this issue be discussed by the Council, since I am still unable to agree with your conclusions, although I entirely agree with the distinctions you're drawing. If the main difference between <text> and <xtext> is that the latter is "plopped in" like a quotation, why not use <quote> for it?

Original comment by: @lb42

TEITechnicalCouncil commented 18 years ago

Concerning why one would not simply use <quote>:

First, these texts are not necessarily quotations, although they might be.

Second, in cases where they are quotations, the problem is that <quote> by itself does not have sufficient internal structure--one would need to use <quote> with <text> inside it. But I believe it would be useful to distinguish between the textual structures which are native to the document's own organization, and those which come into it from outside, or erupt inside it for narrative reasons. Using <text> for the former, and <xtext> for the latter, would make it possible to distinguish between the material contained within <text> elements that are "native" in this sense (even those which may be descendants of the outermost <text> element, such as those enclosed within <group>), and material contained within these "transcluded" texts, which has a very different ontological status from that of the surrounding content.
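A hedged sketch of the two encodings being compared. The first uses existing markup, with <text> nested inside <quote> to carry the internal structure; the second uses the placeholder <xtext> for material that is embedded but not quoted. The type value and the sample wording are invented for illustration.

```xml
<div>
  <!-- Existing markup: a quoted text needs <text> inside <quote> to carry structure -->
  <p>He recited the whole piece:
    <quote>
      <text>
        <body>
          <lg>
            <l>First line of the quoted poem</l>
            <l>Second line of the quoted poem</l>
          </lg>
        </body>
      </text>
    </quote>
  </p>
  <!-- Hypothetical <xtext>: embedded in the narrative, but not a quotation -->
  <xtext type="letter">
    <body>
      <p>A letter set into the narrative without being read aloud.</p>
    </body>
  </xtext>
</div>
```

With a distinct element, a query for the document's "native" <text> elements would no longer pick up the transcluded material, which is the distinction argued for above.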

Original comment by: @juliaflanders

TEITechnicalCouncil commented 17 years ago

Following the recent discussion on TEI-L, I am now more and more inclined to address this requirement by changing the content model of <quote> to be -- more or less -- ANY.

Original comment by: @lb42

TEITechnicalCouncil commented 17 years ago

It is not clear to me how changing the content model of <quote> to be inclusive and permissive would help. <quote> already may contain <text>. Allowing <quote> to contain the contents of <text> would eliminate a layer, but if the text object in question is not a quotation, this solution won't help in any case.

Original comment by: @juliaflanders

TEITechnicalCouncil commented 17 years ago

In light of the discussion, I am all in favor of creating a new element xtext. This will mess up every conceivable content model, I assume, so the important issue is: Do we do this now or is this something we could postpone without too much collateral damage to 1.1? I am afraid the answer has to be now :-(

Christian

Original comment by: @cwittern

TEITechnicalCouncil commented 17 years ago

I do not agree. All the use cases so far presented seem to me to indicate that there is a need for some kind of "floating" text container, and I don't disagree with that. But I believe that we already have two elements fit for this purpose in <text> (where the floating object is actually a component of the current text but doesn't respect the usual hierarchic structure) and <quote> (where the floating object is in a sense not a component of the current text, being quoted from some other source). If there are limitations in the content models of the existing elements, by all means let's fix them; I've started to explore that possibility with <quote> below. But to add a third element into this already rather crowded corner of the Guidelines really needs some substantial examples where the existing two are not good enough. Which is not of course to say that such cases don't exist. I just haven't seen any yet and don't have the leisure to speculate.

Another way of looking at this argument is that really the desire here is to restrict the meaning of <text> to "the outermost element of a hierarchically structured document" and invent <floatingText> for the other cases. But then would the <text>s which make up a <group> become <floatingText>s? I hope not.

Original comment by: @lb42

TEITechnicalCouncil commented 17 years ago

There hasn't been much discussion of this issue since I asked for comments on TEI-L. However, I agree that it is probably a good idea to distinguish the use of <text> as a container, either directly within <TEI> or within <group>, on the one hand, from its use as a free-standing embedded object on the other, and am also reconciled to the fact that at least some of the latter uses are not best served by <quote>. So I am closing this ticket and creating a new TRAC item to produce and circulate a document defining a new <floatingText> element asap, with a view to getting it into P5 if there is general agreement.
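A sketch of the resulting division of labour, assuming <floatingText> takes a <text>-like content model, carries a type attribute, and is permitted between paragraphs, as requested earlier in this ticket. The type value and the sample wording are invented for illustration.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <!-- teiHeader omitted for brevity -->
  <!-- <text> as container, directly within <TEI> -->
  <text>
    <body>
      <div type="chapter">
        <p>The prosecution then entered the following into evidence.</p>
        <!-- <floatingText> as the free-standing embedded object -->
        <floatingText type="manifest">
          <body>
            <p>Forty casks of wine, shipped from Bristol.</p>
          </body>
        </floatingText>
        <p>The defence disputed its authenticity.</p>
      </div>
    </body>
  </text>
</TEI>
```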

Original comment by: @lb42
