TEIC / TEI

The Text Encoding Initiative Guidelines
https://www.tei-c.org
Other

<subst> should permit textual data #201

Closed: TEITechnicalCouncil closed this issue 9 years ago

TEITechnicalCouncil commented 15 years ago

As presently defined, <subst> can contain only <add> and <del>. Some feel that it should also contain text.

Original comment by: @lb42

TEITechnicalCouncil commented 9 years ago

This issue was originally assigned to SF user gabrielbodard. Current user is gabrielbodard.

TEITechnicalCouncil commented 15 years ago

Is there any argumentation for this or examples of how it would be used? At first blush it seems counter-intuitive to me.

Original comment by: @gabrielbodard

TEITechnicalCouncil commented 14 years ago

Original comment by: @lb42

TEITechnicalCouncil commented 14 years ago

The request is meant to deal with non-1:1 substitutions, as in the following example (Thomas Moore's Lalla Rookh):

<l>While <subst><del>pondering</del> thus <add>she mus'd</add></subst>, her pinions fann'd</l>

Here "she mus'd" substitutes for "pondering", but the two are separated by "thus". (This example was given to me by Justin Tonra, the editor of the poem; the view that "she mus'd" is a substitution for "pondering" is his own.)

Another case: in Italian an adjective can be placed before or after the noun. For no particular reason, some adjectives 'sound' better before and others after, so you can have

- un buon padre

Suppose I want to replace 'buon' with 'onesto' in my original sentence; I would encode:

<p> Un <subst><del>buon</del> padre <add>onesto</add></subst> </p>

or at least this is what I would like to do, as this is semantically just as much a substitution as if I had:

<p> Un <subst><del>buon</del><add>onesto</add></subst> padre </p>

I guess I could encode as well:

<p> Un <del xml:id="del1">buon</del> padre <add corresp="#del1">onesto</add> </p>

but it seems to me that it would be wrong to encode the same phenomenon in two different ways (within or outside <subst>) only because there is a word in between: this fact does not change the nature of the correction made by the author, who means to replace buon with onesto in both cases.
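One way to see Elena's point is to compute the earlier and later readings that a mixed-content <subst> would imply: text inside <del> belongs only to the earlier reading, text inside <add> only to the later one, and intervening text such as "padre" to both. This is a minimal sketch using Python's standard xml.etree.ElementTree; the helper name readings and the un-namespaced input are assumptions for illustration, not part of TEI or any TEI tool.

```python
import xml.etree.ElementTree as ET

BOTH = frozenset({"before", "after"})

def readings(xml: str) -> tuple[str, str]:
    """Return the (before, after) readings of a fragment containing <subst>.

    Hypothetical helper: text inside <del> goes only to the earlier reading,
    text inside <add> only to the later one, and any other text to both.
    Assumes un-namespaced element names (no TEI namespace)."""
    root = ET.fromstring(xml)
    before: list[str] = []
    after: list[str] = []

    def emit(text, mode):
        if text:
            if "before" in mode:
                before.append(text)
            if "after" in mode:
                after.append(text)

    def walk(el, mode):
        # Narrow the mode on entering <del> or <add>; nesting one inside
        # the other leaves an empty mode, so that text joins neither reading.
        if el.tag == "del":
            mode = mode & {"before"}
        elif el.tag == "add":
            mode = mode & {"after"}
        emit(el.text, mode)
        for child in el:
            walk(child, mode)
            emit(child.tail, mode)  # a child's tail sits in el's context

    walk(root, BOTH)

    def norm(parts):
        # Collapse runs of whitespace left over from the markup.
        return " ".join("".join(parts).split())

    return norm(before), norm(after)
```

For example, readings("<p> Un <subst><del>buon</del> padre <add>onesto</add></subst> </p>") returns ('Un buon padre', 'Un padre onesto'). Treating text in a <del> nested inside an <add> (or the reverse) as belonging to neither reading is one possible interpretation of intermediate stages, not a TEI rule.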

Original comment by: sf_user_epierazzo

TEITechnicalCouncil commented 14 years ago

Elena's examples seem to me entirely persuasive. It is a pure accident of syntax that substitution of (in this case) one adjective for another sometimes alters the position of the adjective and sometimes does not. The two are functionally identical and should be capable of encoding in the same way.

Nihil obstat, <subst><del>says</del> I <add>say</add></subst>.

Original comment by: @DavidSewell

TEITechnicalCouncil commented 14 years ago

The use case is convincing, yes. But doesn't this imply that subst should be able to hold practically any content, as the intervening text itself may need markup (abbr, seg, choice, note, g, c, what have you)?

Original comment by: @pboot

TEITechnicalCouncil commented 14 years ago

Following discussion during the TEI Council meeting of 29/4/2010, I find the arguments in support of this change convincing, although, as pboot notes above, subst would then need to be changed to allow other content, since the text not included in add and del may itself need markup.

Original comment by: @leoba

TEITechnicalCouncil commented 14 years ago

To play devil’s advocate, I’m going to posit that Elena’s example is an argument for permitting either <add> or <del> as a child of <subst>, without requiring both. This would allow

Un <subst xml:id="s1" next="#s2"><del>buon</del></subst> padre <subst xml:id="s2" prev="#s1"><add>onesto</add></subst>

rather than requiring something like

Un <del xml:id="sp1">buon</del> padre <add xml:id="sp2">onesto</add> <join targets="#sp1 #sp2" result="subst" scope="root"/>

Original comment by: @sydb

TEITechnicalCouncil commented 14 years ago

Syd, the example we discussed in the meeting is in the Genetic work group's document:

http://www.tei-c.org/Activities/Council/Working/tcw19.html#index.xml-body.1_div.3_div.2_div.6

The proposal is indeed to have both <add> and <del> in <subst>, but to be able to include text that may be between text that is marked as deleted, or added. You can see the full example (with image) in the link above, but the example encoding looks like this:

<ge:line>While <subst> <del>pondering</del> thus <add>she mus'd</add> </subst>, her pinions fann'd</ge:line>

Original comment by: @leoba

TEITechnicalCouncil commented 14 years ago

More comment on that: In this case, "pondering" has been crossed out, and "she mus'd" has been added above the line, but the editor has determined that the second set of words replaces the word that has been crossed out. It's incidental that "thus" is between them. "Thus" itself is neither deleted nor added, but it is part of the substitution phrase.

There is some disagreement as to whether allowing this kind of text in <subst> will encourage editors to be "too liberal" in their interpretations, but I think that the flexibility enabled by this change is worth it (and I'm not convinced that this would be a problem at all).

Original comment by: @leoba

TEITechnicalCouncil commented 14 years ago

Right. My point is that TEI already supports one method of encoding this that makes it perfectly clear that “onesto” was substituted for “buon” and that “padre” was not part of this editorial intervention (i.e., was not crossed out or supralinear or whatever):

Un <del xml:id="sp1">buon</del> padre <add xml:id="sp2">onesto</add> … <join targets="#sp1 #sp2" result="subst" scope="root"/>

Original comment by: @sydb

TEITechnicalCouncil commented 14 years ago

As currently defined, <subst> requires that there be a <del> and an <add>. This is in line with its intended meaning: to group an addition and a deletion and assert that they are part of the same (specific kind of) intervention. In the general case, this can only be done by means of some kind of standoff solution, as Syd suggests. If, however, we allow <subst> to have mixed content, it will resemble other editorial elements such as the generic <mod>, and the requirement that an add and a del both be present will be unenforceable. The proposed compromise defines a special-case content model, which seems of limited use. Maybe what we need is another element <stet> meaning that this text is part of a substitution which is not changed!

Original comment by: @lb42

TEITechnicalCouncil commented 14 years ago

Proposal is to permit <subst> to contain macro.Xtext, <add>, and <del> and to add a schematron rule to enforce the requirement that there must be exactly one add and one del. Document this constraint and also warn against misuse of this feature which is only for cases where the addition and deletion are clearly part of the same intervention.
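The proposed constraint would normally be written as a Schematron rule; as a rough illustration of what it checks, here is a Python equivalent using the standard xml.etree.ElementTree module. The function name is hypothetical, and the sketch assumes that only direct <add> and <del> children of each <subst> are counted and that the TEI namespace has been stripped.

```python
import xml.etree.ElementTree as ET

def check_one_add_one_del(xml: str) -> list[str]:
    """Report every <subst> that does not have exactly one <add> and one <del>.

    Hypothetical stand-in for the proposed Schematron rule: counts direct
    children only and assumes un-namespaced element names."""
    root = ET.fromstring(xml)
    problems = []
    for i, subst in enumerate(root.iter("subst")):
        adds = len(subst.findall("add"))  # direct <add> children
        dels = len(subst.findall("del"))  # direct <del> children
        if adds != 1 or dels != 1:
            problems.append(f"subst #{i + 1}: {adds} add, {dels} del")
    return problems
```

With this check, Elena's example produces no problems, while a <subst> containing two deletions and one addition is reported; that is exactly the kind of case the proposed rule would forbid.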

Original comment by: @lb42

TEITechnicalCouncil commented 14 years ago

I agree with the macro.xtext (no capitalization) addition.

I'm not so sure of the schematron rule. I agree that it shouldn't be misused but what about cases where two deletions in one word and an addition are all one editorial intervention?

I have no good example use cases, but I'm sure they must exist. Let's say a word was comically written 'bloodybalderfuckingdash', but a later editor crossed out the expletives and changed it to 'balderdashwood' (presumably in some misguided nod to Austen). Would that not best be encoded as:

<subst><del>bloody</del>balder<del>fucking</del>dash<add>wood</add></subst>

rather than as two separate <del>s and an <add>?

Otherwise I agree with the macro.xtext.

-James

Original comment by: @jamescummings

TEITechnicalCouncil commented 14 years ago

I agree with James's last comment that constraining <subst> to a single add and a single del is unnecessarily strict. (An example: correcting handwritten "Je ne suis pas convaincu" to "Je suis incertain" would require three (or at least two) deletions and a single addition, because of the divided nature of the French negative.)

Original comment by: @gabrielbodard

TEITechnicalCouncil commented 14 years ago

Would the example in the guidelines--the one from William James--violate the "one del, one add" rule anyway?:

One must have lived longer with <subst> <del seq="1">this</del> <del seq="2"> <add seq="1">such a</add> </del> <add seq="2">a</add> </subst> system, to appreciate its advantages.

Looks to me like it would, but I admit that part of the discussion here is beyond me.

Also, I'm not sure where to register these two other beefs, but since they came up as I was pondering this:

  1. In 11.3.5 Substitutions, the text that introduces the code above is in error. Namely, in the phrase "the word this is first replaced by such a and this is then", the second "this" should not be italicized, as it does not refer to the word this but is a pronoun standing for "such a". Right?

  2. On my project, we fairly frequently encounter pairs of words or phrases that we want to corral together as alternatives to one another. In the most usual case, the first alternative is deleted and the second added, but quite often the first is not deleted. Because <subst> doesn't allow text, we have taken to using <orig> for the first reading. Lou's comment of 2010-05-01 made me realize, though, that this runs afoul of the stated intent for <subst>. None of the discussion or proposals to date seem to me to address the case I'm talking about, so perhaps it's just a red herring for this particular ticket?

Original comment by: sf_user_brettbarney

TEITechnicalCouncil commented 14 years ago

At present the content model of <subst> is (model.pPart.transcriptional), (model.pPart.transcriptional)+

and the following are the members of model.pPart.transcriptional:

add app corr damage del orig reg restore sic supplied surplus unclear

Clearly, this content model permits some things that can't reasonably be interpreted as part of a substitution, e.g. <app> or <corr>, and also some things that might plausibly appear within a substitution without forming part of either its "before" or its "after", such as <unclear> or <surplus>. You might want to say, for example, <subst><add>foo</add><unclear/><del>bar</del></subst>, where the <unclear/> behaves just like the bit of text which this ticket proposes should also be allowed. I cannot imagine, however, what it might mean to say <subst><add>foo</add><orig>fooo</orig><del>bar</del> </subst> or many of the other possibilities offered by this content model.

I would like to be able to say that the content model should be (as proposed earlier)

(text | model.gLike | add | del)*

with an additional constraint that says there must be at least one <add> and one <del>. This would have the effect of forcing the encoder to put any of the other transcriptional markup inside one or other of the <add> and <del> elements.
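A minimal sketch of that proposed content model as a Python check, assuming that <g> is the only member of model.gLike worth modelling here and that the TEI namespace has been stripped; the function name subst_conforms is hypothetical. Text children are always allowed, so only element children are inspected.

```python
import xml.etree.ElementTree as ET

# Assumption: <g> stands in for all of model.gLike.
ALLOWED = {"add", "del", "g"}

def subst_conforms(subst: ET.Element) -> bool:
    """Check one <subst> against (text | model.gLike | add | del)*
    plus the extra rule that at least one <add> and one <del> appear."""
    tags = [child.tag for child in subst]  # element children only
    return (all(t in ALLOWED for t in tags)
            and "add" in tags
            and "del" in tags)
```

Under this check, <subst><add>foo</add><unclear/><del>bar</del></subst> fails, while moving the <unclear/> inside the <add> makes it pass, which is the effect on other transcriptional markup that the proposal intends.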

However this is quite a major change from the current model, so I would appreciate any second thoughts from council members before implementing it.

Another thought: should this also allow <addSpan> and <delSpan> ?

Original comment by: @lb42

TEITechnicalCouncil commented 13 years ago

Is this ticket still under active discussion? We have a use case need for <seg> within <subst> in a printed book transcription project, for what it's worth.

Original comment by: @DavidSewell

TEITechnicalCouncil commented 13 years ago

This ticket was discussed at the Council meeting of 2011-04-11 without however any consensus emerging. A subgroup has been requested to review the issue.

Original comment by: @lb42

TEITechnicalCouncil commented 13 years ago

Original comment by: @lb42

TEITechnicalCouncil commented 13 years ago

After discussion within the subgroup, as advised by Council (cf. tickets http://purl.org/TEI/FR/3393244 and http://purl.org/TEI/FR/3080015 ), the subgroup recommends:

The creation of a new element tei:substJoin, which will take attribute @target, and will be recommended for use pointing to two or more add and del elements in the case where these are separated by text or other elements, but clearly represent a single substitution or scribal correction. (This element may be considered semantic sugar for tei:join[@result='subst'], but its use-case is sufficiently unlike most cases where tei:join might be used that we felt it was helpful to create this new element and clear guidance as to its use.)
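To illustrate how a processor might consume such a standoff element, here is a Python sketch that resolves each <substJoin>'s @target pointers and checks that they land on <add> or <del> elements. It assumes same-document "#id" pointers, un-namespaced element names, and the element name substJoin as proposed here (the name was still under discussion); resolve_subst_joins is a hypothetical helper, not part of any TEI tool.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predefined xml: prefix to this namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def resolve_subst_joins(xml: str) -> list[list[tuple[str, str]]]:
    """For each <substJoin>, return (tag, text) for every element it targets."""
    root = ET.fromstring(xml)
    by_id = {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID)}
    groups = []
    for join in root.iter("substJoin"):
        members = []
        for ref in join.get("target", "").split():
            el = by_id.get(ref.lstrip("#"))  # same-document pointers only
            if el is None:
                raise ValueError(f"unresolved pointer {ref!r}")
            if el.tag not in ("add", "del"):
                raise ValueError(f"{ref!r} points to <{el.tag}>, not <add>/<del>")
            members.append((el.tag, "".join(el.itertext())))
        groups.append(members)
    return groups
```

For <p>Un <del xml:id="sp1">buon</del> padre <add xml:id="sp2">onesto</add><substJoin target="#sp1 #sp2"/></p>, it returns [[('del', 'buon'), ('add', 'onesto')]].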

Simple cases of substitution involving only one add and one del should continue to be tagged by nesting in a tei:subst, as currently, but the need for including text or other elements in subst is now removed.

Original comment by: @gabrielbodard

TEITechnicalCouncil commented 13 years ago

I can understand the rationale for the proposed simplification of <subst> and addition of <substJoin/>. But it does have one (to my mind) highly undesirable side effect, of forcing semantically identical changes to be encoded differently. (I'm really only going to restate something noted in earlier comments by Elena and others.)

Consider two authorial transformations in French:

(1) un château ancien ==> un château croulant ["crumbling"]

(2) un château ancien ==> un vieux château

In each case, there is a deletion of the adjective "ancien" and the substitution of another adjective. Deep structure is identical, but in the case of "vieux" the surface-structure rules of French require a change in position of the adjective.

Under the proposal, these two substitutions must be encoded differently:

un château <subst><del>ancien</del> <add>croulant</add></subst>

un <substJoin target="#a1 #d1"/><add xml:id="a1">vieux</add> château <del xml:id="d1">ancien</del>

There are serious disadvantages to this: the encoding is more complicated, you can't take advantage of having a single XML element (<subst>) as a basis for rendering/indexing/processing, you can't handle or group the two kinds of cases together without a lot of extra back-end programming, etc.

Granted the problem with allowing non-whitespace text as a child of <subst>, would it cause harm to allow <seg> as a child so that this would at least be possible?

un <subst><add>vieux</add> <seg>château</seg> <del>ancien</del></subst>

Original comment by: @DavidSewell

TEITechnicalCouncil commented 12 years ago

Now replaced by 3393244

Original comment by: @lb42

TEITechnicalCouncil commented 12 years ago

Original comment by: @lb42

TEITechnicalCouncil commented 12 years ago

Original comment by: @gabrielbodard

TEITechnicalCouncil commented 12 years ago

Reopened. Implementation of <substJoin/> is still needed.

Original comment by: @gabrielbodard

TEITechnicalCouncil commented 12 years ago

Lou argues that <substJoin/> is mis-named. Other suggestions?

Original comment by: @jamescummings

TEITechnicalCouncil commented 12 years ago

Original comment by: @gabrielbodard

TEITechnicalCouncil commented 12 years ago

Council eventually decided that a) we should change the content model of subst to permit only add, del, and milestoneLike elements; b) we should propose a new standoff element to combine interventions functioning as a group (substJoin, or, as LB proposes, combine).

Original comment by: @lb42

TEITechnicalCouncil commented 12 years ago

Original comment by: @lb42

TEITechnicalCouncil commented 12 years ago

Original comment by: @gabrielbodard

TEITechnicalCouncil commented 12 years ago

All done. I've also removed the <remark> on the count of adds and dels that Council agreed was now redundant.

Original comment by: @gabrielbodard

TEITechnicalCouncil commented 12 years ago

Original comment by: @gabrielbodard