TEIC / TEI

The Text Encoding Initiative Guidelines
https://www.tei-c.org

retaining punctuation marks in the text of a TEI document #379

Closed TEITechnicalCouncil closed 7 years ago

TEITechnicalCouncil commented 12 years ago

Section 3.2.1 (#COPU-1) of P5 discusses the question of whether punctuation marks should be excluded from or left as part of the text in a TEI document. But no guidance is given on whether and how to record such decisions made. A few requests arising from this:

  1. The <quotation> element (within <editorialDecl>) should be mentioned here in section 3.2.1. (It is already explained in section 2.3.3 (#HD53).)

  2. Since the default value of quotation@marks is "all", which indicates "all quotation marks have been retained", it seems that the TEI's default guidance in this matter is to retain them. It would be good to say this in section 3.2.1 as well. The reason, I think, is quite simple: it simplifies rendering of the encoded text for readers if you don't have to reinsert the punctuation. Encouraging consistency aids interoperability.

  3. It would be good to give guidance on whether punctuation marks should be inside or outside containing elements. Here's an example:

<p>She said, <said>“Nobody uses the term <soCalled>‘electronic text’</soCalled> anymore”</said>!</p>

The encoder could have put the quotation marks outside of the said and soCalled elements, and the exclamation point could have been inside or outside of the p element. Encouraging consistency aids interoperability.

  4. There is no element within <editorialDecl> for indicating whether other punctuation marks besides quotation marks were retained. For those creating linguistic corpora using <s>, for example, this is relevant. Perhaps create a <punctuation> for this purpose? I'm not sure how that would relate to <quotation>, though. In any case, if a new element is created, it should also be referenced from section 3.2.1.
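
For comparison with the example in point 3, the alternative placement could be sketched as below. It reuses the same sentence with the quotation marks moved outside <said> and <soCalled>; it is an illustration of the other option, not a recommendation.

```xml
<!-- Same sentence as in point 3, quotation marks placed outside the elements -->
<p>She said, “<said>Nobody uses the term ‘<soCalled>electronic text</soCalled>’ anymore</said>”!</p>
```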

Original comment by: @kshawkin

TEITechnicalCouncil commented 8 years ago

This issue was originally assigned to SF user: pfschaffner Current user is: pfschaffner

TEITechnicalCouncil commented 11 years ago

This is actually rather tricky. I am far from certain that I'd recommend putting the quotation marks inside the elements like that. We wouldn't recommend that for punctuation inside the children of <bibl>, for example, though I can't remember whether that's explicitly stated in the Glines anywhere. I suspect the Glines are correctly vague on this topic because there is considerable variation in practice. But you're right to say we should make it easier to document explicitly what policies have been adopted.

Original comment by: @lb42

TEITechnicalCouncil commented 11 years ago

Original comment by: @lb42

TEITechnicalCouncil commented 11 years ago

Council accepts points 1, 2, and 4, but does not want to recommend a specific version of 3. The plan is to create a new <punctuation> element which is a member of model.encodingDescPart, with prose content similar to other encodingDescPart elements, and some attribute(s) defining the handling of punctuation marks.

Original comment by: @jamescummings

TEITechnicalCouncil commented 11 years ago

Original comment by: @jamescummings

TEITechnicalCouncil commented 11 years ago

In the original ticket I suggested creating <punctuation> within <editorialDecl>, not <encodingDesc>. I suspect that the minutes from the Sept. 2012 Council meeting and the summary above are incorrect and that we intended model.editorialDeclPart.

Original comment by: @kshawkin

TEITechnicalCouncil commented 10 years ago

Original comment by: @rwelzenb

TEITechnicalCouncil commented 10 years ago

Original comment by: @jamescummings

TEITechnicalCouncil commented 10 years ago

Reassigning to PFS to follow up.

Original comment by: @jamescummings

hcayless commented 8 years ago

Trying to resurrect dormant green tickets... @PFSchaffner, @jamescummings: what, if anything, is left to be done on this? Is the point to add a reference to documenting editorial decisions about punctuation to section 3.2.1?

PFSchaffner commented 8 years ago

Don't remember, will look.

emylonas commented 7 years ago

It sounds as if Council had decided in 2012 to modify the prose in section 3.2.1 so as to mention the <quotation> element, and also to clarify that the default behavior for quotation marks as noted in <editorialDecl>, which is that they have been retained, is the TEI's recommendation (Kevin's points 1 and 2). Council rejected point 3, and was not hostile to point 4: to consider creating an element <punctuation> vel sim. in <editorialDecl> to be used for information about whether punctuation other than quotation marks is retained or not. If this is done, it should also be mentioned in 3.2.1. We should likely go ahead with the prose modifications, and discuss how to deal with punctuation. @PFSchaffner do you have anything for us on this?

emylonas commented 7 years ago

OK, so what remains is: original point 1: should mention use of <quotation> in 3.2.1 in addition to 2.3.3. I think this has been addressed as 3.2.1 refers to 3.3.3 where <quotation> is referenced.

original point 2: probably also covered in 3.3.3, although not by suggesting that they be retained. There is a lot of description about what to do and how to encode them.

original point 3 and 4: <punctuation> now has the attributes @marks and @placement

Remaining: document use of the <punctuation> @marks and @placement attributes in 2.3.3. They aren't mentioned there at all, just the <punctuation> element.
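
A header declaration using both attributes might look like the sketch below. The values `all` and `internal` are assumed to match the value lists in the elementSpec and should be checked there; the prose inside <p> is invented for illustration.

```xml
<encodingDesc>
  <editorialDecl>
    <!-- marks="all": assumed value, meaning all punctuation marks were kept in the text -->
    <!-- placement="internal": assumed value, meaning marks sit inside the adjacent element -->
    <punctuation marks="all" placement="internal">
      <p>All punctuation marks in the source have been retained. Marks at the
        start or end of a marked-up phrase are kept inside its element.</p>
    </punctuation>
  </editorialDecl>
</encodingDesc>
```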

emylonas commented 7 years ago

This is the prose that is in section 2.3.3 now: http://www.tei-c.org/release/doc/tei-p5-doc/en/html/HD.html#HD53

<punctuation>

<punctuation> specifies editorial practice adopted with respect to punctuation marks in the original. How has the encoding of punctuation marks present in the original source been treated? For example, has it been normalised, or suppressed in favour of descriptive markup? If it has been retained, is it located within or around elements such as quote which are normally associuated with quotations?

I propose adding the two items below (and fixing the typo in "associated" that appears in the existing prose).

@marks (punctuation marks) indicates whether or not punctuation marks have been retained as content within the text. @placement indicates whether punctuation marks have been captured inside or outside of an adjacent element.

The bit on placement is odd, I am copying the elementSpec definition. This is another version:

@placement indicates whether punctuation marks have been captured inside or outside the element containing the punctuated text.

Also, the bit about quotations in the last sentence of the overall description is odd, because there is already an element <quotation> in which you indicate how you are handling quotes. Perhaps a different "for instance" example?

"if punctuation has been retained, is it located within or outside elements such as a series of <foreign> words or other marked up items"

ebeshero commented 7 years ago

@emylonas I agree about removing the bit about quotes from the description of <punctuation>, and in addition to correcting that glaring typo, I'd like to change "normalised" to "normalized" (Oxford International spelling). And I like your addition of @marks and @placement.

I'm wondering how we can improve the description. Thinking about the larger context in 2.3.3, apparently <quotation> and <hyphenation> are special instances of <punctuation>, and they exist because we so frequently treat these differently from other forms of punctuation. I imagine if I had a series of punctuation marks that I was replacing with elements, including quotation marks, maybe I'd handle describing them all in a general way with the punctuation element. But I might still want to double up and have a quotation element too in my editorialDecl, if I'm handling quotes differently than I am other punctuation.

So, here's what I suggest:

<punctuation> specifies editorial practice adopted with respect to punctuation marks in the original. Are punctuation marks present in the original source retained? Are they normalized? Are they identified with the element? Are they replaced by markup?

Does that work?

emylonas commented 7 years ago

yes, will update accordingly

emylonas commented 7 years ago

@ebeshero I'm going to nitpick..

Guidelines have: How has the encoding of punctuation marks present in the original source been treated? For example, has it been normalised, or suppressed in favour of descriptive markup? If it has been retained, is it located within or around elements such as <gi>quote</gi> which are normally associated with quotations?

We can add a slightly modified version of the new prose and remove the bit about quotations.

However, there is no attribute for the <punctuation> element that refers to normalization (that probably belongs with <normalization>, below), so maybe we should remove that bit? The other attribute is @placement, and I was trying to add a reference in the prose to decisions about placement (if punctuation has been retained). I find that the sentence is awkward and we may have a better example than a series of <foreign> words.

Possible prose: Are punctuation marks present in the original source retained? Are they identified with the element or implied by markup? If retained, are they placed within or outside the containing element? (still awkward, as the containing element doesn't contain them if they are placed outside....) and we could add ...for example....

ebeshero commented 7 years ago

@emylonas Sorry for the delay! I was wrapping up Digital Mitford meetings--finally coming up for air.

Yes, definitely agree: remove the bit about normalization in our revised prose because it's covered elsewhere. Also, I meant to include an actual element suggestion in the revised prose: <pc>.

So how about this revision? Does this resolve the awkwardness of the "containing element"?

Are punctuation marks present in the original source retained? Are they identified with the element <pc>, or implied by markup? If retained, how are they placed with respect to related elements? For example, do commas and periods appear inside or outside elements marking phrases and sentences?
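
The choices this prose asks about could be illustrated roughly as follows. This is a sketch only: the sentence is invented, and <s> stands in for any element marking a sentence.

```xml
<!-- Period retained inside the sentence element and identified with <pc> -->
<p><s>Nobody uses that term anymore<pc>.</pc></s></p>

<!-- Period retained outside the sentence element -->
<p><s>Nobody uses that term anymore</s><pc>.</pc></p>

<!-- Period not retained: the sentence boundary is implied by <s> alone -->
<p><s>Nobody uses that term anymore</s></p>
```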

ebeshero commented 7 years ago

@emylonas How about I post this revision this evening since I have some time free, and we're at the end of the refrigeration period? (I am figuring it's a busy holiday weekend but I'm unusually free for the moment!) Maybe we can keep tinkering with examples on the other side of the release?

ebeshero commented 7 years ago

@emylonas and all: Acting on Elli's and my discussion on this ancient ticket, I've modified and slightly relocated the description of <punctuation> in http://www.tei-c.org/release/doc/tei-p5-doc/en/html/HD.html#HD53 . In this commit, https://github.com/TEIC/TEI/commit/4066ec0550bdd9fafc09ef144195bcbc2bcdb2ec I've

I wanted to get something in before the 3.2.0 release to correct the major issues we identified. I think we should stop here for the release but revisit this ticket on the other side of it.

emylonas commented 7 years ago

This was done and committed. It was left open because we discovered that the attribute definition for @placement was unclear, and would like to fix that. Opening a new ticket.