ebeshero / Amadis-in-Translation

a project to apply TEI markup to investigate early modern Spanish editions of Amadis de Gaula and their translations into English and French from the 1500s to the early nineteenth century.
http://amadis.newtfire.org
GNU Affero General Public License v3.0

XSLT Identity Transformation for Adding xml:ids to Clauses and Other Units of Text #8

Closed ebeshero closed 9 years ago

ebeshero commented 9 years ago

@HelenaSabel has noticed that my current system for numbering clause xml:ids restarts the count within each new paragraph. I am using:

<xsl:number level="multiple"/>

within an attribute constructor to do the numbering. To resolve it, Helena has returned to the earlier method of using count() along the preceding:: axis, with <xsl:value-of select="count(preceding::cl)+1"/>.
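For reference, here is a minimal sketch of what an identity transformation using the count() approach might look like. It is not the project's actual stylesheet: it assumes XSLT 2.0, assumes the files are in the TEI namespace, and uses a hypothetical id prefix "c".

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0">

    <!-- Identity template: copy every node and attribute through unchanged. -->
    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Rebuild each <cl>, replacing any existing xml:id with a new one. -->
    <xsl:template match="cl">
        <xsl:copy>
            <!-- Keep all other attributes on the clause. -->
            <xsl:apply-templates select="@* except @xml:id"/>
            <xsl:attribute name="xml:id">
                <!-- "c" is a hypothetical prefix, used only for illustration. -->
                <xsl:text>c</xsl:text>
                <!-- Count every <cl> that ends before this one starts, then add 1. -->
                <xsl:value-of select="count(preceding::cl) + 1"/>
            </xsl:attribute>
            <xsl:apply-templates select="node()"/>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>
```

Note that the preceding:: axis leaves out ancestors, so a clause nested as the first child of another clause would receive the same number as its parent. That is consistent with the trouble we had with nested clauses, described below.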

History: 1) Recall that we stopped using this construction because it would not work with nested clauses, where a clause might be marked inside a parent clause.

2) However, we decided not to permit nested clauses, because we are effectively using the <cl> element to mark spans of text as units of translation positioned between punctuation marks, so we can better study how Southey and other translators interacted with the Montalvo text. We decided that as much as possible, <cl> elements need to be siblings, or at least be considered as side-by-side units, not nested inside each other.

3) We later discovered that chapters may have multiple paragraphs, as well as floating text layers within the narration (when a character picks up a document or reads an inscription, for example). We now know that <cl> elements will not always be siblings, and may be sitting at different levels of the XML document hierarchy.

The newly resolved use of <floatingText> poses some important questions for how we approach numbering of clauses and other units of text such as lines of poetry, positioned within them. We could try to number everything consecutively, but would it be better if our xml:ids carried a little more of the document hierarchy information?

<xsl:number> has an @level attribute that takes one of three values: "single" (the default), "multiple", or "any". To make the numbering follow a consecutive order (without problems with nesting a clause inside another clause, or a paragraph inside another paragraph), we can set @level to "any". (This would resolve the problem that Helena encountered with clause numbering starting over within a new paragraph.) I had set @level to "multiple" in my version of the stylesheet to flag nested clauses, as it follows the hierarchy: for a child clause within a parent, it would take the parent clause's number, followed by a period, and then number the child clauses as 431.1, 431.2, 431.3, 431.4, and so on. Because @setriplette and @HelenaSabel have decided against nesting clauses inside each other, we could set @level to "any" or go back to using count() along the preceding:: axis. But with the count() strategy, we may still run into problems when we need numbering that reflects the position of elements within the document hierarchy, and we should try to think ahead to anticipate what we may need, given what we know of the Montalvo text.
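To see the difference between these strategies on a real file, a small diagnostic stylesheet along the following lines could help. This is only a sketch: it assumes XSLT 2.0 and TEI-namespaced input, and it writes a plain-text report rather than changing any xml:ids.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0">

    <xsl:output method="text"/>

    <!-- Print one report line per <cl>, in document order. -->
    <xsl:template match="/">
        <xsl:for-each select="//cl">
            <!-- Position among sibling clauses; nested clauses would show as 431.1 etc. -->
            <xsl:text>multiple=</xsl:text>
            <xsl:number level="multiple"/>
            <!-- Running count of clauses across the whole document. -->
            <xsl:text> any=</xsl:text>
            <xsl:number level="any"/>
            <!-- The count() strategy along the preceding:: axis. -->
            <xsl:text> preceding=</xsl:text>
            <xsl:value-of select="count(preceding::cl) + 1"/>
            <xsl:text>&#10;</xsl:text>
        </xsl:for-each>
    </xsl:template>

</xsl:stylesheet>
```

For a chapter whose first paragraph holds two clauses and whose second paragraph holds one, this should report multiple=1, 2, 1 alongside any=1, 2, 3 and preceding=1, 2, 3: the "multiple" column restarts in the new paragraph, while the other two keep counting.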

Stacey and I have discussed this, and Stacey would like to see us create xml:ids that record more information about the position of a <cl> (or of an <l> in a line of poetry) within the Montalvo document hierarchy. Thus, as our primary example, we should capture the number of the paragraph holding the <cl>, and when that paragraph is sitting inside <floatingText>, we should be able to readily see that in the xml:id. Why would we want that? (Why should it matter?) Because after we have marked how Southey's text correlates with xml:ids in Montalvo, we may later want to tell instantly (without even looking at Montalvo) whether Southey is working with material from an internal floatingText level, or perhaps to isolate all the points in the Southey text that correlate to floatingText in Montalvo, based on what we can read in the xml:id. If the xml:id holds some of this information, our coding for text analysis might be a little easier for us.

For this reason, I'd like to continue working with <xsl:number>, which is fine-tuned with its @level attribute to handle numbering within hierarchies. I'm rewriting the Identity Transformation stylesheet to generate a new format for the xml:ids. I'll first rewrite the stylesheet and test it, and then run it as a batch over the group of Montalvo files encoded so far. (Helena, I've also now learned how to do this by creating a Project within oXygen, and will write up some instructions on how that works.)

ebeshero commented 9 years ago

Here is an example of the new numbering of xml:ids, working with xsl:number with @level set to "multiple" in the XSLT stylesheet: This will number paragraphs and catch when they are nested, which occurs when a <floatingText> element appears inside a body <p>.

<cl xml:id="M1_p1_c213">y fizo una carta que dezía.</cl>
<floatingText resp="#Darioleta">
    <body>
        <p>
            <cl xml:id="M1_p1.1_c1">Este es <persName ref="#Amadis">Amadís Sin Tiempo</persName>
                hijo de rey:</cl>
        </p>
    </body>
</floatingText>
<cl xml:id="M1_p1_c214">y sin tiempo dezía
    ella<!-- st 8.25.15 This part is indirect discourse and not part of the letter or floating text in and of itself. It's pretty normal for Amadís to quote a letter and then reiterate its content in indirect discourse. I don't see it as an overlap. -->
    porque creía que luego sería muerto:</cl>

Notice, because we have set @level to "multiple," two things happen:

1) paragraphs nested inside other paragraphs will have a decimal point (or period) in the notation: p1.1 indicates the first paragraph in a floatingText, positioned within the first paragraph of the main narration.

2) clauses are here numbered consecutively, but the numbering starts over again within each new paragraph. (We can change that by setting the @level to "any" just for numbering clauses. This would number every clause in the chapter consecutively, without starting over again within new paragraphs.) However, Stacey and I agree that our xml:ids will probably be easier to read and understand if we number clauses according to their position within a designated paragraph.
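The numbering described above could be produced by a template roughly like the one in this sketch. It is not the stylesheet actually run on the project files: it assumes XSLT 2.0, TEI-namespaced input, clauses that are direct children of their paragraph, and a hypothetical `chapter` parameter supplying the "M1" prefix.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0">

    <!-- Hypothetical parameter: the chapter prefix for every xml:id. -->
    <xsl:param name="chapter" select="'M1'"/>

    <!-- Identity template: copy everything else through unchanged. -->
    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="cl">
        <xsl:copy>
            <xsl:apply-templates select="@* except @xml:id"/>
            <xsl:attribute name="xml:id">
                <xsl:value-of select="$chapter"/>
                <xsl:text>_p</xsl:text>
                <!-- One number per ancestor <p>, joined by periods: 1, or 1.1 when nested. -->
                <xsl:number count="p" level="multiple"/>
                <xsl:text>_c</xsl:text>
                <!-- Position among sibling clauses, so the count restarts in each paragraph. -->
                <xsl:number count="cl" level="multiple"/>
            </xsl:attribute>
            <xsl:apply-templates select="node()"/>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>
```

Because the clause inside the floatingText is not a sibling of the clauses in the outer paragraph, it does not interrupt their sequence, which is why c213 is followed by c214 in the example.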

ebeshero commented 9 years ago

An update! Since working with floatingText units will be important to @setriplette in a future phase of this project, we're going to add xml:ids to floatingText elements. We'll also add an ft number to the xml:id for any clause or line unit of text that is sitting inside a <floatingText> element. We won't then need to use the @level="multiple" numbering for the paragraphs, but just generate these in sequence regardless of position in hierarchy. The presence of "ft" in an xml:id will tell us if the unit in question is inside a floating text or not. More soon: I'm still working on the stylesheet, and I will be making some updates to the Schematron, too.
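A sketch of how this revised scheme might be expressed follows. The id pattern shown (for example M1_ft1_p2_c1) is a hypothetical illustration, not the format finally adopted. The sketch assumes XSLT 2.0, TEI-namespaced input, a hypothetical `chapter` parameter, and that every clause it touches sits inside a <p>. Lines of poetry (<l>) are left out to keep it short.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0">

    <!-- Hypothetical parameter: the chapter prefix for every xml:id. -->
    <xsl:param name="chapter" select="'M1'"/>

    <!-- Identity template: copy everything else through unchanged. -->
    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Give each floatingText its own id, numbered in document order. -->
    <xsl:template match="floatingText">
        <xsl:copy>
            <xsl:apply-templates select="@* except @xml:id"/>
            <xsl:attribute name="xml:id">
                <xsl:value-of select="$chapter"/>
                <xsl:text>_ft</xsl:text>
                <xsl:number count="floatingText" level="any"/>
            </xsl:attribute>
            <xsl:apply-templates select="node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Only clauses inside a paragraph are renumbered here. -->
    <xsl:template match="cl[ancestor::p]">
        <xsl:copy>
            <xsl:apply-templates select="@* except @xml:id"/>
            <xsl:attribute name="xml:id">
                <xsl:value-of select="$chapter"/>
                <!-- Add the "ft" segment only when the clause is inside a floatingText. -->
                <xsl:if test="ancestor::floatingText">
                    <xsl:text>_ft</xsl:text>
                    <!-- Number the nearest enclosing floatingText, matching its own id. -->
                    <xsl:number select="ancestor::floatingText[1]"
                        count="floatingText" level="any"/>
                </xsl:if>
                <xsl:text>_p</xsl:text>
                <!-- Number the nearest enclosing paragraph in sequence, ignoring nesting. -->
                <xsl:number select="ancestor::p[1]" count="p" level="any"/>
                <xsl:text>_c</xsl:text>
                <!-- Position among sibling clauses within that paragraph. -->
                <xsl:number count="cl" level="single"/>
            </xsl:attribute>
            <xsl:apply-templates select="node()"/>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>
```

The `select` attribute on <xsl:number> matters here: numbering the clause itself with level="any" and count="p" would also count a nested paragraph that closed earlier in the same outer paragraph, so a clause following a floatingText would be assigned to the wrong paragraph.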

ebeshero commented 9 years ago

Revised the XSLT ID transform, made some decisions in the use of @level to generate numbering in the xml:ids, and ran a batch transformation to update xml:ids over the coded Montalvo chapters.