Tables for analysis: Recast to TEI Feature Structure Encoding?

HelenaSabel commented 9 years ago

Hello, Stacey!

I've created some HTML tables so you can begin your analysis. I hope they are helpful, and I promise I'll find the time to improve them as soon as possible. Right now, you can find the tables for chapters 1 and 2. You can open the files in a web browser and then make any changes by opening them in Oxygen. I hope you get used to editing this kind of table, but if you find it annoying, we can try spreadsheets instead. <tr> stands for table row, and each cell in it is wrapped inside a <td> element.

The first table in each file is the Southey reading compared to Montalvo's source. There you can find every addition and modification made by Southey. However, I wasn't able to figure out how to get Southey's omissions in an ordered way, so you can find those in a second table, where each omission comes with some preceding and following context so you can understand it better. Please let me know how I can make your analysis easier; I'll keep working on producing a single table per chapter that includes all the modifications.

setriplette commented 9 years ago

This is great, Helena! I’ll take a look at them in my next work session. I’m looking forward to seeing the tables, and I’ll let you know about any issues that come up. I’ll start putting in a “direct” whenever I see that situation. Just the once so far, but I expect to see it again in a text this long.

S

Stacey Triplette
Assistant Professor of Spanish and French
Humanities Division
University of Pittsburgh at Greensburg
Faculty Office Building 200
150 Finoli Drive
Greensburg, PA 15601

On Oct 2, 2015, at 6:04 PM, Helena <notifications@github.com> wrote:

Assigned #23 (https://github.com/ebeshero/Amadis-in-Translation/issues/23) to @setriplette (https://github.com/setriplette).


HelenaSabel commented 9 years ago

I improved my code a little, and now there is only one table per chapter that contains all the modifications. The base text is still Southey: if you go through the second column of the table, you should be able to read Southey's full chapter. Additions are colored green, reports yellow, direct speech orange, and omissions red. For each omission, you get the clause that matches in both versions just before the actual omission (I wasn't able to create a new row just for the omission, so this is my workaround): Montalvo's column holds the text Southey did not use, and Southey's column holds the word “omission” (it makes more sense when you see it). Hope this is useful!

HelenaSabel commented 9 years ago

Hello, Stacey! The tables for Southey chapters 1, 2, and 21 are definitely ready. Keep track of all the changes you would like in them so we can make your analysis easier!

HelenaSabel commented 9 years ago

Dear @setriplette, I added a new chart with the word counts (they are sortable). I could add those columns to the existing tables, if you prefer. If there is any other information you'd like to have in the chart(s), let me know! Numbers in Southey's column shown in red or green indicate a variation of 75% or more relative to Montalvo's count (is that threshold OK, or would you like to visualize a different percentage?). I didn't bother enhancing the style, since it is much more practical for @ebeshero to take care of that when building the site.
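To make the 75% threshold concrete, here is a small Python sketch. It assumes the variation is the absolute difference divided by Montalvo's count, and the color rule (green when Southey's passage is longer, red when shorter) is a guess, since the chart's actual formula and color meanings are not spelled out here.

```python
from typing import Optional


def relative_variation(southey_words: int, montalvo_words: int) -> float:
    """Difference between the two counts as a fraction of Montalvo's count.

    Assumption: "variation in relation to Montalvo's number" is read as
    |Southey - Montalvo| / Montalvo; the chart may compute it differently.
    """
    if montalvo_words <= 0:
        raise ValueError("Montalvo's word count must be positive")
    return abs(southey_words - montalvo_words) / montalvo_words


def highlight(southey_words: int, montalvo_words: int,
              threshold: float = 0.75) -> Optional[str]:
    """Return a color name when the variation reaches the threshold, else None.

    Hypothetical rule: green when Southey is longer, red when he is shorter.
    """
    if relative_variation(southey_words, montalvo_words) < threshold:
        return None
    return "green" if southey_words > montalvo_words else "red"


print(highlight(9, 13))   # None  (about 31% shorter)
print(highlight(30, 13))  # green (about 131% longer)
print(highlight(3, 13))   # red   (about 77% shorter)
```

Changing the percentage Stacey wants to see would only mean passing a different `threshold`.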

ebeshero commented 9 years ago

@setriplette @HelenaSabel We can be planning and designing graphs from the tables. Think about where a visual summary in some kind of graph form might be useful. @HelenaSabel I am happy to have you start some graphs while I work on the side-by-side view. And actually, I have more Southey to code. I've also got to concentrate on the Mitford presentation, which is lagging behind Amadis--I need to write XQuery to SVG for that, and I had better get rolling on it.

ebeshero commented 9 years ago

@HelenaSabel @setriplette One of my reservations about these tables is that they store data in a kind of bloated HTML format--organized by tr and td elements that are really for presentation and not optimized for semantic distinctions, even though we can "see" those distinctions as columns and rows in a web browser. Last night, my student Becca Parker and I decided against formatting table data in a table with cells, because the table formatting was just getting in the way of the real data binding we wanted to do in XML first and foremost. We can always output the HTML table later, but if we want to optimize human-readable code for data analysis, we found we prefer the Feature Structure encoding in the TEI, which took us from this old 19th-century table to this simply organized TEI code, using the feature structure elements and attributes to hold related information concisely together.

Okay, so what this amounts to is a request to Helena: would you have a look at the Feature Structure markup and examples in Chapter 18 of the TEI Guidelines, and try outputting these tables in nice, tidy <fs> elements instead of HTML table rows and columns? I know that might seem like a step backward, but keep in mind that the information we are gathering is for long-range data analysis and not only for display. Also, if we store and add new information in Feature Structure format, we have many options for displaying information from such a page: HTML tables and lists as well as SVG graphs. And it will be much easier and less error-prone (or "brittle") for Stacey to add new data into a simple, clearly labeled XML hierarchy than into an HTML table. Can we try this? It would change your XSLT so that it does XML-to-XML pull processing, and you'll need the TEI namespace declared twice in the stylesheet (once for the XPath that reads the input and once for the output).
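The project would do this in XSLT, but the mapping itself is easy to see in a short Python sketch using the standard library's `xml.etree.ElementTree`. The row layout (`<tr class="TYPE">` with a Montalvo cell followed by a Southey cell), the feature names `southey`, `montalvo`, and `type`, and the use of a TEI `<symbol>` for the change type are all hypothetical choices for illustration, not the encoding the project settled on.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialize TEI elements without a prefix


def tei(tag: str) -> str:
    """Qualify a local name with the TEI namespace (ElementTree's {uri}tag form)."""
    return f"{{{TEI_NS}}}{tag}"


def row_to_fs(tr: ET.Element) -> ET.Element:
    """Turn <tr class="TYPE"><td>Montalvo</td><td>Southey</td></tr> into a TEI <fs>.

    The row layout and class values are hypothetical stand-ins for the real tables.
    """
    montalvo_td, southey_td = tr.findall("td")
    fs = ET.Element(tei("fs"))
    # One <f> per witness, skipping empty cells and the "omission" placeholder.
    for name, td in (("southey", southey_td), ("montalvo", montalvo_td)):
        text = (td.text or "").strip()
        if text and text != "omission":
            f = ET.SubElement(fs, tei("f"), name=name)
            ET.SubElement(f, tei("string")).text = text
    # The presentational color class becomes an explicit, queryable value.
    f_type = ET.SubElement(fs, tei("f"), name="type")
    ET.SubElement(f_type, tei("symbol"), value=tr.get("class", "indefinite"))
    return fs


row = ET.fromstring('<tr class="add"><td></td><td>which sorted to such effect,</td></tr>')
print(ET.tostring(row_to_fs(row), encoding="unicode"))
```

The point of the exercise is that "green cell in column two" becomes `type = add` plus a labeled `southey` feature, which any later stylesheet can query without knowing the table layout.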

ebeshero commented 9 years ago

@HelenaSabel We'd need to work out a smart, simple-as-possible system for "packing" our table data in this form, so we need to make some decisions about how to use the <f> element and its attributes. The use of the <string> element would be pretty clear, and we could use it as a space for Stacey's input as well as for the text we're extracting from Montalvo and Southey. My hope is that the code will be easy to read and logical to navigate with the outline view and some simple XPath.
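As a sketch of the "simple XPath" navigation, assuming features are distinguished by @name values such as `southey` and `montalvo` (hypothetical names at this point), Python's `ElementTree` can pull all of one witness's strings, or find structures missing a witness. ElementTree supports only a subset of XPath; in full XPath the first query would be roughly `//tei:f[@name='southey']/tei:string`.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# A two-structure sample; the @name values are assumed, not settled.
SAMPLE = """<div xmlns="http://www.tei-c.org/ns/1.0">
  <fs corresp="#M0_p1_c1">
    <f name="southey"><string>Not many years after the passion of our Redeemer</string></f>
    <f name="montalvo"><string>No muchos años después de la passión de nuestro redentor y salvador Jesuchristo:</string></f>
  </fs>
  <fs>
    <f name="southey"><string>which sorted to such effect,</string></f>
  </fs>
</div>"""


def strings_named(root: ET.Element, name: str) -> list:
    """All <string> texts held by <f name="NAME">, in document order."""
    return [s.text for s in root.findall(f".//tei:f[@name='{name}']/tei:string", NS)]


def structures_without(root: ET.Element, name: str) -> list:
    """<fs> elements lacking an <f name="NAME"> (e.g. additions lack Montalvo)."""
    return [fs for fs in root.iter(f"{{{NS['tei']}}}fs")
            if fs.find(f"tei:f[@name='{name}']", NS) is None]


root = ET.fromstring(SAMPLE)
print(strings_named(root, "southey"))
print(len(structures_without(root, "montalvo")))  # 1
```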

HelenaSabel commented 9 years ago

@ebeshero I love the idea of making more semantically meaningful tables. Since you are more experienced with this use of the module (I use Feature Structures in a completely different way), do you have any suggestions for encoding the information we've decided to include?

HelenaSabel commented 9 years ago

For example, would it make sense to make an <fs> for each of Southey's anchors? Each one could be formed by the following <f> elements: `<fs xml:id="{anchor ID}">`, holding Southey's text and Montalvo's text.

ebeshero commented 9 years ago

@HelenaSabel Yes! That's exactly what I was thinking. To use the Feature Structure markup, we pretty much have to work out a system of attribute values on @name, just to indicate what information is held in each f element. And doing that just makes it easier to see where to add new information (I think Stacey would get her own f element to hold a string of her comments, right?)
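Giving Stacey her own feature could look like the following Python sketch. The feature name `comment` is a hypothetical choice; the function either adds the `<f name="comment">` or overwrites its string, so repeated edits never duplicate the feature.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
NS = {"tei": TEI}


def set_comment(fs: ET.Element, text: str) -> ET.Element:
    """Add, or overwrite, the analyst's note in <f name="comment"><string>.

    The feature name "comment" is hypothetical, not a settled convention.
    """
    f = fs.find("tei:f[@name='comment']", NS)
    if f is None:
        f = ET.SubElement(fs, f"{{{TEI}}}f", name="comment")
    string = f.find("tei:string", NS)
    if string is None:
        string = ET.SubElement(f, f"{{{TEI}}}string")
    string.text = text
    return f


fs = ET.fromstring(
    f'<fs xmlns="{TEI}"><f name="southey"><string>which sorted to such effect,</string></f></fs>'
)
set_comment(fs, "First note.")
set_comment(fs, "Revised note.")
print(ET.tostring(fs, encoding="unicode"))
```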

It was a relief--really!--to develop a Feature Structure system for the tables Becca Parker is dealing with in her (very different) project on Friday, and the experience showed me how adaptable that markup is to just about any kind of data-binding we want to do!

ebeshero commented 9 years ago

And of course, outputting HTML from that will be "easy as pie": In fact, we're making it an XSLT homework assignment in the Pitt-Greensburg course to replace the Skyrim XSLT table assignment!
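To show how little is needed to go back to a table, here is a Python sketch that renders one `<fs>` as an HTML row. It assumes an `<fs>` shaped like Helena's sample later in the thread (features named `southey`, `montalvo`, and `type`, with the change type in @select); in the project this would be an XSLT template rather than Python.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def fs_to_tr(fs: ET.Element) -> ET.Element:
    """Render one <fs> as <tr class="TYPE"><td>Montalvo</td><td>Southey</td></tr>."""
    def text_of(name: str) -> str:
        string = fs.find(f"tei:f[@name='{name}']/tei:string", NS)
        if string is None:
            return ""
        return string.text or ""

    f_type = fs.find("tei:f[@name='type']", NS)
    change = f_type.get("select", "indefinite") if f_type is not None else "indefinite"
    tr = ET.Element("tr", {"class": change})  # class drives the cell colors via CSS
    for name in ("montalvo", "southey"):      # Southey stays in the second column
        ET.SubElement(tr, "td").text = text_of(name)
    return tr


fs = ET.fromstring(
    '<fs xmlns="http://www.tei-c.org/ns/1.0">'
    '<f name="southey" n="5"><string>which sorted to such effect,</string></f>'
    '<f name="type" select="add"><string>Comments</string></f></fs>'
)
print(ET.tostring(fs_to_tr(fs), encoding="unicode", method="html"))
```

The same `<fs>` could just as easily feed a list or an SVG bar, which is the flexibility argued for above.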

ebeshero commented 9 years ago

@HelenaSabel I am just sorry we didn't figure this out together while you were here! Though the XSLT challenge would have really been the same--just different output!

HelenaSabel commented 9 years ago

@ebeshero I've made a transformation using Southey as the base text. This is a sample of the results, because I have some questions I wanted to discuss with you and @setriplette. This is what a normal structure looks like:

```xml
<fs corresp="#M0_p1_c1">
  <f name="southey" n="9" ana="0.44">
    <string>Not many years after the passion of our Redeemer</string>
  </f>
  <f name="montalvo" n="13">
    <string>No muchos años después de la passión de nuestro redentor y salvador Jesuchristo:</string>
  </f>
  <f name="type" select="indefinite">
    <string>Comments</string>
  </f>
</fs>
```

@corresp is the ID of the <anchor>. In that first sentence, there was no @type on the <anchor>, and that's why it has the “indefinite” value. The @ana attribute expresses the relation between Southey's and Montalvo's word counts (the farther the value is from 0, the greater the difference between them). When there is an addition, it looks like this:

```xml
<fs>
  <f name="southey" n="5">
    <string>which sorted to such effect,</string>
  </f>
  <f name="type" select="add">
    <string>Comments</string>
  </f>
</fs>
```

Is it OK if there is no @corresp attribute on the addition? Regarding omissions, I treated them as a gap (of sorts): we have an <fs> with a match that contains Southey's and Montalvo's text, and then an <fs> with only Montalvo's text up to the following "match". Is that OK? Even though Southey is the base text, this way we can contextualize the omissions better (or that was my intention):

```xml
<fs corresp="#M0_p1_c124">
  <f name="southey" n="5" ana="0.4">
    <string>you shall be well recompensed</string>
  </f>
  <f name="montalvo" n="7">
    <string>de mí seríades muy bien galardonada.</string>
  </f>
  <f name="type" select="indefinite">
    <string>Comments</string>
  </f>
</fs>
<fs corresp="#M0_p1_c125">
  <f name="montalvo" n="29">
    <string>Cierto señor dixo ella por muy contenta me ternía en hazer servicio a tan alto hombre y tan buen cavallero como vos sois si supiesse en qué.</string>
  </f>
  <f name="type" select="omission">
    <string>Comments</string>
  </f>
</fs>
```

Anything I should change?
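Both samples are consistent with @ana = (Montalvo - Southey) / Southey, rounded to two decimals: (13 - 9) / 9 ≈ 0.44 and (7 - 5) / 5 = 0.4. That formula is inferred from these two examples only. The Python sketch below takes the @n counts as given rather than recounting words, because a plain whitespace split does not reproduce every @n above (the Montalvo string with n="7" has six whitespace-separated tokens). The helper names are hypothetical; the attributes mirror the sample.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"


def ana_value(southey_n: int, montalvo_n: int) -> str:
    """Word-count relation inferred from the samples: (Montalvo - Southey) / Southey."""
    if southey_n <= 0:
        raise ValueError("needs a positive Southey word count")
    return str(round((montalvo_n - southey_n) / southey_n, 2))


def _f(parent: ET.Element, name: str, text: str, **attrs: str) -> ET.Element:
    """Append <f name=...><string>text</string></f> with any extra attributes."""
    f = ET.SubElement(parent, f"{{{TEI}}}f", name=name, **attrs)
    ET.SubElement(f, f"{{{TEI}}}string").text = text
    return f


def match_fs(anchor_id: str, southey: str, southey_n: int,
             montalvo: str, montalvo_n: int, change: str = "indefinite") -> ET.Element:
    """Build a matched <fs> mirroring the attribute choices in the sample above."""
    fs = ET.Element(f"{{{TEI}}}fs", corresp=f"#{anchor_id}")
    _f(fs, "southey", southey, n=str(southey_n), ana=ana_value(southey_n, montalvo_n))
    _f(fs, "montalvo", montalvo, n=str(montalvo_n))
    _f(fs, "type", "Comments", select=change)
    return fs


print(ana_value(9, 13))  # 0.44
print(ana_value(5, 7))   # 0.4
```

Negative values would mean Southey's clause is longer than Montalvo's, which keeps the "distance from 0" reading intact.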

HelenaSabel commented 9 years ago

And when there are multiple clauses in Montalvo without a reference in Southey, the @corresp attribute looks like this: `yo os lo diré. Dezidlo sin recelo dixo ella: que enteramente por mí guardado os será. Pues amiga señora dixo él: dígovos que en fuerte hora yo miré la gran hermosura de Elisena vuestra señora: Comments`

I'm going to push what I did, so please don't hesitate to suggest any modifications.
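One plausible way to build such a value, assuming each run of consecutive unmatched Montalvo clauses becomes one `<fs>` whose @corresp lists their IDs, is sketched below in Python. TEI's @corresp does accept several whitespace-separated pointers; the input format and the IDs after `M0_p1_c125` are hypothetical.

```python
from itertools import groupby


def omission_corresp(clauses):
    """Build one @corresp value per run of consecutive unmatched Montalvo clauses.

    clauses: (clause_id, matched) pairs in Montalvo's order (hypothetical input).
    """
    values = []
    for matched, run in groupby(clauses, key=lambda clause: clause[1]):
        if not matched:
            values.append(" ".join(f"#{clause_id}" for clause_id, _ in run))
    return values


clauses = [("M0_p1_c124", True), ("M0_p1_c125", False), ("M0_p1_c126", False),
           ("M0_p1_c127", True), ("M0_p1_c128", False)]
print(omission_corresp(clauses))
# ['#M0_p1_c125 #M0_p1_c126', '#M0_p1_c128']
```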

ebeshero commented 9 years ago

@HelenaSabel Wow! That was quick work--thank you! Hmmm. Should we have an @corresp in the addition output or is the fs for addition already embedded in something corresponding to a Montalvo passage?

Other than that one question, I think this looks really logical, useful, and easy to read. I like your use of @ana to hold differences in word count!

HelenaSabel commented 9 years ago

The only anchors that have no correspondence in Montalvo are the additions. Should we create an ID for the chart's purposes by taking the reference of the previous match and adding something to it?

ebeshero commented 9 years ago

@HelenaSabel Probably not necessary--but did we output the nearest match in Montalvo in the HTML version? If so, and it was working, your code should be able to reach back and grab the nearest reference point in Montalvo. Don't worry about it if it's a pain to grab.

HelenaSabel commented 9 years ago

It shouldn't be much of a pain, but since you only have to look at the previous <fs> to know the last point of reference, I don't know if inventing IDs is actually meaningful (there isn't really a @corresp there, is there? And since the information is already ordered, you get the context by reading the previous and following <fs>). Anyway, I'll try to find the time today to do some graphs (so any suggestions about what you'd like to see would be greatly appreciated), and if you decide that every <fs> should have its own ID, I'll add it later.
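Because the <fs> elements are stored in reading order, the "nearest reference point" for an addition can be computed on demand instead of stored as an invented ID. A minimal Python sketch, with the TEI namespace omitted for brevity and `#M0_p1_c2` as a hypothetical ID:

```python
import xml.etree.ElementTree as ET


def nearest_reference(structures):
    """Pair each <fs> with its own @corresp or, failing that, the last one before it."""
    last = None
    pairs = []
    for fs in structures:
        last = fs.get("corresp") or last
        pairs.append((fs, last))
    return pairs


doc = ET.fromstring(
    "<div>"
    '<fs corresp="#M0_p1_c1"/>'
    "<fs/>"  # an addition: no @corresp of its own
    '<fs corresp="#M0_p1_c2"/>'
    "</div>"
)
for fs, ref in nearest_reference(doc):
    print(fs.get("corresp"), "->", ref)
```

In XSLT the same lookup is a `preceding-sibling` step, so leaving additions without an ID loses nothing.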

ebeshero commented 9 years ago

@HelenaSabel Ahh--that was what I thought. No--there is really no need for the id on Additions then. I don't know why, but I was imagining these being output out of sequence (and of course they aren't out of sequence).
