ebeshero / Amadis-in-Translation

a project to apply TEI markup to investigate early modern Spanish editions of Amadis de Gaula and their translations into English and French from the 1500s to the early nineteenth century.
http://amadis.newtfire.org
GNU Affero General Public License v3.0

Increased accuracy on the translations and alternative TOC #49

Closed HelenaSabel closed 8 years ago

HelenaSabel commented 8 years ago

In these commits:

ebeshero commented 8 years ago

@setriplette @HelenaSabel I've just merged the pull request, but Stacey you'll want to look at this as a new level of coding related to our "stitchery" work and the comparison files. I'll need to go post these on the website, too.

For now, one neat trick with GitHub that I've just learned is that you can see an HTML file we post here in a GitHub preview mode by putting this string ahead of our file URL on GitHub: http://htmlpreview.github.io/?

So, to view Helena's alterations to Chapter 1 in the web browser, see: http://htmlpreview.github.io/?https://github.com/ebeshero/Amadis-in-Translation/blob/master/html/Chapter1.html by contrast with our original view posted on the site: http://amadis.newtfire.org/Chapter1.html
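A minimal sketch of the prefix trick described above, in case it is handy to generate preview links for several files at once. The function name `preview_url` is only an illustration; the prefix string and the Chapter 1 URL are the ones quoted in this comment.

```python
# Prefix quoted above; prepend it to the URL of an HTML file on GitHub.
PREVIEW_PREFIX = "http://htmlpreview.github.io/?"


def preview_url(github_file_url: str) -> str:
    """Return a preview link for an HTML file hosted on GitHub (hypothetical helper)."""
    return PREVIEW_PREFIX + github_file_url


chapter1 = "https://github.com/ebeshero/Amadis-in-Translation/blob/master/html/Chapter1.html"
print(preview_url(chapter1))
```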

HelenaSabel commented 8 years ago

That's a very cool trick I didn't know about. Thank you @ebeshero!

There is now a rule in our Schematron that forbids two elements from sharing the same ID. So whenever the need arises to “repeat” the identification of a clause, we'll be prompted to go to Montalvo's file and divide the clause into as many segments as needed. For example, in Montalvo's first chapter I only had to segment nine of the 225 clauses (so it's not that common).
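To illustrate the kind of check that rule performs, here is a rough Python sketch that lists IDs used more than once in a file. It is not the project's Schematron: the `seg` elements, the sample IDs, and the choice of `xml:id` as the attribute are assumptions made only for this example.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import List

# Namespaced name that ElementTree uses for the xml:id attribute.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def duplicate_ids(xml_text: str, attr: str = XML_ID) -> List[str]:
    """Return the sorted ID values that appear on more than one element."""
    root = ET.fromstring(xml_text)
    counts = Counter(
        el.get(attr) for el in root.iter() if el.get(attr) is not None
    )
    return sorted(value for value, n in counts.items() if n > 1)


# Hypothetical sample: the second and third segments share an ID.
sample = """<div>
  <seg xml:id="c1">first clause</seg>
  <seg xml:id="c2">second clause</seg>
  <seg xml:id="c2">second clause, repeated</seg>
</div>"""

print(duplicate_ids(sample))
```

A non-empty result would be the signal to go back and split the clause in Montalvo's file rather than reuse its ID.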

HelenaSabel commented 8 years ago

So do you like the new TOC? http://htmlpreview.github.io/?https://github.com/ebeshero/Amadis-in-Translation/blob/master/html/toc-2.html

ebeshero commented 8 years ago

@HelenaSabel Hmmm. About the Schematron: Let me make sure I understand this clearly from the top, so it's clear to @setriplette too.

ebeshero commented 8 years ago

@HelenaSabel Now, as for the table of contents, I agree it makes sense to move the word count comparisons elsewhere, probably to the chapter tables(?) The Venn strip is cool, but I want to make sure it's really proportional, and I still think I'd be more comfortable with two separate rectangles, one representing the total of Montalvo and how much of Montalvo actually appears in Southey, and one based on the total of Southey and how much of his text isn't in Montalvo. The totality of the block could be turned into percentage values, or scaled by a factor that resolves their different word-count proportions, but I worry that we're really working with two different sets of totals here while projecting a visual illusion of one single total. It looks appealing, but it isn't very easy to explain what we're representing or why we present it as if it were a singular whole.

If I tried generating a single block like that, I think I'd want to base its total on the word count of Southey + the word count of Montalvo. And then the block in the middle would quite literally represent the percentage of words that appear to correspond across both texts...next to the percent that appear to be distinct in only one or the other. I think that's not how we're currently merging those bars, is it?

Small stuff: Can we try putting the bars to the right of the text identifying the chapters? So, chapter list on the left with corresponding bars on the right?

What if we removed the solid black border around the bars?

Thanks for working on this and all the good thinking you've been applying to it!

HelenaSabel commented 8 years ago

Hello! I understand your misgivings about the overlapping bars. They are two individual bars (one calculated with Montalvo's data and the other one with Southey's) that overlap. Even when giving a fixed width to the Montalvo one, Southey's length changes because I do keep the proportions. I positioned Southey's bar by calculating the portion of Montalvo's text not present in his translation. However, I'm not using word counts for these, but the number of matching clauses instead. I thought the data would be more “neutral”, more related to the translation than to the particularities of each language. Was I wrong in my assumption? If we were to calculate the length of the bars using the word counts (instead of the clause/anchor counts), Southey's bar would be much smaller than Montalvo's and his additions would be insignificant (because, so far, those are usually one or two words long). It would give you a better idea of Southey's shrinking process, but we might lose the perspective of how much of Montalvo's content is present somehow in Southey and how much is completely omitted (and I think the way to grasp that information is by using the clauses/anchors as the base for the calculations). How about I make more graphs (using word counts, clause counts, information on the percentages and so on), so that by seeing them together we can decide which type suits us better?
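For reference, this is one way the overlapping-bar geometry described above could be computed. It is only a sketch of one reading of that description, not the code behind the current TOC: the function name, the pixel width, and every count except the 225 clauses of Montalvo's first chapter are made up for illustration.

```python
def venn_strip(montalvo_clauses, southey_clauses, matched_clauses,
               montalvo_width=450.0):
    """Place two overlapping bars on one shared clause-based scale.

    Assumption: Montalvo's bar has a fixed width, Southey's bar keeps the
    same units-per-clause scale, and Southey's bar starts where the part of
    Montalvo that has no match in Southey ends.
    """
    scale = montalvo_width / montalvo_clauses  # drawing units per clause
    omitted = montalvo_clauses - matched_clauses  # Montalvo clauses with no match
    return {
        "montalvo_x": 0.0,
        "montalvo_width": montalvo_width,
        "southey_x": omitted * scale,
        "southey_width": southey_clauses * scale,
    }


# 225 is the clause count mentioned for Montalvo's first chapter;
# 160 and 150 are hypothetical values.
strip = venn_strip(montalvo_clauses=225, southey_clauses=160, matched_clauses=150)
print(strip)
```

Under this reading, Southey's bar changes length from chapter to chapter even though Montalvo's width is fixed, because both bars share the scale set by Montalvo's clause count.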

ebeshero commented 8 years ago

@HelenaSabel Okay--that is a helpful explanation, and I see how working from word counts will likely look a lot different. When you ask if you were wrong in your attempt to present the changes in a more "neutral" way, the need to use those quotation marks is a pretty good signal of what's worrying me. We are finding that we need to represent data in the same graph based on two different proportional scales. I incorrectly presumed just now that this was based on word count, but you remind me it was a count of clauses. Okay. But we still have a problem of one portion of your graph being based on a total count of clauses in Montalvo, and another based on the total count of clauses in Southey, right?

1) Can the problem be resolved, I wonder, by calculating all three bars (omissions, matches, deletions) based simply on the total number of clauses in Montalvo (TM) plus the total number of clauses in Southey (TS)? Then each bar would reflect a portion of a larger combined total, and I think we might be able to resolve the misproportion issue in the current plot.

2) Alternatively, we could simply plot the Montalvo-based calculation in a separate bar from the Southey-based calculation.

3) We could probably plot both 1) and 2) and that might be worth reviewing together!
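Option 1) could be sketched roughly as follows, assuming we can count matched clauses separately on each side (the two counts may differ when one clause corresponds to several). The function and parameter names are hypothetical, and the sample numbers other than Montalvo's 225 clauses are invented.

```python
import math


def combined_segments(tm, ts, matched_in_montalvo, matched_in_southey):
    """Express three segments as fractions of the combined total TM + TS.

    tm / ts: total clauses in Montalvo / Southey.
    matched_in_*: clauses on each side that have a counterpart in the other.
    """
    total = tm + ts
    return {
        "montalvo_only": (tm - matched_in_montalvo) / total,
        "matched": (matched_in_montalvo + matched_in_southey) / total,
        "southey_only": (ts - matched_in_southey) / total,
    }


segments = combined_segments(tm=225, ts=160,
                             matched_in_montalvo=150, matched_in_southey=150)
print(segments)

# The three fractions share one denominator, so they add up to a single whole.
assert math.isclose(sum(segments.values()), 1.0)
```

Because every segment is divided by the same combined total, the bar represents one whole rather than two different totals drawn as if they were one.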

I do, definitely, want us to go back and think carefully about explaining exactly the basis of our plots, and that explanation really needs to be present on the HTML we post to our website. I likely just forgot something I knew two months ago about those plots, that they are based on numbers of matching clauses and not word counts, but since this is complicated and we are all busy, and we need to be able to communicate clearly with each other and others interested in our project, we really need some clear explanation of the basis of our calculations to be posted for all to see. I am happy to help with that.