FrankensteinVariorum / fv-collation

First-stage collation processing in the Frankenstein Variorum Project. For post-processing and Variorum development, see our GitHub organization: https://github.com/FrankensteinVariorum
https://frankensteinvariorum.github.io/fv-collation/
GNU Affero General Public License v3.0

Workflow: Correcting Space Issues in Variorum View #75

Open wdjacca opened 3 years ago

wdjacca commented 3 years ago

We're making repairs to the files already on display in the Variorum. We are doing this for two reasons: 1) to fix two-year-old problems in the display, and 2) to diagnose why those problems occurred.

Start by reading the Thomas edition in the published Variorum Viewer and looking for things that might have gone wrong. Specifically, look for words that got mushed together (faulty spacing that signals a problem further back in the pipeline). Note other kinds of problems too, but for right now we're really concentrating on space issues and mushed words.

I. Correct the display on the Variorum (see below).
II. See if we can trace the problem back to the input collChunk files in this fv-collation repo.

ebeshero commented 3 years ago

I. To correct the display on the Variorum

Go to the fv-data repo.

Fix the spine!

Fix the edition display file:

Git commits

Try to add, commit, and push the change as one complete git commit per set of corrections. This will help us to review our "bugfixes" later in the git commit history.

Rules of thumb for handling white space errors

ebeshero commented 3 years ago

II. To diagnose the problem in the input collation files

am0eba-byte commented 3 years ago

thomas_s7_cloudsThis

We didn't actually make these changes yet because we got a bit lost and we don't want to mess anything up, so we are just documenting here what we would have done.

On 6/24/21, Jackie and Mia went into Section 7, part 7, chunk 23a of the 1818 version of the text to fix the "clouds.This" whitespace issue, where the 1818 differs from the 1823. There needs to be a whitespace at the paragraph ending in the 1818, so that the collation reports the 1818 and 1823 versions of this passage as the same. We looked in the fv-collation directory at collatexPrep/collationchunks/1818fullFlat_C23a.xml
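A rough way to hunt for more cases like "clouds.This" is to scan the text for sentence-ending punctuation followed directly by a capitalized word. This is only a sketch: the function name `find_fused_words` and the pattern are illustrative assumptions, not part of the project's pipeline, and the heuristic can produce false positives.

```python
import re

# Heuristic (an assumption, not project code): a lowercase letter, then
# sentence-ending punctuation, then a capitalized word with no space between,
# as in "clouds.This".
FUSED = re.compile(r"[a-z][.!?;:][A-Z][a-z]")


def find_fused_words(text, context=10):
    """Return (offset, snippet) pairs for each suspected missing space."""
    hits = []
    for match in FUSED.finditer(text):
        start = max(0, match.start() - context)
        end = min(len(text), match.end() + context)
        hits.append((match.start(), text[start:end]))
    return hits


if __name__ == "__main__":
    sample = "lost in the clouds.This was"
    for offset, snippet in find_fused_words(sample):
        print(offset, repr(snippet))
```

Each hit still needs a human check against the edition before anything is changed.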

am0eba-byte commented 3 years ago

On 6/24/21, Jackie and Mia went into Section 7 of the MS collchunks Part 7 and pretty-printed msColl_C23c.xml to fix the display whitespace issue shown here:

MS_whitespace_section7_display

Then we went into spinec07 in fv-data and fixed the mushed-together words in the reading groups associated with those parts of the MS.

wdjacca commented 3 years ago

For MS Section 7, I noticed that there are white space issues consistently throughout the section due to the lack of a space before the </line>. I am working on fixing those, and I am keeping a list of the specific changes I made here:

intercession of [] Elizabeth We agreed [] perfectly For although [] was a great dissimilitude in our [] characters philosophical than my companion [] Yet longer endurance [] than hers endured [] my amusements [] were studying old books of chemistry but I had [] a friend who intimate friend [] of my father I remember [] when he was only nine years old he [] wrote a fairy tale which was the [] delight and amazement of all his [] companions when very young, I can [] remember that we used to [] act plays

I noticed that the lines I saw in the viewer are different from what I can see in the files here, so I stopped making changes for this chunk.
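The missing space before </line> could be located with a quick scan rather than by eye. The sketch below only reports line numbers and changes nothing. It treats the file as plain text instead of parsing the XML, and the function name `lines_missing_space` is a hypothetical helper, not something that exists in fv-collation.

```python
import re

# A </line> end tag that directly follows a non-whitespace character.
NO_SPACE_BEFORE_CLOSE = re.compile(r"(?<=\S)</line>")


def lines_missing_space(xml_text):
    """Return 1-based numbers of text lines where </line> has no space before it."""
    return [
        number
        for number, row in enumerate(xml_text.splitlines(), start=1)
        if NO_SPACE_BEFORE_CLOSE.search(row)
    ]


if __name__ == "__main__":
    sample = "<line>intercession of</line>\n<line>Elizabeth </line>\n"
    print(lines_missing_space(sample))
```

A report like this is best used for deciding where to look; whether a space belongs at each spot is still an editorial call.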

ebeshero commented 3 years ago

Thank you @am0eba-byte and @wdjacca ! Okay, so the display of the Manuscript pages doesn't actually come from our XML inside the project. It's being pulled in from The Shelley-Godwin Archive (S-GA), and it's definitely being handled differently from the other editions.

What our project does is process the S-GA files as input for our collation data. We use that data to help us block out the variant information (to give us our "hotspot" CSS for the inline pages and to give us our side-boxes). But the display of the inline Manuscript pages is actually imported directly from the S-GA. That is why, if you update the S-GA XML files in our repo, it really doesn't matter for the output we can see. (That's why, @wdjacca , you didn't see the same issues when you looked at the XML.) One thing that makes display of S-GA complicated IS that it has a lot more markup in the XML--and we do need to modify the way that gets displayed, but we'll do that with help from @raffazizzi later!

The other XML files for the other four editions DO get displayed directly (1818, Thomas, 1823, and 1831), so those are the ones where we can correct our display from the "inside" as we're doing.

ebeshero commented 3 years ago

@am0eba-byte and @wdjacca I'm inspecting the passage you were looking at more closely in the Variorum reader, and I'm seeing something pretty interesting there! Notice where you were circling in red the mushed-up words in the screen capture, that whole area is DEVOID of hotspots! That's because it's not worked into the collation! There are two long-ish stretches in MS Chunk 7 like that--lots of text, with (apparently/supposedly) nothing comparable in the other editions.

If we look for the roughly corresponding areas over in 1818 chunk 7, what I see is a long stretch of "MS edition is missing here." It's possible that what we're seeing is just Mary Shelley taking a nip and tuck in her MS notebook and not including long stretches, but... that should be something we SHOW in the variant collation data. It's definitely something we need to look at more closely, but not something to spot-correct in our simpler task here! Thank you for bringing it to my attention--because I'll open a new issue about EXACTLY this--and we can keep track of when we see it.