FrankensteinVariorum / fv-collation

First-stage collation processing in the Frankenstein Variorum Project. For post-processing and Variorum development, see our GitHub organization: https://github.com/FrankensteinVariorum
https://frankensteinvariorum.github.io/fv-collation/
GNU Affero General Public License v3.0

Polished Collation Notes and Checklist #62

Open ebeshero opened 5 years ago

ebeshero commented 5 years ago

Checklist

For each section, are reserved fragment files incorporated into collation output files?
Have collation <app> elements been proofed and corrected for consistent semantic comparison of witnesses?

Notes

Deleted passages in Thomas: <del>

I am hand-correcting the collation output so that each fully deleted passage in Thomas sits in one <app>. A single act of deletion is one semantic alteration to the text, and it is not properly recorded when the collation splits it across multiple <app> elements. In the ordinary output, when a deletion is longer than a word or two, we get a deletion start-marker in one <app>, followed by one or more <app> elements where all witnesses appear to be in unison, before they diverge again in a last <app> with the deletion end-marker. That obscures the nature of the change, so I'm unifying the full deletion and comparison in one <app>.
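
To make the hand-correction concrete, here is a minimal Python sketch of the merging step. It is not the project's code: it assumes each <app> has already been reduced to a dictionary of witness name to reading text, and the marker strings `<del-start/>` and `<del-end/>` are hypothetical stand-ins for whatever the flattened Thomas files actually use.

```python
# Hypothetical marker strings; the real flattened markup may differ.
DEL_START = "<del-start/>"
DEL_END = "<del-end/>"


def join_parts(pending):
    """Join the accumulated readings of each witness with single spaces."""
    return {w: " ".join(p for p in parts if p) for w, parts in pending.items()}


def merge_deletion_apps(apps, witness="Thomas"):
    """Merge each run of apps from a deletion start-marker to its
    end-marker (in the given witness) into a single app."""
    merged = []
    pending = None  # readings accumulated while inside an open deletion
    for app in apps:
        reading = app.get(witness, "")
        if pending is None:
            if DEL_START in reading and DEL_END not in reading:
                # A deletion opens here and closes in a later app.
                pending = {w: [t] for w, t in app.items()}
            else:
                merged.append(dict(app))
        else:
            for w, t in app.items():
                pending.setdefault(w, []).append(t)
            if DEL_END in reading:
                merged.append(join_parts(pending))
                pending = None
    if pending is not None:
        # Unclosed deletion: keep what was gathered rather than drop it.
        merged.append(join_parts(pending))
    return merged


apps = [
    {"f1818": "the", "Thomas": "the"},
    {"f1818": "dark", "Thomas": "<del-start/> dark"},
    {"f1818": "and stormy", "Thomas": "and stormy"},
    {"f1818": "night", "Thomas": "night <del-end/>"},
    {"f1818": "fell", "Thomas": "fell"},
]
merged = merge_deletion_apps(apps)
```

With the sample input, the three middle apps collapse into one whose Thomas reading carries both markers, so the deletion is compared against the other witness as a single unit.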

Status of <add> elements

The collation process puts <add> elements on the "ignore" list, so that their contents are consumed and output, but the <add> itself does not appear in any form in the CollateX output. <add> elements were ignored to simplify the processing of the msColl. However(!), in ignoring the <add> elements, the output collation is now missing information about a) which portions of the MS were inserted by Percy's hand, and b) the location of hand additions in the Thomas copy. I have included the Thomas <add> elements when patching in collation fragments by hand, that is, in passages that did not align evenly (= the long passages where an insertion was indicated). In short, <add> tags are not preserved, except those around the lengthier segments that I had to maneuver by hand into the collation. The original <add> information is preserved, of course, in the S-GA files.
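
A small Python sketch of what "ignoring" costs us, under stated assumptions: this is not the project's actual ignore-list mechanism, just a regex illustration, and the sample string and its `hand` attribute value are made up for the example.

```python
import re

# Matches opening and closing <add> tags, but not e.g. <address>.
ADD_TAG = re.compile(r"</?add\b[^>]*>")
# Captures the text wrapped by a (non-nested) <add> element.
ADD_SPAN = re.compile(r"<add\b[^>]*>(.*?)</add>", flags=re.DOTALL)


def strip_add_tags(text):
    """Drop the <add> tags but keep their contents, as an ignore list would."""
    return ADD_TAG.sub("", text)


def find_add_spans(text):
    """List the inserted passages that tag-stripping makes invisible."""
    return ADD_SPAN.findall(text)


# Illustrative input only; not quoted from the S-GA or Thomas files.
sample = 'I saw <add hand="#pbs">the dull yellow</add> eye'
stripped = strip_add_tags(sample)
insertions = find_add_spans(sample)
```

The stripped string still contains the inserted words, but nothing in it says they were an insertion or whose hand made it. That information can only be recovered by going back to the source files, which is what `find_add_spans` does here.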

Options re <add> info from S-GA and Thomas:

White-space issues

I've been correcting white-space issues in the output collation: where words sit at line endings in Thomas_fullFlat (and full), for example, the collation smushes two words together as an alternate reading. Not good, but infrequent so far.

Pointers to all editions?

ebeshero commented 3 years ago

Yeesh. The reason for the smushed-together words in fThomas was suddenly evident when I returned to the project. It's just that the Thomas files weren't pretty-printed, so they lacked a starting space before a word at the beginning of a line. That messes up our tokenization in the collation process. Solution should be simple: just pretty-print Thomas...
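
Here is a minimal Python sketch of the failure, assuming a tokenizer that drops line breaks before splitting on spaces. That behaviour is my simplification for illustration, not a description of the actual collation script, and the sample text is invented.

```python
def tokenize_dropping_newlines(text):
    """Assumed behaviour: line breaks are removed, then the text is
    split on spaces. Empty tokens from repeated spaces are discarded."""
    return [t for t in text.replace("\n", "").split(" ") if t]


def tokenize_normalized(text):
    """Alternative: treat any run of white space, including line
    breaks, as a single token boundary."""
    return text.split()


# Not pretty-printed: the second line starts directly with a word.
unindented = "the wretched\nmonster fled"
# Pretty-printed: indentation puts white space before the first word.
indented = "the wretched\n  monster fled"

fused = tokenize_dropping_newlines(unindented)
separate = tokenize_dropping_newlines(indented)
robust = tokenize_normalized(unindented)
```

In the unindented case the last word of one line fuses with the first word of the next, which is the smushed reading we saw. Pretty-printing avoids it by supplying the leading space. Normalizing white space in the tokenizer would be another option, though that is not what I proposed above.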