FrankensteinVariorum / fv-collation

first-stage collation processing in the Frankenstein Variorum Project. For post processing and Variorum development, see our GitHub organization: https://github.com/FrankensteinVariorum
https://frankensteinvariorum.github.io/fv-collation/
GNU Affero General Public License v3.0

XPath + Schematron scoping to enhance witness alignments #63

Open ebeshero opened 5 years ago

ebeshero commented 5 years ago

Singleton app problem:

CollateX tends to over-emphasize moments of exact alignment, cutting off interesting deletions/false starts that overlap with a passage. So, CollateX produces lots of <app> elements that signify complete agreement of witnesses, but before and after them are "singleton" <app> elements holding interesting snips that usually should be part of the alignment.

To scope for isolated segments that should be folded into comparative <apps> with multiple witnesses, go over each of the Full_PartX_xmlOutput files using the following XPath expressions:

//app[count(rdgGrp) = 1][count(descendant::rdg) = 1]

This won't be hugely helpful b/c it catches hundreds and hundreds of singleton apps that hold only escaped <lb> markers. To find those that contain substance, try:

//app[count(rdgGrp) = 1][count(descendant::rdg) = 1][contains(descendant::rdg, 'del')]

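This finds singleton apps whose reading contains the string "del", so it will catch any that hold deletions. It is a plain substring test, so it also matches any reading that merely includes a word such as "delight".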
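For anyone who wants to run the same two checks outside an XPath 2.0 tool, here is a minimal Python sketch using only the standard library. The sample XML, the witness name `wA`, and the helper names are made up for illustration; the only assumption about the real output is the app / rdgGrp / rdg nesting that the XPath expressions above imply.

```python
import xml.etree.ElementTree as ET

# Invented sample, NOT real project data. Markup inside readings is
# escaped, as in the collation output described above.
SAMPLE = """<root>
  <app><rdgGrp n="a"><rdg wit="wA">the creature</rdg><rdg wit="fThomas">the creature</rdg></rdgGrp></app>
  <app><rdgGrp n="b"><rdg wit="fThomas">&lt;lb n="3"/&gt;</rdg></rdgGrp></app>
  <app><rdgGrp n="c"><rdg wit="fThomas">&lt;del&gt;wretch&lt;/del&gt;</rdg></rdgGrp></app>
</root>"""


def is_singleton(app):
    """Mirror of app[count(rdgGrp) = 1][count(descendant::rdg) = 1]."""
    return len(app.findall("rdgGrp")) == 1 and len(app.findall(".//rdg")) == 1


def reading_text(app):
    """String value of the app's first rdg (its only rdg, for a singleton)."""
    return "".join(app.find(".//rdg").itertext())


root = ET.fromstring(SAMPLE)
singletons = [app for app in root.iter("app") if is_singleton(app)]
# Same plain substring test as contains(descendant::rdg, 'del').
with_del = [app for app in singletons if "del" in reading_text(app)]

print(len(singletons), "singleton apps,", len(with_del), "containing 'del'")
```

Note that after parsing, the escaped markers are ordinary text, so the reading compares against `<del` rather than `&lt;del`.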

Try hunting along the sibling axes to find where singleton apps are immediately preceded or followed by apps with one rdgGrp. (line-break added for visibility):

//app[count(rdgGrp) = 1][count(descendant::rdg) = 1]
[count(preceding-sibling::app[1]/rdgGrp) eq 1 or count(following-sibling::app[1]/rdgGrp) eq 1]

(We don't have to indicate that these apps must contain multiple witnesses b/c this is nearly always the case anyway).

To ignore those that contain flattened lb markers, try:

//app[count(rdgGrp) = 1][count(descendant::rdg) = 1]
[count(preceding-sibling::app[1]/rdgGrp) eq 1 or count(following-sibling::app[1]/rdgGrp) eq 1]
[not(starts-with(descendant::rdg, '&lt;lb'))]
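The sibling-axis version can be sketched in Python as well. ElementTree has no sibling axes, so this sketch indexes into the parent's list of app children instead; it assumes all the app elements share one parent. The sample data, the witness name `wA`, and the function names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Invented sample, NOT real project data.
SAMPLE = """<root>
  <app><rdgGrp n="1"><rdg wit="wA">It was on a</rdg><rdg wit="fThomas">It was on a</rdg></rdgGrp></app>
  <app><rdgGrp n="2"><rdg wit="fThomas">&lt;del&gt;dreary&lt;/del&gt;</rdg></rdgGrp></app>
  <app><rdgGrp n="3a"><rdg wit="wA">night</rdg></rdgGrp><rdgGrp n="3b"><rdg wit="fThomas">evening</rdg></rdgGrp></app>
  <app><rdgGrp n="4"><rdg wit="wA">of</rdg></rdgGrp></app>
  <app><rdgGrp n="5a"><rdg wit="wA">November</rdg></rdgGrp><rdgGrp n="5b"><rdg wit="fThomas">december</rdg></rdgGrp></app>
  <app><rdgGrp n="6"><rdg wit="fThomas">&lt;lb n="4"/&gt;</rdg></rdgGrp></app>
  <app><rdgGrp n="7"><rdg wit="wA">that I beheld</rdg><rdg wit="fThomas">that I beheld</rdg></rdgGrp></app>
</root>"""


def is_singleton(app):
    return len(app.findall("rdgGrp")) == 1 and len(app.findall(".//rdg")) == 1


def reading_text(app):
    return "".join(app.find(".//rdg").itertext())


def candidates(parent):
    """Singleton apps next to an app with exactly one rdgGrp, minus lb-only ones."""
    apps = parent.findall("app")
    found = []
    for i, app in enumerate(apps):
        if not is_singleton(app):
            continue
        # Nearest preceding and following app siblings (either may be absent).
        neighbours = apps[max(i - 1, 0):i] + apps[i + 1:i + 2]
        if not any(len(n.findall("rdgGrp")) == 1 for n in neighbours):
            continue
        # Skip readings that begin with a flattened lb marker.
        if reading_text(app).startswith("<lb"):
            continue
        found.append(app)
    return found


root = ET.fromstring(SAMPLE)
kept = [reading_text(app) for app in candidates(root)]
print(kept)
```

In this sample only the `<del>dreary</del>` singleton survives: the "of" singleton sits between two apps that each have two rdgGrps, and the last singleton starts with an lb marker.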

Thomas copy long deletions:

Long deletions in Thomas constitute a single semantic change across long portions of text. CollateX won't unify these but will just isolate their start and end moments on their <del> markers. To find moments in the Thomas text that hold deletions, use:

//app[count(rdgGrp) gt 1][contains(descendant::rdg[@wit="fThomas"], '&lt;del')]

When start and end <del> markers are separated by <app> elements, carefully compress the contents of those intermediary <app> elements into one long <app> that contains the complete Thomas deletion.
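As a rough aid for that manual step, the Thomas query can be turned into a list of positions, so the runs of apps between hits are easy to locate and inspect. This is only a sketch: the sample data, the witness name `wA`, and the function name are invented, and it assumes at most one fThomas rdg per app. It does not attempt the merge itself.

```python
import xml.etree.ElementTree as ET

# Invented sample, NOT real project data.
SAMPLE = """<root>
  <app><rdgGrp n="1"><rdg wit="wA">I felt</rdg><rdg wit="fThomas">I felt</rdg></rdgGrp></app>
  <app><rdgGrp n="2a"><rdg wit="wA">cold</rdg></rdgGrp><rdgGrp n="2b"><rdg wit="fThomas">&lt;del&gt;cold</rdg></rdgGrp></app>
  <app><rdgGrp n="3"><rdg wit="wA">also</rdg><rdg wit="fThomas">also</rdg></rdgGrp></app>
  <app><rdgGrp n="4a"><rdg wit="wA">and</rdg></rdgGrp><rdgGrp n="4b"><rdg wit="fThomas">but</rdg></rdgGrp></app>
  <app><rdgGrp n="5"><rdg wit="fThomas">&lt;del&gt;half&lt;/del&gt;</rdg></rdgGrp></app>
</root>"""


def thomas_deletion_apps(parent):
    """Positions of apps with more than one rdgGrp whose fThomas reading holds '<del'."""
    hits = []
    for i, app in enumerate(parent.findall("app")):
        if len(app.findall("rdgGrp")) <= 1:
            continue  # mirrors count(rdgGrp) gt 1
        thomas = app.find(".//rdg[@wit='fThomas']")
        if thomas is not None and "<del" in "".join(thomas.itertext()):
            hits.append(i)
    return hits


root = ET.fromstring(SAMPLE)
hits = thomas_deletion_apps(root)
print(hits)
```

Positions are zero-based indexes into the parent's app children. The final app in the sample is skipped even though it contains a deletion, because it has only one rdgGrp, which is the same behaviour as the XPath expression.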

ebeshero commented 5 years ago

I've just converted this to a Schematron file to check everything...