FrankensteinVariorum / fv-collation

First-stage collation processing in the Frankenstein Variorum Project. For post-processing and Variorum development, see our GitHub organization: https://github.com/FrankensteinVariorum
https://frankensteinvariorum.github.io/fv-collation/
GNU Affero General Public License v3.0

post-processing of empty tokens #79

Open ebeshero opened 2 years ago

ebeshero commented 2 years ago

@wdjacca, @am0eba-byte, and I have identified these distinct kinds of output involving spurious empty tokens ("",) generated by CollateX. All of these are associated with <lb/> elements from the SG-A MS witness, whose contents we screened from comparison (so CollateX reads them, literally, as "").

A. An empty token is isolated in an <app> with a solitary <rdg> inside. The <rdg> only contains a <lb/> marker.

B. An empty token is inside a "busy" <app> with multiple <rdgGrp> elements. Here, the "" token may or may not be the only thing making one <rdgGrp> different from the others.

  1. To resolve all of these cases, we think we should delete the empty tokens, wherever they are. Literally, remove "", from the @n attribute value holding the normalized tokens.

  2. If the "" token was in a "busy app" (case B.) then we need to compare the token list (following removal of "") to the token list in the other rdgGrps present on the app.
     2a. If the edited token lists are now identical, we move the <rdg> from the "" rdgGrp to the matching <rdgGrp>.
     2b. If the token lists are not identical, we preserve the <rdgGrp> structure (no change beyond deleting the "", from the @n attribute value).

  3. If the "" token is in an isolated <app> with a solitary <rdgGrp> containing ONLY the empty token, then we need to decide whether to move it to the first preceding or first following <app>. We decide this based on whether the immediately preceding or following apps represent a unison or "harmonious" <rdgGrp> in which all witnesses agree.

    • If exactly one of the two neighboring apps is harmonious, move the contents of the "" <rdg> to the corresponding <rdg> in that harmonious <app>.
    • Otherwise, where both apps are harmonious or neither is, default to moving the contents of the "" <rdg> to the corresponding <rdg> in the first following <app>.
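
To make the proposal concrete, here is a rough Python sketch of steps 1 to 3. It is not code from our pipeline, and it rests on several assumptions: that @n holds a JSON-style list of normalized tokens, that a "harmonious" app can be recognized by having exactly one <rdgGrp>, and that the witness sigla and sample readings are purely illustrative. The function names (`resolve_busy_app`, `choose_target`, and so on) are hypothetical. Step 3 is only sketched as far as choosing the target app; actually moving the <lb/> into the corresponding <rdg> is left out.

```python
import json
import xml.etree.ElementTree as ET


def tokens(rdg_grp):
    """Parse @n, ASSUMED here to be a JSON-style list of normalized tokens."""
    return json.loads(rdg_grp.get("n"))


def resolve_busy_app(app):
    """Steps 1 and 2: strip "" tokens, then merge groups that now match."""
    for grp in app.findall("rdgGrp"):
        toks = tokens(grp)
        if "" not in toks:
            continue
        # Step 1: delete the empty tokens from the @n value.
        cleaned = [t for t in toks if t != ""]
        grp.set("n", json.dumps(cleaned))
        # Step 2: look for another rdgGrp whose token list is now identical.
        match = next(
            (g for g in app.findall("rdgGrp")
             if g is not grp and tokens(g) == cleaned),
            None,
        )
        if match is not None:
            # Step 2a: move the <rdg> elements into the matching group.
            match.extend(list(grp))
            app.remove(grp)
        # Step 2b: no match, so the rdgGrp stays (only @n was edited).


def is_harmonious(app):
    """ASSUMPTION: an app is harmonious when it has exactly one rdgGrp."""
    return app is not None and len(app.findall("rdgGrp")) == 1


def choose_target(prev_app, next_app):
    """Step 3: pick the neighboring app that receives the isolated <lb/>."""
    prev_ok = is_harmonious(prev_app)
    next_ok = is_harmonious(next_app)
    # Only the preceding app is harmonious: use it.
    if prev_ok and not next_ok:
        return prev_app
    # Only the following app is harmonious, or both, or neither: use following.
    return next_app


# Illustrative "busy" app (case B); sigla and readings are made up.
sample = ET.fromstring(
    "<app>"
    "<rdgGrp n='[\"the\", \"creature\"]'>"
    "<rdg wit='f1818'>the creature</rdg></rdgGrp>"
    "<rdgGrp n='[\"\", \"the\", \"creature\"]'>"
    "<rdg wit='fMS'><lb/>the creature</rdg></rdgGrp>"
    "</app>"
)
resolve_busy_app(sample)
print(ET.tostring(sample, encoding="unicode"))
```

In this sample the second group differs from the first only by the "" token, so after cleanup its <rdg> moves into the first group (case 2a). If our real @n values are serialized differently, only `tokens` and the `json.dumps` call would need to change.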