FrankensteinVariorum / fv-postCollation

a repository for post-processing finalized collation files to prepare the Variorum edition.

fv-postCollation

This repository is part of the Frankenstein Variorum project. It contains a workspace for post-processing finalized collation files to prepare the Frankenstein Variorum digital edition. The pipeline of transformations in this repository yields the edition data incorporated in our static website for the Frankenstein Variorum project.

The workspace in this repo houses a transformation pipeline to prepare the TEI edition files and the TEI standoff spine for the Frankenstein Variorum. This README summarizes the files to run, in order, with an explanation of each process. We began writing this documentation in 2018 and have revised it as we fine-tuned the process and evaluated the outputs for problems. Since 2023, development of the pipeline has been complete and the files have been stable, so we have bundled these stages into an automated shell script designed to be run whenever we need to make a correction to the edition and re-run the collation. This documentation now provides a detailed review of each stage of the process for others to adapt, or for us to modify as needed.

To develop this postCollation pipeline, we needed to find out how to “raise” XML elements that we had flattened to be read as text strings in the collation process. To read more about this process, see “Flattening and unflattening XML markup: a Zen garden of ‘raising’ methods” (slide presentation at Balisage 2018), and the published paper: Birnbaum, David J., Elisa E. Beshero-Bondar and C. M. Sperberg-McQueen. “Flattening and unflattening XML markup: a Zen garden of XSLT and other tools.” Balisage Series on Markup Technologies, vol. 21 (2018). https://doi.org/10.4242/BalisageVol21.Birnbaum01.
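To give a rough sense of what "flattening" and "raising" mean, here is a minimal Python sketch. It is an illustration only, not the XSLT used in this pipeline. It assumes attribute-free tags, properly nested markup, and marker attributes named sID and eID; the function names flatten and raise_markers are hypothetical.

```python
import re

# Matches simple start- and end-tags without attributes (a deliberate simplification).
TAG = re.compile(r"<(/?)([A-Za-z][\w.-]*)>")


def flatten(xml):
    """Replace each start-tag and end-tag with an empty milestone marker.

    A start-tag <p> becomes <p sID="p1"/> and its end-tag becomes <p eID="p1"/>,
    so the element no longer wraps its content. The sID/eID names are assumed
    here for illustration.
    """
    stack = []    # currently open elements, as (name, identifier) pairs
    counter = 0   # running number used to build unique identifiers
    out = []
    pos = 0
    for match in TAG.finditer(xml):
        out.append(xml[pos:match.start()])
        closing, name = match.group(1), match.group(2)
        if closing:
            open_name, ident = stack.pop()
            assert open_name == name, "input must be well-formed"
            out.append(f'<{name} eID="{ident}"/>')
        else:
            counter += 1
            ident = f"{name}{counter}"
            stack.append((name, ident))
            out.append(f'<{name} sID="{ident}"/>')
        pos = match.end()
    out.append(xml[pos:])
    return "".join(out)


def raise_markers(flat):
    """Turn start and end markers back into ordinary tags.

    This string substitution only yields well-formed XML when the markers
    are properly nested; it does not check that they are.
    """
    flat = re.sub(r'<([A-Za-z][\w.-]*) sID="[^"]*"/>', r"<\1>", flat)
    return re.sub(r'<([A-Za-z][\w.-]*) eID="[^"]*"/>', r"</\1>", flat)


source = "<p>It was on a <hi>dreary</hi> night</p>"
flat = flatten(source)
print(flat)
print(raise_markers(flat) == source)
```

The hard cases that motivated the work cited above are the ones this sketch avoids: after collation, start and end markers may land in different places or overlap, so raising cannot always be a simple substitution.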

This workspace also houses the "edit-distance" directory for calculating and visualizing pairwise edit distances for each variant passage in the Variorum. This directory stores our work on generating an interactive SVG heatmap that summarizes the entire Variorum. The README for the edit-distance directory includes an explanation of the XSLT files and directories required to generate the interactive heatmap.
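As a rough illustration of a pairwise edit-distance calculation for one variant passage, here is a minimal Python sketch of Levenshtein distance. It is not the code in the edit-distance directory. The function names, the witness labels A, B, C and the sample readings are hypothetical.

```python
from itertools import combinations


def levenshtein(a, b):
    """Return the minimum number of single-character insertions,
    deletions, and substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(
                prev[j] + 1,         # delete char_a
                curr[j - 1] + 1,     # insert char_b
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def pairwise_distances(readings):
    """Compute the edit distance for every pair of witnesses.

    readings maps a witness label to that witness's text for one passage.
    """
    return {
        (w1, w2): levenshtein(readings[w1], readings[w2])
        for w1, w2 in combinations(sorted(readings), 2)
    }


# Hypothetical readings of a single passage in three witnesses.
sample = {"A": "cat", "B": "cut", "C": "cart"}
for pair, distance in pairwise_distances(sample).items():
    print(pair, distance)
```

Each variant passage produces one such set of pairwise values, which is the kind of data a heatmap can then summarize.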

==================

Constructing the TEI digital edition after collation: the Pipeline

Phase 1: Convert collation data to TEI

Run P1-bridgeEditionConstructor.xsl

Phase 2: Generate distinct edition files

Run P2-bridgeEditionConstructor.xsl

Phase 3:

Phase 4: Raise the “trojan elements” holding edition markup

Option 1: Run P4-raiseBridgeElems.xsl

Option 2: Run P4Sax-raiseBridgeElems.xsl

Phase 5: Prepare and raise <seg> elements for variant passages in each edition

Run P5-Pt1-SegTrojans.xsl

Run P5-Pt2PlantFragSegMarkers.xsl

Run P5-Pt3MedialSTARTSegMarkers.xsl

Run P5-Pt4MedialENDSegMarkers.xsl

Run P5-Pt5raiseSegElems.xsl

Run P5-Pt6-spaceHandling.xsl

Phase 6

Run P6-Pt1-combine.xsl

Run P6-Pt2-simplify-chapAnchors.xsl

Run P6-Pt3-chapChunking.xsl

Run P6_SpineGenerator.xsl

Phase 7: Prepare the “spine” of the variorum

Run spineAdjustor.xsl

Run edit-distance/extractCollationData.xsl

Convert spineData.txt to ASCII format

Inside the edit-distance directory, run at the shell: python LevenCalc_toXML.py

Run edit-distance/LevWeight-Simplification.xsl

Run spine_addLevWeights.xsl
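For the Phase 7 step that converts spineData.txt to ASCII, the following Python sketch shows one possible approach. This README does not specify how the conversion is done, so everything here is an assumption: the replacement table, the to_ascii function name, the choice to drop any character with no ASCII equivalent, and the assumption that the input file is UTF-8.

```python
import unicodedata

# Assumed plain-ASCII stand-ins for common typographic characters.
REPLACEMENTS = {
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u2013": "-",   # en dash
    "\u2014": "--",  # em dash
}


def to_ascii(text):
    """Return an ASCII-only version of text.

    Typographic punctuation is replaced from the table above, accented
    letters are decomposed so that the base letter survives, and any
    remaining non-ASCII character is dropped.
    """
    for char, plain in REPLACEMENTS.items():
        text = text.replace(char, plain)
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def convert_file(source_path, target_path):
    """Read a UTF-8 file and write an ASCII-only copy."""
    with open(source_path, encoding="utf-8") as source:
        converted = to_ascii(source.read())
    with open(target_path, "w", encoding="ascii") as target:
        target.write(converted)


print(to_ascii("\u201ccaf\u00e9\u201d"))
```

Because dropping or replacing characters changes string lengths, any conversion of this kind can slightly alter the edit-distance values computed from the file afterward.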