Closed ebeshero closed 6 years ago
@raffazizzi Here's my ID transformation stylesheet where we're adding the "page location flags" to each line: https://github.com/ebeshero/Pittsburgh_Frankenstein/blob/master/collateXPrep/Id_Trans_sgaMSLocators.xsl
And here's a sample of the input code (with S-GA <line> elements) for reference: https://github.com/ebeshero/Pittsburgh_Frankenstein/blob/master/collateXPrep/sga_Notebooks/msCollPrep_c57.xml
The S-GA markup doesn't have @xml:ids on the lines, but instead they're on <surface> and <zone> and <anchor>: we derived the value of @n from those elements, and I stuck a number at the end based on count of preceding-sibling line. We can easily adjust this, but the inconsistencies, I think, have to do with whether the <line> was inside a margin zone or in a main page zone.
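For anyone following along, here is a rough Python sketch of the derivation described above. It is not the project's XSLT. The sample markup, the surface ID `surf-0001`, the use of `@type` on `<zone>`, the missing TEI namespace, and the 1-based count are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical, namespace-free stand-in for an S-GA page.
SAMPLE = """
<surface xml:id="surf-0001">
  <zone type="main">
    <line>first line</line>
    <line>second line</line>
  </zone>
  <zone type="left_margin">
    <line>marginal note</line>
  </zone>
</surface>
"""

def line_locators(surface):
    """Build one locator per <line> from its surface ID, zone type, and position."""
    locators = []
    surface_id = surface.get(XML_ID)
    for zone in surface.iter("zone"):
        zone_type = zone.get("type", "unknown")
        # The position mirrors count(preceding-sibling::line) + 1 within the zone.
        for position, _line in enumerate(zone.findall("line"), start=1):
            locators.append(f"{surface_id}_{zone_type}_{position}")
    return locators

locators = line_locators(ET.fromstring(SAMPLE))
print(locators)
```

Because the count restarts inside each zone, a marginal line and a main-zone line on the same page can share a number, so the zone name has to be part of the locator for it to be unique.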
@ebeshero you're indeed correct: the issue had to do with which zone it came from. I made some adjustments to Id_Trans_sgaMSLocators.xsl to include the correct name of the zone (https://github.com/ebeshero/Pittsburgh_Frankenstein/commit/44f1d85243ae75f7bacd5900db52387c16e21aa0). I also adopted __ as separators instead of _ because the latter can occur in zone types (left_margin).
Would you be able to run the script and regenerate collation chunks?
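To illustrate why the double-underscore separator helps, here is a small Python sketch. It is only an illustration, not the stylesheet itself. The function names and the surface ID `c57-0001` are hypothetical, and it assumes the surface ID never contains the separator.

```python
def build_locator(surface_id, zone_type, line_number, sep="__"):
    """Join the three locator parts with the given separator."""
    return sep.join([surface_id, zone_type, str(line_number)])

def parse_locator(locator, sep="__"):
    """Split a locator back into (surface_id, zone_type, line_number)."""
    surface_id, zone_type, line_number = locator.split(sep)
    return surface_id, zone_type, int(line_number)

# With a single underscore, the zone type "left_margin" is split in two.
ambiguous = build_locator("c57-0001", "left_margin", 3, sep="_")
ambiguous_parts = ambiguous.split("_")

# With "__", the same locator splits cleanly into three parts.
unambiguous = build_locator("c57-0001", "left_margin", 3)
parsed = parse_locator(unambiguous)

print(ambiguous_parts)
print(parsed)
```

With `_` the split yields four pieces instead of three, so downstream code cannot tell where the zone type ends. With `__` the round trip recovers the original values.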
@raffazizzi Huzzah! Glad we figured out the issue--Okay, for me to re-run the collation will take several hours (seriously--it's a time-consuming process at this point with comparing all five editions for all 33 units). Would it help you if I sent you a selection of collation units in the next hour or two? (Or is it okay if you get the full collation again like tomorrow AM?)
Yes to both! I can work on this more tomorrow afternoon. I've been experimenting with chunk 15, but possibly any other chunk including manuscript material will work for me! Thanks :grin:
Ahh--sorry--just starting it now (= meeting ran long). I’ll run collation unit 15 first and push to GitHub.
@raffazizzi Just to be clear--right now do you ONLY need the collation chunks, rather than the collation itself to be re-run? Well, I'll push the chunks to be collated in a few minutes--for the entire novel!
I'll also start reprocessing a full collation so we have output, too. But I'll start that with just collation unit 15.
I need the collation itself so that I can work on converting rdgs with SGA content to rdgs with pointers to SGA.
@raffazizzi Got it... processing...stay tuned!
@raffazizzi I just a) ran your new identity transformation to set the new collation flags for S-GA, and then b) produced fresh collation units from that. Then I c) re-ran the collation chunking to make just C15, the one you were working on. That's all ready now from here: https://github.com/ebeshero/Pittsburgh_Frankenstein/tree/133011d3ba910bd01ca6326b02f3843cfde823b1/collateXPrep/Full_xmlOutput_C15
I'm starting the full collation now, and you'll have a fresh load of collated files to work with by tomorrow.
@raffazizzi There's a complete new collation set now for you to work with here: https://github.com/PghFrankenstein/Pittsburgh_Frankenstein/tree/Text_Processing/collateXPrep/Full_xmlOutput
Note: The collation output actually isn't quite complete yet. There's a little cluster of fragment files from s-ga that require special collation with the rest of the set. So, for example, for collation unit 20, there's a little fragment in the Bodleian c57 that is a reworking of some pages also in c57, so I generated it as a separate fragment file. That's true of four or five other collation units, and really c58 is another "frag" witness. I've prepared a special directory to generate another set of collations to work in these fragmentary witnesses together with all the others. These will have more rdg witnesses than the others--I'll prep the collation and run it later today.
@raffazizzi observes: some <lb/> elements have unpredictable @n attributes that probably come from the <line>'s xml:id. Can we switch to still generating a regular @n and perhaps keep the @xml:id as @xml:id? For example:
<line> --> <lb n="ms-surface_zone_linenum" />
<line xml:id="ID"> --> <lb n="ms-surface_zone_linenum" xml:id="ID" />
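A minimal Python sketch of the mapping proposed above, assuming the `ms-surface_zone_linenum` pattern for `@n`: every `<line>` gets a regular `@n`, and an `@xml:id` is carried over only when the input has one. The helper name `line_to_lb` and the sample values are hypothetical, and the real transformation is done in XSLT.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def line_to_lb(line, surface, zone, linenum):
    """Return an <lb/> with a regular @n, copying @xml:id only if the <line> has one."""
    lb = ET.Element("lb", {"n": f"ms-{surface}_{zone}_{linenum}"})
    line_id = line.get(XML_ID)
    if line_id is not None:
        lb.set(XML_ID, line_id)
    return lb

plain = line_to_lb(ET.fromstring("<line>text</line>"), "surf1", "main", 4)
with_id = line_to_lb(ET.fromstring('<line xml:id="ID">text</line>'), "surf1", "main", 5)

print(ET.tostring(plain, encoding="unicode"))
print(ET.tostring(with_id, encoding="unicode"))
```

Keeping `@n` independent of `@xml:id` means the locator stays predictable whether or not a given line happens to carry an ID.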
@ebeshero: to investigate how we're generating these and take a look at the input: where's the inconsistency coming from? Are @xml:ids always present--and how are we deriving them? I'm thinking the inconsistency might be coming from the lines within marginal text.