FrankensteinVariorum / fv-collation

First-stage collation processing in the Frankenstein Variorum Project. For post-processing and Variorum development, see our GitHub organization: https://github.com/FrankensteinVariorum
https://frankensteinvariorum.github.io/fv-collation/
GNU Affero General Public License v3.0

Collation with SGA Notebooks #23

Closed ebeshero closed 6 years ago

ebeshero commented 7 years ago

@raffazizzi I've just been looking through the TEI Lite and HTML files from SGA, and spent some time thinking about the collation and chatting with @djbpitt about it just before my trip to Victoria. (I also chatted a little with @wendellpiez yesterday before leaving the conference.) After talking with them, I'm thinking we ought to try doing something with those interesting deleted stretches after all. If you can push the diplomatic TEI, I can try "boiling it down" to deletions and adds (and removing other kinds of markup that aren't useful to the collation).

I think we can write a little XSLT to extract what we want--and see how it goes. I know it's a little more complicated, but I think we can handle it, and it might show us something interesting... I'm doing some collation of deleted passages in ms with print editions on the Digital Mitford project, too. What we can do is output some symbols indicating deleted stretches... @djbpitt suggests converting the del and add tags to milestones prior to collation so we can readily turn them into symbols in the app crit markup. Shall we give it a whirl?
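
To make the milestone idea concrete, here is a minimal sketch in Python rather than XSLT. It assumes un-namespaced `line`, `del`, and `add` elements and invents the marker names `delStart`/`delEnd` and `addStart`/`addEnd`; none of those names come from the SGA files or from this thread, and the real XSLT may look quite different.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Elements to turn into paired empty markers (hypothetical choice for this sketch).
MILESTONED = {"del", "add"}

def local(tag):
    """Strip a namespace prefix such as '{http://www.tei-c.org/ns/1.0}'."""
    return tag.rsplit("}", 1)[-1]

def flatten(elem):
    """Serialize elem's content, replacing del/add wrappers with empty
    start/end markers and dropping all other markup (text is kept)."""
    parts = [escape(elem.text or "")]
    for child in elem:
        name = local(child.tag)
        inner = flatten(child)
        if name in MILESTONED:
            # Marker names are assumptions, not an agreed project convention.
            parts.append(f"<{name}Start/>{inner}<{name}End/>")
        else:
            parts.append(inner)
        parts.append(escape(child.tail or ""))
    return "".join(parts)

sample = "<line>the <del>wretch</del><add>creature</add> approached</line>"
print(flatten(ET.fromstring(sample)))
# the <delStart/>wretch<delEnd/><addStart/>creature<addEnd/> approached
```

Because the markers are empty elements, they cannot overlap with whatever structure the collation imposes, and they can later be mapped to symbols in the app crit output.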

ebeshero commented 7 years ago

The point David made that convinced me was that pointing into the notebooks will make the deletions visible (as we were planning), but not available for critical analysis with the rest of the text we have. Since we have them as digital text, we should try to work them in, and it'll help the collation be as revealing as possible.

raffazizzi commented 7 years ago

@ebeshero the diplomatic TEI is here: https://github.com/umd-mith/sga/tree/master/data/tei/ox Look for MS Abinger C 56, 57, and 58.

Happy to try any approach. In general, what I like about linking rather than replicating text in the collation is that we wouldn't need to make any decisions about how to transform and restructure the text for others. If I understood @djbpitt's point correctly, he suggests making sure these structures are available to those who want to use the collation for analysis. I think a pointer to the right place in the full diplomatic encoding is more useful than a simplified version of the same snippet of text. It requires more work from the implementer to follow the link and parse the result, but they can tailor that parsing to their own research goals instead of us doing that for them out of our own assumptions.

On the other hand, if it makes sense for our process to have a simplified version of the deletions, then we should do that. I think the key here is better defining what the collation is about. Is it an instrument of discovery for us, on the way to another object (e.g. a variorum or a critical edition), or a product in and of itself that should be ready for future use by others?

ebeshero commented 7 years ago

@Rikkm After the Hangout with @raffazizzi this evening, I learned some important things about the SGA notebook files, and how to put them together...also how we'll need to be working with the line-by-line encoding. Remember how we thought we'd rip out the line elements? Nope. We'll keep them but turn them into self-closing <lb/> elements. And I'll add xml:id's to those: Raff's going to need that info to help match the strings of text we're collating from the notebooks to specific portions of the pages. So--let's talk more about this tomorrow!
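
A rough sketch of the line-to-`<lb/>` step, in Python for illustration only. The un-namespaced `zone` and `line` elements, the `prefix` argument, and the `prefix-0001` id pattern are all assumptions made up for this example; the actual id scheme still has to be worked out with Raff. This version also discards any markup inside a line and keeps only its text.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

def lines_to_lb(zone, prefix):
    """Replace each <line> container with a self-closing <lb/> that carries
    an xml:id, followed by the line's plain text."""
    out = []
    for n, line in enumerate(zone.iter("line"), start=1):
        line_id = f"{prefix}-{n:04d}"      # hypothetical id pattern
        text = "".join(line.itertext())    # inner markup is dropped here
        out.append(f"<lb xml:id={quoteattr(line_id)}/>{escape(text)}")
    return "\n".join(out)

sample = "<zone><line>It was on a dreary night</line><line>of November</line></zone>"
print(lines_to_lb(ET.fromstring(sample), "demo"))
# <lb xml:id="demo-0001"/>It was on a dreary night
# <lb xml:id="demo-0002"/>of November
```

Keeping an id on every `<lb/>` means a collated string can be traced back to its line on the notebook page, which is the information the pointers into the SGA files would need.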