PerseusDL / canonical

This will be the base repo for all text and annotation data published in the PDL

Helma Dik's version of Herodotus Histories #65

Closed balmas closed 9 years ago

balmas commented 10 years ago

This is very long overdue, but this pull request is for the changes @helmadik has made to clean up Herodotus Histories.

I did a few things to make the comparison with the Perseus canonical P4 version a bit easier:

  1. the first commit on this branch is just some cleanup changes to the P4 version, to convert divX to div and remove milestone[@unit='para']. I'm not sure whether I'm right, but it seemed to me that the para milestones are irrelevant now that every section is wrapped in a p tag already, and they were causing some erroneous differences due to placement of the milestone before or after the div breaks (probably caused by differences in the way the texts were CTS-ized)
  2. the second commit on this branch is just to clean up the formatting and indenting of the P4 canonical version, again to limit erroneous differences.
  3. the third commit on this branch is Helma's version, but it includes some changes that I made to it: 1) Helma added q tags around many of the quotes, and specified the who attribute to say who was speaking. Many of these crossed the section milestones which are part of the CTS hierarchy, so I ran a transform to close and reopen the q tags at the end and start of the intervening milestones. I had to do the same for one quote tag too. 2) I CTS-ized the text, converting the section and chapter milestones to divs. 3) I cleaned up formatting and indentation.
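The cleanup described for the first commit can be sketched roughly as follows. This is only an illustration in Python's standard library, not the transform that was actually run on this branch: the function name `normalize_tei` and the sample fragment are made up for the example, and it assumes un-namespaced TEI P4 markup.

```python
import re
import xml.etree.ElementTree as ET

# Matches numbered TEI divs such as div1, div2, ...
DIVX = re.compile(r"^div\d+$")

def normalize_tei(root):
    """Rename divX to div and drop milestone[@unit='para'], in place.

    Hypothetical helper for illustration only.
    """
    for parent in list(root.iter()):
        if DIVX.match(parent.tag):
            parent.tag = "div"
        prev = None
        for child in list(parent):
            if child.tag == "milestone" and child.get("unit") == "para":
                # Text following the milestone lives in its tail; move it
                # to the previous sibling (or the parent) so nothing is lost.
                tail = child.tail or ""
                if prev is not None:
                    prev.tail = (prev.tail or "") + tail
                else:
                    parent.text = (parent.text or "") + tail
                parent.remove(child)
            else:
                prev = child
    return root

# Made-up sample fragment, not taken from the Herodotus file.
sample = (
    '<text><div1 type="book" n="1"><div2 type="chapter" n="1">'
    '<milestone unit="para"/><p>abc<milestone unit="para"/>def</p>'
    '</div2></div1></text>'
)
cleaned = normalize_tei(ET.fromstring(sample))
print(ET.tostring(cleaned, encoding="unicode"))
```

Milestones with any other unit (section, chapter) are left alone, since those carry the CTS hierarchy.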

I'm issuing the pull request because I think it might be good to try to incorporate Helma's improvements now before we port this text to EpiDoc as it will be easier to compare apples to apples.

helmadik commented 10 years ago

Great! I've made corrections mostly in the early books (except for the quotes), so if Lisa has recent changes to the text in later books, those I probably don't have.

On Fri, Jun 20, 2014 at 2:00 PM, Bridget Almas notifications@github.com wrote:


You can merge this Pull Request by running

git pull https://github.com/PerseusDL/canonical hdik_herodotus

Or view, comment on, or merge it at:

https://github.com/PerseusDL/canonical/pull/65

Commit Summary

  • removing milestone=para and normalizing divs to prepare for compare with helma's version
  • cleaning up formating/indent
  • helma's version of herodotus - ctsized, para milestones removed and reformatted


Helma Dik Department of Classics University of Chicago

lcerrato commented 10 years ago

Sorry: I don't know of a way to easily cross check the old cvs repo version texts/sdl with the version added to the canonical repo (none of the file history was preserved: shouldn't that be in the doc header?).

My last edits stop at 2/17/13 (cvs v1.7). Looks like I have a few typos from v1.4 - 1.7. Shouldn't be hard to diff those if needed.
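For what it's worth, a diff of that kind could be sketched like this in Python. It compares two versions word by word, so that differences in line breaks and indentation don't show up as changes. The function names and the toy input strings are assumptions for the example, not anything from the repos.

```python
import difflib

def normalize(text):
    """Split on any run of whitespace, giving one token per word."""
    return text.split()

def word_diff(old, new):
    """Return unified-diff lines comparing two texts word by word."""
    return list(
        difflib.unified_diff(
            normalize(old),
            normalize(new),
            fromfile="old",
            tofile="new",
            lineterm="",
            n=0,  # no context lines, only the changed words
        )
    )

# Toy input: only the middle word differs; whitespace changes are ignored.
for line in word_diff("a κτίζε b", "a  κτίζειν\nb"):
    print(line)
```

Two texts that differ only in whitespace produce an empty list, which is the point of normalizing first.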

helmadik commented 10 years ago

Dear Lisa, I am a bad person. I would make changes in the original text, in our text with word id-s, and in our tokendb, and after those three (ugh! we are changing that system) I was too fed up to also note diffs in the header!


lcerrato commented 10 years ago

Hi Helma, I disagree. In any case, we have two older versions of this work: one that was in cvs/texts/classics and one that was moved to cvs/texts/sdl (which is what is currently being deployed in P4). Unless I'm reading things wrong, I made no changes to this work since the move to GitHub. (I manually fixed the changes that were lost in the cvs transition from texts --> sdl). The P4 version of the work was last edited by me on 2/17/13. There were subsequent edits by Greg and Bridget having to do with some structural/CTS changes Greg made (and I believe Bridget had to roll back due to P4 deployment issues). My other changes are fully described in the file history: a missing sentence on 4.51.1; two typos in book 4; a stray mark mistaken for a full stop in 1.128.1. Does not appear to be anything we cannot quickly/easily recheck at this point.

helmadik commented 10 years ago

Cool. I could check your reported changes against my version of the text and report back.


lcerrato commented 10 years ago

cts:greekLit:tlg0016.tlg001.perseus-grc1:4.51.1
text missing between βορέω ... μεγάλης; added μὲν ἀνέμου ὁρμᾶται, ἄρχεται δὲ ῥέων ἐκ λίμνης

cts:greekLit:tlg0016.tlg001.perseus-grc1:4.144.2
κτίζε --> κτίζειν

cts:greekLit:tlg0016.tlg001.perseus-grc1:4.197.1
ἐφόρτιζον --> ἐφρόντιζον

cts:greekLit:tlg0016.tlg001.perseus-grc1:1.128.1
strike full stop after Μηδικοῦ
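The references above follow the CTS URN layout (namespace, work identifier, passage), written here without the leading "urn:". A minimal sketch of splitting one into its parts, in case anyone wants to look the passages up programmatically; `PassageRef` and `parse_ref` are hypothetical names for this example, not part of any Perseus tooling.

```python
from typing import NamedTuple, Tuple

class PassageRef(NamedTuple):
    namespace: str             # e.g. "greekLit"
    work: str                  # e.g. "tlg0016.tlg001.perseus-grc1"
    passage: Tuple[str, ...]   # e.g. ("4", "51", "1")

def parse_ref(ref: str) -> PassageRef:
    """Split a CTS-style reference into namespace, work and passage.

    Accepts both "urn:cts:..." and the shorter "cts:..." form used here.
    """
    parts = ref.strip().split(":")
    if parts and parts[0] == "urn":
        parts = parts[1:]
    if len(parts) != 4 or parts[0] != "cts":
        raise ValueError(f"not a CTS passage reference: {ref!r}")
    _, namespace, work, passage = parts
    # Passage components stay strings: some are not numeric (e.g. 6.86.C.2).
    return PassageRef(namespace, work, tuple(passage.split(".")))

ref = parse_ref("cts:greekLit:tlg0016.tlg001.perseus-grc1:4.51.1")
print(ref.work, ref.passage)
```

Keeping the passage as a tuple of strings makes it easy to compare book, chapter and section separately.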

helmadik commented 10 years ago

I have now downloaded the file (I think the right one), incorporated Lisa's edits insofar as they weren't there already, and made the following additional changes:

  • γλαῦκ > Γλαῦκ in 6.86.C.2 (caps for proper name at line start suppressed)
  • Ταργιτάον > Ταργίταον twice: 4.5.1, 4.5.2 (following all editors outside Loeb)
  • ἄγων καὶ ἄλλους > ἄγων ἄλλους τε in 3.1.1 (again, no one prints this)

balmas commented 9 years ago

closing this because I think it has all been addressed now. @helmadik please reopen a new request if I'm wrong. Thanks!