PerseusDL / canonical-greekLit

XML Canonical resources for Greek Literature
https://scaife.perseus.org
Creative Commons Attribution Share Alike 4.0 International

tlg0086.tlg034.perseus-grc2.xml: a version of this is now on GitHub #1683

Open gregorycrane opened 1 week ago

gregorycrane commented 1 week ago

I spent a fair amount of time updating the old beta code version of this file, which is the Kassel edition that we have used since Perseus 1.0.

@lcerrato @AlisonBabeu I would like to ask for help finishing this, as I have had trouble with hooktest on my machine. The file is modeled on tlg0086.tlg034.digicorpus-grc2.xml and it parses.

Summary of work below. I was able to add 319 of 369 section breaks automatically and added the remaining 50 by hand. The sections should be checked at some point.

Summary from the XML: (1) conversion to Unicode (with paragraphs etc. properly encoded) and to updated, EpiDoc-compliant TEI XML; (2) square brackets and angle brackets have been converted to TEI add and del markup (which may not work in our current CSS); (3) imported the chapter/section breaks from the Bekker edition, which should now be the dominant citation scheme, with Bekker pages accessible from a secondary table of contents in Perseus 6; (4) used the same apostrophe character as in the digicorpus-grc2 version. The addition of chapters and sections to Kassel should be noted prominently.
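A minimal sketch of step (2), for anyone redoing or checking the bracket conversion. It assumes that Kassel's square brackets mark text deleted by the editor (mapped to `<del>`) and that angle brackets, written as the Unicode characters ⟨ ⟩, mark text added by the editor (mapped to `<add>`); that mapping is an assumption, not something stated above. The function `brackets_to_tei` is hypothetical, the Greek sample string is invented, and this is not the procedure actually used for this file. Unbalanced brackets are left as-is for manual review.

```python
import re
from xml.sax.saxutils import escape

# Assumption: [ ] marks text deleted by the editor, ⟨ ⟩ (U+27E8/U+27E9) text added.
# Each pattern matches one bracket pair whose content holds no other bracket.
SQUARE = re.compile(r"\[([^\[\]⟨⟩]*)\]")
ANGLE = re.compile(r"⟨([^\[\]⟨⟩]*)⟩")


def brackets_to_tei(text: str) -> str:
    """Hypothetical helper: convert editorial brackets in plain text to <del>/<add>."""
    # Escape &, < and > first so the output can be pasted into an XML file.
    text = escape(text)
    # Repeat until nothing changes, so nested pairs are converted inside-out.
    while True:
        converted = ANGLE.sub(r"<add>\1</add>", SQUARE.sub(r"<del>\1</del>", text))
        if converted == text:
            return converted
        text = converted


if __name__ == "__main__":
    print(brackets_to_tei("λόγος [καὶ] μῦθος ⟨ἐστίν⟩"))
    # prints: λόγος <del>καὶ</del> μῦθος <add>ἐστίν</add>
```

Working on plain text before it is wrapped in XML keeps the regex simple; converting brackets inside an existing TEI file would need a parser so that markup is not split.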

lcerrato commented 1 week ago

@gregorycrane

None of the Perseus Aristotle texts use Bekker as a container, except in modified form. This was one reason the conversion was left until the end of the workflow.

lcerrato commented 1 week ago

See the file here: https://github.com/gregorycrane/Poetics2.0/blob/main/grc/tlg0086.tlg034.digicorpus-grc2.xml. @AlisonBabeu @gregorycrane, if the changes are significant enough, I would propose renaming. If this is a derivative of Digicorpus, it should have a new ID that removes digicorpus from the file name. I purposefully did not make any edits to their editions.

gregorycrane commented 1 week ago

This file in the Poetics2.0 repo is identical to what is in https://github.com/PerseusDL/canonical-greekLit/blob/master/data/tlg0086/tlg034/tlg0086.tlg034.digicorpus-grc2.xml, except for encoding: only the encoding of Greek accents has been updated. So just replace the existing file. I saw and fixed the non-standard accent encoding before we released it.

gregorycrane commented 1 week ago

I am not messing with the First1KGreek or canonical-greekLit repos directly if I can help it; it takes too much time for me to figure out again how to run the hooktests and deal with the results. The best approach is for the people who use that system (Alison and Lisa) to go the final mile.

lcerrato commented 1 week ago

In tlg0086.tlg010.perseus-grc2.xml there are numerous inconsistencies in the file header and encoding. Things like "digicorpus" appear throughout. If this is the Perseus Greek, the header is incorrect. If this is something other than the Perseus Greek, please let me know.

Things on the to-do list that I can complete but have questions about:

  1. "The addition of chapters and sections to Kassel should be noted prominently." What should be noted? If information has been added that is not in print, then resp="perseus" is used. Please clarify what should be applied.
  2. <del> and <add> are found throughout the collection. Daggers are generally removed as well, and should be encoded as <sic>.
  3. <milestone unit="para"/> is not used.
  4. The file uses section where the Bekker page should be. The recommendation is <milestone unit="page" resp="Bekker" n="1094a"/><milestone unit="line" resp="Bekker" n="1"/>.
  5. The markup shown below is not recommended: the section and line milestones should go at the start of the container, not the end. (As noted above in number 3, <milestone unit="para"/> is not used; a paragraph would be marked as an indentation, <p rend="align(indent)">, if at all.)
    
    <milestone unit="section" n="1448a"/>
    <milestone unit="bekkerline" n="1"/>
    <milestone unit="para"/>
    </p></div>
    </div>

This last issue is common with Perseus conversions and is one of the messy fixes.
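A minimal sketch for locating this pattern before fixing it by hand, assuming the file uses the standard TEI namespace. The function `trailing_milestones` is hypothetical and is not part of hooktest; it only lists milestones that have nothing but whitespace after them inside their container. The sample input is invented, built around the snippet quoted in item 5.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Invented sample built from the snippet above: three milestones closing a <p>.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<div type="textpart" subtype="chapter" n="1"><p>κτλ.
<milestone unit="section" n="1448a"/>
<milestone unit="bekkerline" n="1"/>
<milestone unit="para"/>
</p></div>
</body></text></TEI>"""


def trailing_milestones(root):
    """Hypothetical check: list milestones with nothing but whitespace after them
    in their container, i.e. milestones placed at the end instead of the start."""
    hits = []
    for parent in root.iter():
        trailing = []
        # Walk children from the end; stop at the first non-milestone element
        # or at a milestone that is followed by text.
        for child in reversed(list(parent)):
            if child.tag != TEI + "milestone" or (child.tail or "").strip():
                break
            trailing.append(child)
        for m in reversed(trailing):  # restore document order
            hits.append((parent.tag.replace(TEI, ""), m.get("unit"), m.get("n")))
    return hits


if __name__ == "__main__":
    for hit in trailing_milestones(ET.fromstring(SAMPLE)):
        print(hit)
```

To run it against the real file, pass ET.parse("tlg0086.tlg034.perseus-grc2.xml").getroot() instead of the sample. The script only reports; moving each flagged milestone to the start of the following container is still a manual fix.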

I have bumped the perseus-grc file.