PerseusDL / canonical-greekLit

XML Canonical resources for Greek Literature
https://scaife.perseus.org
Creative Commons Attribution Share Alike 4.0 International

(tlg0062-varia) overview #1467

Open gregorycrane opened 1 year ago

gregorycrane commented 1 year ago

I am going to break up new work on Lucian into multiple pull requests. There is too much for just one.

A lot of work has gone into cleaning up and augmenting Lucian. This post summarizes that work.

  1. We have two versions of the Harmon Loeb volumes of Lucian (vols 1-5): one in Perseus and one in 1st1k. The 1st1k version was meant to replace the earlier Perseus version, which was based on early OCR. In fact, the earlier version had been corrected to the point where it was better than the 1st1k version.

    1. I compared the Perseus Greek with the 1st1k Greek to help find errors. The errors fixed automatically in this way are now in tags. Two thirds of them are (1) raised dots that the OCR missed and (2) accent errors (usually a failure to keep an acute on the ultima when the following word is an enclitic).
    2. I have extracted the notes from the 1st1k version and added them to the Perseus version, correcting many (but surely not all) errors in the notes.
  2. We have digitized the accompanying Harmon translations. These cover works 1-52 (out of 71 in the TLG canon).

  3. We have added the Greek and English by Kenneth Kilburn and by Matthew Donald Macleod.

  4. We have added the English translations that the brothers Henry Watson Fowler and Francis George Fowler published in 1905. These translations leave out some works that are not considered to be by Lucian, as well as some works that the Fowlers considered morally objectionable.

  5. We have added the translations by Emily James Smith of selected works of Lucian (with a number of passages silently omitted because of their content).

  6. We have added a translation of the De Dea Syria by Herbert Augustus Strong and John Garstang, both because of its substantial notes and because Harmon translated this work into his own imitation of an earlier form of English.

  7. We have added an English version of A. J. Pons's translation of the Dialogues of the Courtesans (a work that others had translated only partially because of its content).

  8. We have also added Alex Hillman's translation of this work (which he lists as "Mimes of the Courtesans").

  9. We have added Howard Williams's translation of a number of dialogues for Bohn's Library. This edition is particularly helpful because of its extensive notes. These notes need editing for a modern audience (e.g., they cite Greek works by their Greek titles), and their citations of primary works need to be tagged.
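The Perseus/1st1k comparison in item 1.1 can be sketched roughly as a token-level collation. This is a minimal illustration, not the script that was actually used: it assumes both versions have already been reduced to plain, whitespace-separated Greek text, and the names `normalize` and `collate` are hypothetical. It uses Python's standard `difflib.SequenceMatcher` to report every place where the two witnesses disagree, such as a missing raised dot or a lost acute on the ultima before an enclitic.

```python
import difflib
import unicodedata


def normalize(token: str) -> str:
    # Compose characters (NFC) so that precomposed and combining-accent
    # spellings of the same Greek letter compare as equal.
    return unicodedata.normalize("NFC", token)


def collate(perseus_text: str, first1k_text: str):
    """Return (opcode, perseus_tokens, first1k_tokens) for each disagreement.

    Hypothetical helper: both inputs are assumed to be plain text with all
    markup already stripped.
    """
    a = [normalize(t) for t in perseus_text.split()]
    b = [normalize(t) for t in first1k_text.split()]
    # autojunk=False keeps frequent tokens (articles, particles) matchable.
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            diffs.append((tag, a[i1:i2], b[j1:j2]))
    return diffs


# A lost acute on the ultima before the enclitic τις shows up as a "replace".
print(collate("ἄνθρωπος τις ἦν", "ἄνθρωπός τις ἦν"))
```

Each reported disagreement still needs a human decision about which witness is right; the sketch only narrows down where to look.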

Other works to add include:

  1. The Latin translations (mainly by Hemsterhuys) printed in the Didot/Dindorf edition. The OCR for the Greek text of this edition is fairly good, and it might be worth including that as well.

  2. The textual notes for the Jacobitz edition. The OCR is not bad and should not take much work to clean up.

  3. The Jacobitz editions for works 1-52. For now, we have used Jacobitz to supplement the Harmon editions for 1-52.

  4. The Rabe edition of the Lucian scholia is already in Open Greek and Latin. The lemmas should be linked to the Greek, and the annotations integrated with the original text.

  5. The Allinson edition and commentary should be added.

lcerrato commented 1 year ago

see also https://github.com/OpenGreekAndLatin/First1KGreek/issues/2750 https://github.com/PerseusDL/canonical-greekLit/issues/1460

lcerrato commented 1 year ago

@gregorycrane My recommendation for this or the future:

I can make incremental changes to fix the current PR, such as one type of change per push, but I need to know that you are not working on it at the same time.

AlisonBabeu commented 1 year ago

Hey @lcerrato and @gregorycrane, I have to second the recommendation against massive pull requests of this size. After a few years of working on the Perseus and OGL collections with Lisa, I particularly want to second committing translations and editions in separate batches, because errors, tagging issues, metadata misses, etc. are much easier to spot that way.

lcerrato commented 1 year ago

I think that if there are one or two types of global change (fixing all of the EpiDoc links or fixing an author name), a massive PR can work, since you can quickly see the change(s), but extensive deep edits need to be limited in size. I can't even really scroll through the review page. I know that each pass @AlisonBabeu does on a batch is usually 1-3 types of edit on very limited data. The original OGL recommendation was one file per PR. We abandoned that, but for deep edits, I see the point. There are things here that are wrong but won't be caught in testing (like the xml:base issue). The xml:base attribute as added to the English files is not going to provide any useful information and is very easy to miss in review.
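Since a stray xml:base is easy to miss in review, one option is a quick check that lists every occurrence before merging. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the function name `find_xml_base` is hypothetical, and this is not the project's own validation. It relies on ElementTree expanding the reserved `xml:` prefix to the namespace `http://www.w3.org/XML/1998/namespace`.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:base under the reserved XML namespace URI.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def find_xml_base(xml_text: str):
    """Return (element tag, xml:base value) for every element that sets xml:base.

    Hypothetical helper: takes the text of one TEI/EpiDoc file.
    """
    root = ET.fromstring(xml_text)
    # root.iter() walks the root element and all of its descendants.
    return [(el.tag, el.get(XML_BASE)) for el in root.iter() if XML_BASE in el.attrib]


sample = '<TEI xmlns="http://www.tei-c.org/ns/1.0"><text xml:base="urn:x"><body/></text></TEI>'
print(find_xml_base(sample))
```

Running it over each changed file and flagging any non-empty result would surface the attribute without anyone having to scroll through the full diff.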