brobertson / Lace2

In-browser OCR editing program that transforms OCR results into structured, citable TEI. No XML experience required!
http://trylace.org
GNU General Public License v3.0

Hades spelling convention #150

Open helmadik opened 1 year ago

helmadik commented 1 year ago

Hi there! Since I see mention of verified spellings in this package, perhaps a long shot here: I always find myself correcting forms of Hades in the Greek OCR-ed texts. Greek long alpha followed by iota would be ᾳ in lower case, but the conventional spelling in upper case has a tiny adscript next to the capital. Check out the number of characters in this spelling: ᾍδης (4) vs. what's typically found in the Open Greek and Perseus OCR files: Ἅιδης (5). If this were a full iota following a short alpha (a true diphthong), the diacritics would be on the iota (and we would call him Haedes :-)). My morphological analyzer catches these, but it would be great to correct them at the source. Many thanks!
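To make the character counts above concrete, here is a minimal Python sketch of the kind of normalization being asked for. It is an illustration only: `fix_hades` is a hypothetical helper, not part of Lace, and it deliberately touches only the Hades stem (capital alpha with rough breathing and acute, followed by a plain iota and delta) rather than every capital alpha followed by iota.

```python
import unicodedata

# Hypothetical helper for illustration; not part of Lace.
# OCR form: capital alpha with dasia and oxia (U+1F0D) + plain iota + delta.
OCR_STEM = "\u1F0D\u03B9\u03B4"
# Conventional form: capital alpha with dasia, oxia and prosgegrammeni (U+1F8D) + delta.
FIXED_STEM = "\u1F8D\u03B4"


def fix_hades(text: str) -> str:
    """Replace the five-character OCR spelling of the Hades stem with the adscript form."""
    # Normalize to NFC first so decomposed input matches the precomposed stem.
    text = unicodedata.normalize("NFC", text)
    return text.replace(OCR_STEM, FIXED_STEM)


sample = "\u1F0D\u03B9\u03B4\u03B7\u03C2"  # the OCR spelling, 5 code points
fixed = fix_hades(sample)
print(len(sample), len(fixed))  # prints "5 4"
```

Because the match is on the stem, oblique cases such as the genitive are covered as well. Note that the sketch returns NFC text, which may differ from the input's normalization form.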

helmadik commented 1 year ago

Similarly, I'm seeing quite a few instances of ἢδη and ἣκω (a grave where the acute of ἤδη and ἥκω belongs). How easy these are to catch in corrections probably depends on people's choice of font etc. Finally, I've added a standard check for ς[;·]: the punctuation is often an OCR artefact following final sigma. [I should mention I have been working on older output (Lucian, Plutarch), so this may all be fixed by now!]
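The two checks described in this comment could be sketched roughly as follows. This is an assumption about how such checks might look, not the commenter's actual code, and both function names are hypothetical. The grave check relies on the rule that a grave accent only stands on a word's final syllable, so a grave followed by another vowel later in the same word is flagged for review. The sigma check only reports candidates, since final sigma followed by genuine punctuation is of course common.

```python
import re
import unicodedata

# Final sigma directly followed by ';' or a middle dot. NFC maps the Greek
# question mark (U+037E) to ';' and the ano teleia (U+0387) to U+00B7.
SIGMA_PUNCT = re.compile("\u03C2[;\u00B7]")
# In NFD: a combining grave (U+0300) followed later in the word by a vowel.
GRAVE_THEN_VOWEL = re.compile("\u0300.*[\u03B1\u03B5\u03B7\u03B9\u03BF\u03C5\u03C9]")
WORD = re.compile(r"[^\W\d_]+")


def suspect_graves(text):
    """Return words whose grave accent is not on the final syllable."""
    text = unicodedata.normalize("NFC", text)
    hits = []
    for word in WORD.findall(text):
        decomposed = unicodedata.normalize("NFD", word.lower())
        if GRAVE_THEN_VOWEL.search(decomposed):
            hits.append(word)
    return hits


def sigma_punct_offsets(text):
    """Return offsets (in the NFC-normalized text) of final sigma + ';' or middle dot."""
    text = unicodedata.normalize("NFC", text)
    return [m.start() for m in SIGMA_PUNCT.finditer(text)]


# The two misread words are flagged; the legitimate final grave is not.
print(suspect_graves("\u1F22\u03B4\u03B7 \u1F23\u03BA\u03C9 \u03BA\u03B1\u1F76"))
```

Both functions only flag candidates for a human corrector; neither rewrites the text.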