Closed. lcerrato closed this issue 3 years ago.
Hi, Lisa. Thanks! If you could attach some XML here with the guilty texts, that would be great. Is it possible that these files have been run through a script that removes Latin-script letters? If so, an errant Latin capital E in place of a capital epsilon would get stripped.
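To make the Latin-for-Greek hypothesis easy to check, here is a minimal Python sketch that lists words mixing Latin and Greek letters, such as a Latin capital E followed by Greek characters. The function names `script_of` and `mixed_script_words` are made up for illustration and are not part of Lace or tidy, and the script detection simply relies on Unicode character names, so treat it as a rough check rather than a definitive one.

```python
import unicodedata


def script_of(ch):
    """Return 'LATIN', 'GREEK', or None, judged by the Unicode character name."""
    if not ch.isalpha():
        return None  # digits, punctuation, and combining marks are ignored
    name = unicodedata.name(ch, "")
    if name.startswith("LATIN"):
        return "LATIN"
    if name.startswith("GREEK"):
        return "GREEK"
    return None


def mixed_script_words(text):
    """List whitespace-separated words that contain both Latin and Greek letters."""
    hits = []
    for word in text.split():
        scripts = {script_of(ch) for ch in word}
        if "LATIN" in scripts and "GREEK" in scripts:
            hits.append(word)
    return hits


if __name__ == "__main__":
    # Hypothetical sample: the first letter is deliberately a Latin capital E.
    sample = "\N{LATIN CAPITAL LETTER E}ἰ δὲ παρὰ ταῦτα"
    print(mixed_script_words(sample))
```

If this reports nothing for a file that later loses its capitals, a Latin-stripping script is probably not the cause.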
I forget which text it was yesterday because I wasn't tracking it, but I saw it today in urn:cts:greekLit:tlg4026.tlg003.1st1K-grc1
(It's possible tidy cleaned it up, but these were capital pi and delta, so not the type of letters that would be confused.)
I wasn't tracking it carefully (I will from now on), but here is one passage:
[a 16] εῖ δὲ παρὰ ταῦτα μηδένα ἄλλον τρόπον ἐρωτήσεων τῶν
https://archive.org/details/commentariaina21pt12akaduoft/page/331/mode/1up?view=theater
Ok, this one was definitely tidy, as the original file was:
<tei:div type="textpart" subtype="1" n="urn:cts:greekLit:tlg0557.tlg004.1st1K-grc1:2"><tei:p>2. Παρὰ θεῶν μὴ συνεχῆ
versus
<div type="textpart" subtype="section" xml:base="urn:cts:greekLit:tlg0557.tlg004.1st1K-grc1" n="2">
<p rend="indent">2. αρὰ θεῶν μὴ συνεχῆ
very interesting.
Phew. So it's unlikely to be a Lace problem, but if you can attach the original file here, I can check whether my macOS tidy makes this error. We should definitely file a bug against tidy. In the long run, however, we can use XSLT to do all your postprocessing, including indenting, and that should be more reliable.
I'll leave this open until we're certain it's not a Lace issue.
Ok, I freshly generated urn:cts:greekLit:tlg0557.tlg004.1st1K-grc1 and the uppercase pi is still there. I processed the file on Linux with parallel tidy -xml -m -i {} ::: *xml, and the result also has the uppercase pi.
This was a recent batch ldpd_10922736_000.zip
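A before/after comparison like the one above can be automated. The following is a rough Python sketch, not anything Lace or tidy provides: it pulls the text out of the original and the tidied XML, drops all whitespace (since indenting changes it), and reports characters that are present before but missing after. The helper names `text_without_whitespace` and `dropped_characters` are hypothetical, and the sketch assumes both inputs are well-formed XML with any namespace prefixes declared.

```python
import difflib
import xml.etree.ElementTree as ET


def text_without_whitespace(xml_string):
    """Concatenate all text nodes and remove every whitespace character."""
    root = ET.fromstring(xml_string)
    return "".join("".join(root.itertext()).split())


def dropped_characters(before_xml, after_xml):
    """Return (offset, text) pairs for runs in 'before' that are deleted or replaced in 'after'.

    Offsets index into the whitespace-free text of the 'before' document.
    """
    before = text_without_whitespace(before_xml)
    after = text_without_whitespace(after_xml)
    matcher = difflib.SequenceMatcher(None, before, after, autojunk=False)
    return [
        (i1, before[i1:i2])
        for tag, i1, i2, _j1, _j2 in matcher.get_opcodes()
        if tag in ("delete", "replace")
    ]


if __name__ == "__main__":
    # Simplified stand-ins for the two versions quoted in this thread.
    original = "<div><p>2. Παρὰ θεῶν μὴ συνεχῆ</p></div>"
    tidied = "<div>\n  <p>2. αρὰ θεῶν μὴ συνεχῆ</p>\n</div>"
    for offset, lost in dropped_characters(original, tidied):
        print(offset, lost)
```

An empty result for a Lace export and its tidied copy would suggest that tidy did not drop any text on that platform.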
Hi, Lisa. ldpd_10922736_000.zip is the set of files on which I based my comments above. When I generated them at heml.mta.ca/lace, I did not see these problems in the Lace output, but only after running tidy (on macOS).
As for the initial example, I find that the Δ is missing in the editing, as shown in this image
I've asked Charlotte to do a last scan of commentariaina21pt12akaduoft as it is currently posted at heml.mta.ca/lace
@brobertson Yes, I agree on this. I just could not be sure in the middle of the workflow (I would have had to go back, redownload, compare, etc.). Some of these were near those Aristotle brackets, so at first I thought that could have been creating some noise.
Issue resolved: it was not caused by Lace, but by either erroneous editing or post-processing with tidy on macOS.
Just a heads up: I've spotted a few recent texts where the initial capital letter is missing from a word. This may be isolated to specific circumstances, such as the texts I'm working on, but I've seen it in two different volumes this week.