OpenGreekAndLatin / First1KGreek

XML files for the works in the First Thousand Years of Greek Project. Please see our Wiki on how to contribute.
https://opengreekandlatin.github.io/First1KGreek/
Creative Commons Attribution Share Alike 4.0 International

Representing Parallel Fragments in tlg2042/tlg016, tlg028, and tlg030 #286

Open jduff-chs opened 8 years ago

jduff-chs commented 8 years ago

These texts, of different works by Origenes, each contain fragmentary attestations of portions of a work in both Latin and Greek. For example, see the page below, from tlg2042/tlg016, the homilies on Luke. The (mostly complete) text of the surviving Latin translation is in the left column, and the extant (more fragmentary) original Greek portions are in the right column, aligned so that they are matched with the Latin.

image

I'd imagine we'd want to separate these into Greek and Latin files, but it seems we'll run into some issues in doing so. For one, I can't think of a way to encode these correspondences. Another issue: the line numbers on the left of every page are meant to cite both the Greek and the Latin, but in the XML they appear only in the Latin, as that's the text they are closest to. This becomes an even greater issue in tlg030, a commentary on Matthew.

image

Here, Greek is on the left and Latin on the right, and both are equally fragmentary. Again, all line numbers are recorded in the left column's text, which in this case is the Greek. But this means that where the Greek is absent and the Latin is present, line numbers meant to refer to Latin text are stranded in large lacunae in the midst of the Greek.

Does anyone have ideas for how we might encode the correspondence between two files while still splitting them? (Is the correspondence worth preserving?) Any ideas, short of manual entry, for how to apply the line numbering to both texts? (Is full line numbering important?)

N.B.: Someone was in the middle of separating the two texts in tlg030: tlg030...grc1 contains only the Greek text as far as I've checked, but tlg030...lat1 is the full text, both the Latin and the Greek portions. Neither currently passes Hook, because of duplicate sections.

jduff-chs commented 8 years ago

For reference, here is how they are currently encoded:

Page 4 (left) of 016:


<pb n="v.9.p.4"/>
<p>recepti. Et ut seiatis non solum
quatuor evangelia. sed plurima
esse conscripta, e quibus haec,
quae habemus, electa sunt et <milestone unit="altpage" n="87"/> 
<lb n="5"/>tiadita ecclesiis, ex ipso proocemio
Lucae, quod ita contexitur, cognoscamus:
&gt; Quoniam quidem
multi conati sunt ordinare narrationem&lt;
Hoc quod ait: &gt;conati
<lb n="10"/> sunt&lt;, latentem habet accusationem
eorum, qui absque gratia
Spiritus sancti ad scribenda evangelia
prosiluerunt. Matthaeus
quippe et Marcus et Joannes et
<lb n="15"/> Lucas non sunt &gt;conati&lt; scribere,
sed Spiritu sancto pleni scripserunt
evangelia. &gt; Multi&lt; igitur
&gt; conati sunt ordinare narrationem
de his rebus, quae manifestissime
<lb n="20"/> festissime cognitae sunt in nobis &lt;
Ecclesia quatuor habet evangelia,
ζῖται &lt; οὐ πάντα ἐνέκριναν, ἀλλα τινα
αὐτῶν ἐξελέξαντο.</p>
<p>Τάχα δὲ καὶ τὸ &gt;ἐπεχείρησαν &lt;
λεληθυῖαν ἔχει κατηγορίαν τῶν προπετῶς
καὶ χωρὶς χαρίσματος ἐλθόντων
ἐπὶ τὴν ἀναγραφὴν τῶν εὐαγγελίων.
Ματθαῖος γὰρ οὐκ &gt;ἐπεχείρησεν&lt;
ἀλλ᾿ ἔγραφεν ἀπὸ ἁγίου
πνεύματος, ὁμοίως καὶ Μᾶρκος καὶ
Ἰωάννης, παραπλησίως δὲ καὶ
Λουκᾶς.</p>

Page 83 (right) of 030:


<pb n="v.10.p.83"/>
<p>προφητῶν εἰρηκότος »ζῶ ἐγώ«, ***
<lb n="5"/>
καὶ***
<lb n="10"/> ἐμὲ ἐγκατέλιπον, πηγὴν ὕδατος
ζῶντος«. καὶ ζωὴ δὲ ὡς ἀπὸ πηγῆς
ζωῆς τοῦ πατρὸς ὁ εἰπών· ἐγώ
εἰμι ἡ ζωή«. καὶ πρόσχες ἐπιμελῶς
εἰ μή, ὥσπερ οὐ ταὐτόν ἐστι
<lb n="15"/> πηγὴ ποταμοῦ καὶ ποταμός, οὕτως
πηγὴ ζωῆς καὶ ζωή.
καὶ ταῦτα δὲ προσεθήκαμεν διὰ
τὸ προσκεῖσθαι τῷ σὺ εἶ ὁ Χριστὸς
ὁ υἱὸς τοῦ θεοῦ τὸ
<lb n="20"/> τοῦ ζῶντος· ἐξαίρετον γάρ τι
ἐχρῆν παραστῆσαι ἐν τῷ λεγομένῳ
περὶ τοῦ θεοῦ καὶ πατρὸς
τῶν ὅλων, ὡς ζῶντος παρά τε
τὴν αὐτοζωὴν καὶ τὰ μετέχοντα
<lb n="25"/> αὐτῆς. ἐπεὶ δὲ εἴπομεν μὴ ἀπό
τινων ὑγιῶν δογμάτων εἰρηκέναι
τοὺς ἀποφηναμένους εἶναι τὸν ᾿Ιησοῦν
Ἰωάννην τὸν βαπτιστὴν
ἤ τινα τῶν ἐπιφερομένων, κατασκευάσωμεν
<lb n="30"/> καὶ τοῦτο λέγοντες ὅτι,
εἰ παρατετεύχεισαν ἐπὶ τὸ βάπτισμα
ἀπεληλυθότι τῷ Ἰησοῦ πρὸς τὸν</p>
<p>et forsitan ideo dicebatur vivus,
secundum eminentiam qua supereminet
omnibus habentibus in
se vitam, quoniam et »solus habet
« et est fons vitae.
et fons quidem vitae proprie dicitur
deus pater, qui dicit per Hiere-
miam:
»me dereHquerunt fontem aquae
vivaea. vita autem est quasi de
fonte vitae patris procedens, qui
dixit: »ego sum vita«. et vide
quoniam sicut non est idipsum
fons &lt;fluvii&gt; et fluvius, sic non
idipsum fons vitae et vita.</p>

annettegessner commented 8 years ago

It looks like we have to think carefully about how to cite these fragmentary textparts BEFORE we split them into Greek and Latin AND we have to think about ways of referring to the original text of the Bible, so this is a matter of "How to handle translations AND commentaries AND fragments" - whew!

ChiaraPalladino commented 8 years ago

It would be great to run automatic alignment tests on this one. It would provide a lot of training data for Tariq's algorithm. Could you provide us with a couple of pages from this text where the two versions are "aligned" at sentence level? Either txt or XML format is fine; it makes no difference. That is, you use common references for the corresponding sentence in the two languages, or, if you are working in txt, you could simply put them on two subsequent lines. With this initial data we could run some preliminary tests to see how it works. If we have no problems, we can then work at paragraph level, which is the ideal situation: the individual paragraphs of the corresponding versions should then be encoded with common references. The fact that the two versions are already in the same file is actually much better for us!

jduff-chs commented 8 years ago

@annettegessner, agreed! I suppose the <q type='mentioned' corresp='*URN*'></q> method as discussed in #201 will work throughout for tying these homilies to the passage they discuss!

@ChiaraPalladino, hello! I'll work on aligning a few pages of tlg016, the homilies on Luke, for you presently! We are working in XML, and the alignment referencing I'm used to working with is that used by Alpheios. Would something like this work for you?

<aligned-text xmlns="http://alpheios.net/namespaces/aligned-text">
    <language lnum="L1" xml:lang="grc"/>
    <language lnum="L2" xml:lang="lat"/>
    <sentence n="1" lnum="L1">Ἀλλ᾿ οἱ μὲν Ἰουδαῖοι, ὡς
ἄξιοι τοῦ ἐπὶ τῇ καρδίᾳ αὐτῶν
καλύμματος, ἐψευδοδόξουν περὶ τοῦ
Ἰησοῦ.</sentence>
    <sentence n="1" lnum="L2">Et Iudaei quidem faciebant
de Christo aestimationes dignas
velamine quod positum erat super
cor eorum.</sentence>
</aligned-text>
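
For anyone who wants the plain-text form mentioned above (corresponding sentences on two subsequent lines), here is a minimal Python sketch that pairs sentences by their shared `n` value. It assumes the structure shown in the snippet above (`language` elements mapping `lnum` to `xml:lang`, and `sentence` elements carrying `n` and `lnum`); the function name `sentence_pairs` and the shortened sample are illustrative only and are not part of Alpheios or this repository.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Namespace taken from the snippet above; xml:lang lives in the built-in XML namespace.
NS = {"a": "http://alpheios.net/namespaces/aligned-text"}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def sentence_pairs(xml_text):
    """Group <sentence> elements by their n attribute, keyed by language code."""
    root = ET.fromstring(xml_text)
    # Map each lnum (e.g. "L1") to its language code (e.g. "grc").
    langs = {el.get("lnum"): el.get(XML_LANG) for el in root.findall("a:language", NS)}
    grouped = defaultdict(dict)
    for sent in root.findall("a:sentence", NS):
        # Collapse the hard line breaks inherited from the printed page.
        text = " ".join("".join(sent.itertext()).split())
        grouped[sent.get("n")][langs[sent.get("lnum")]] = text
    return dict(grouped)


# Shortened, hypothetical sample in the same shape as the snippet above.
SAMPLE = """<aligned-text xmlns="http://alpheios.net/namespaces/aligned-text">
    <language lnum="L1" xml:lang="grc"/>
    <language lnum="L2" xml:lang="lat"/>
    <sentence n="1" lnum="L1">Ἀλλ᾿ οἱ μὲν Ἰουδαῖοι,
ὡς ἄξιοι</sentence>
    <sentence n="1" lnum="L2">Et Iudaei quidem
faciebant</sentence>
</aligned-text>"""

pairs = sentence_pairs(SAMPLE)
for n, by_lang in pairs.items():
    # Two subsequent lines per aligned sentence: Greek first, then Latin.
    print(by_lang.get("grc", ""))
    print(by_lang.get("lat", ""))
```

Keying on `n` rather than on document order means a sentence that survives in only one language simply yields an empty line for the other, which may or may not be what the alignment tool expects.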

ChiaraPalladino commented 8 years ago

Hi Jack, yes, that is perfect and very clear. Looking forward to seeing your work then...thank you very much!

jduff-chs commented 8 years ago

I've sent an email with the alignments of the first six pages, and would be happy to align more tomorrow! Let me know if anything else will be of use.

PonteIneptique commented 8 years ago

Just a note as I pass by: it is also quite important to replace the &gt;conati <lb n="10"/> sunt&lt; with the matching apparatus criticus markup :) i.e. <add>conati <lb n="10"/> sunt</add>

jduff-chs commented 8 years ago

Ah! That's good to know, thank you @PonteIneptique, I've encountered them often. It's an unfortunate fact that in these Hinrichs editions, the > and < marks are often mistranscribed and reversed, and on many occasions missed altogether in the OCR, so that I'd be hesitant to do a full regex to replace them as <add></add>. There's also a complication in that the editors seem to use both text >addition< text and text <addition> text, and don't explain any difference in meaning between these two styles.

I'll open up an issue on this, and find a way to address these problems!
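
One cautious alternative to a blanket regex, sketched here in Python, is to convert a paragraph only when its `&gt;` and `&lt;` markers alternate cleanly, and to leave everything else untouched for manual review. This is an assumption-laden sketch: the function name `wrap_additions` is hypothetical, it treats `&gt;…&lt;` as `<add>` following the suggestion above, and it deliberately refuses the reversed `&lt;…&gt;` style, since its meaning in these editions is unclear.

```python
import re

# Escaped angle marks as they appear in the XML text nodes.
MARK = re.compile(r"&gt;|&lt;")


def wrap_additions(paragraph):
    """Return (text, ok).

    The paragraph is converted only if its markers strictly alternate
    &gt; ... &lt; ... ; otherwise it is returned unchanged with ok=False
    so that a human can check it against the page image.
    """
    marks = MARK.findall(paragraph)
    expected = ["&gt;", "&lt;"] * (len(marks) // 2)
    if len(marks) % 2 or marks != expected:
        return paragraph, False
    converted = re.sub(
        r"&gt;\s*(.*?)\s*&lt;", r"<add>\1</add>", paragraph, flags=re.S
    )
    return converted, True


# A clean pair spanning a line break and an <lb/> milestone.
clean = 'Hoc quod ait: &gt;conati\n<lb n="10"/> sunt&lt;, latentem'
# An orphaned closing mark, as happens when a span crosses a <p> boundary.
orphan = "ζῖται &lt; οὐ πάντα ἐνέκριναν"
# The reversed style, which this sketch flags rather than converts.
reversed_style = "fons &lt;fluvii&gt; et fluvius"

for sample in (clean, orphan, reversed_style):
    text, ok = wrap_additions(sample)
    print("converted" if ok else "needs review", "|", text)
```

Working one `<p>` at a time keeps a single missed or reversed mark from cascading through the rest of the file, at the cost of flagging spans that legitimately cross a paragraph or column boundary.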

byeats commented 6 years ago

@jduff-chs Fresh CHS intern here. Did you ever reach a conclusion about how to preserve the correspondences between both the Greek and Latin fragments for these Origenes texts? @annettegessner @lcerrato @sonofmun Do you have any ideas about how to retain these correspondences?

lcerrato commented 6 years ago

@byeats Hi! Sorry, I was not following the original discussion and am not up to date on this work. It looks like there are some circular references here to other issues, but I am not sure. I'm afraid I don't have any advice for handling this, as it looks like there was going to be external alignment using another tool.