AjaxMultiCommentary / kodon

A minimal-computing multi-commentary platform
MIT License

Make `text` property of ReadableTextContainer an `Array<string | TextContainer>` (?) #12

Open pletcher opened 2 months ago

pletcher commented 2 months ago

This will enable block-level elements to nest properly.

Consider this excerpt from Pausanias (tlg0525.tlg001.perseus-grc2):

<div type="textpart" subtype="section" n="2">
    <p>
        <milestone unit="para" ed="P"/>
μετὰ δὲ τὴν ἐν Ἰταλίᾳ πληγὴν ἀναπαύσας τὴν δύναμιν προεῖπεν Ἀντιγόνῳ πόλεμον, ἄλλα τε ποιούμενος
        <milestone unit="page" n="0" ed="Spiro"/>
ἐγκλήματα καὶ μάλιστα τῆς ἐς Ἰταλίαν βοηθείας διαμαρτίαν. κρατήσας δὲ τήν τε ἰδίαν παρασκευὴν Ἀντιγόνου καὶ τὸ παρʼ αὐτῷ Γαλατῶν ξενικὸν ἐδίωξεν ἐς τὰς ἐπὶ θαλάσσῃ πόλεις, αὐτὸς δὲ Μακεδονίας τε τῆς ἄνω καὶ Θεσσαλῶν ἐπεκράτησε. δηλοῖ δὲ μάλιστα τὸ μέγεθος τῆς μάχης καὶ τὴν Πύρρου νίκην, ὡς παρὰ πολὺ γένοιτο,        <add>τὰ</add> ἀνατεθέντα ὅπλα τῶν Κελτῶν ἐς        <del>τε</del> τὸ τῆς Ἀθηνᾶς ἱερὸν τῆς Ἰτωνίας Φερῶν μεταξὺ καὶ Λαρίσης καὶ τὸ ἐπίγραμμα τὸ ἐπʼ αὐτοῖς·        <quote type="inscription">
            <l met="dact">τοὺς θυρεοὺς ὁ Μολοσσὸς Ἰτωνίδι δῶρον Ἀθάνᾳ</l>
        </quote>
    </p>
</div>
<div type="textpart" subtype="section" n="3">
    <p>
        <quote type="inscription">
            <l met="dact">Πύρρος ἀπὸ θρασέων ἐκρέμασεν Γαλατᾶν,</l>
            <l>πάντα τὸν Ἀντιγόνου καθελὼν στρατόν. οὐ μέγα θαῦμα·</l>
            <l>αἰχματαὶ καὶ νῦν καὶ πάρος Αἰακίδαι.</l>
        </quote>τούτους μὲν δὴ ἐνταῦθα, τῷ δὲ ἐν Δωδώνῃ Διὶ Μακεδόνων ἀνέθηκεν αὐτῶν τὰς ἀσπίδας. ἐπιγέγραπται δὲ καὶ ταύταις·        <quote type="inscription">
            <l met="dact">αἵδε ποτʼ Ἀσίδα γαῖαν ἐπόρθησαν πολύχρυσον,</l>
            <l>αἵδε καὶ Ἕλλασι                <add>ν</add> δουλοσύναν ἔπορον.
            </l>
            <l>νῦν δὲ Διὸς ναῶ ποτὶ κίονας ὀρφανὰ κεῖται</l>
            <l>τᾶς μεγαλαυχήτω σκῦλα Μακεδονίας.</l>
        </quote>
        <milestone unit="para" ed="P"/>
Πύρρῳ δὲ Μακεδόνας ἐς ἅπαν μὴ καταστρέψασθαι παρʼ ὀλίγον ὅμως ἥκοντι ἐγένετο Κλεώνυμος αἴτιος,
    </p>
</div>

Section 1.13.3 begins in the middle of the quoted inscription, and the TEI markup has split it into two <quote> blocks.

We want the TextContainers to look something like this:

[...,
{ offset: n,
  location: ["1", "13", "2"],
  text: ["μετὰ δὲ τὴν ἐν Ἰταλίᾳ πληγὴν ...", { subtype: "quote", text: [{ subtype: "line", text: "τοὺς θυρεοὺς ὁ Μολοσσὸς Ἰτωνίδι δῶρον Ἀθάνᾳ" }] }]
},
{ offset: n + 1,
  location: ["1", "13", "3"],
  text: [{ subtype: "quote", text: [{ subtype: "line", text: "Πύρρος ἀπὸ θρασέων ἐκρέμασεν Γαλατᾶν," }, ...] }, ...]
},
...
]

The problem is that the lines in the quotations are not addressable by CTS URN, which breaks our current system of storing and applying annotations via URN.

It's basically trivial to do a depth-first reassembly of the plain strings, but then we lose the block-level definition that's encoded in the TEI.
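To make the trade-off concrete, here is a minimal sketch of that depth-first reassembly. The `TextNode` and `NestedText` types and the `flattenText` helper are hypothetical names used only for illustration; only the `subtype` and `text` fields come from the example above.

```typescript
// Hypothetical shapes for illustration; not kodon's actual types.
type TextNode = string | NestedText;

interface NestedText {
  subtype: string;
  text: string | TextNode[];
}

// Depth-first traversal that collects the plain strings in document order.
function flattenText(text: string | TextNode[]): string[] {
  if (typeof text === "string") return [text];
  const out: string[] = [];
  for (const node of text) {
    if (typeof node === "string") out.push(node);
    else out.push(...flattenText(node.text));
  }
  return out;
}

// The `text` value of 1.13.2 from the example above.
const section2: TextNode[] = [
  "μετὰ δὲ τὴν ἐν Ἰταλίᾳ πληγὴν ...",
  {
    subtype: "quote",
    text: [{ subtype: "line", text: "τοὺς θυρεοὺς ὁ Μολοσσὸς Ἰτωνίδι δῶρον Ἀθάνᾳ" }],
  },
];

const flat = flattenText(section2);
```

`flat` holds the prose string followed by the verse line, but nothing in it records that the second string was a line inside a quote.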

Is there a syntax we can adopt to append to the locations that says, "Go to this block-level element in the text tree of this location"?

Or is there a better way to solve this issue?
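One possible reading of the "go to this block-level element" idea is a list of child indices that gets resolved against a location's text tree. The sketch below assumes that reading; `resolvePath` and the index-path representation are hypothetical, and how such a path would be serialized alongside a CTS URN is left open.

```typescript
// Hypothetical shapes for illustration; not kodon's actual types.
type TextNode = string | NestedText;

interface NestedText {
  subtype: string;
  text: string | TextNode[];
}

// Follow a list of child indices into the text tree of one location.
// Returns undefined if the path is empty or runs off the tree.
function resolvePath(text: string | TextNode[], path: number[]): TextNode | undefined {
  let current: TextNode | undefined = undefined;
  let children: string | TextNode[] = text;
  for (const index of path) {
    // A plain string is a leaf, so there is nothing to descend into.
    if (typeof children === "string") return undefined;
    current = children[index];
    if (current === undefined) return undefined;
    children = typeof current === "string" ? current : current.text;
  }
  return current;
}

// The `text` value of 1.13.2 from the example above.
const section2: TextNode[] = [
  "μετὰ δὲ τὴν ἐν Ἰταλίᾳ πληγὴν ...",
  {
    subtype: "quote",
    text: [{ subtype: "line", text: "τοὺς θυρεοὺς ὁ Μολοσσὸς Ἰτωνίδι δῶρον Ἀθάνᾳ" }],
  },
];

// Second child of the section (the quote), then its first child (the line).
const hit = resolvePath(section2, [1, 0]);
```

Under this assumption an annotation would be stored against the location `["1", "13", "2"]` plus the path `[1, 0]`. Index paths are fragile if the tree shape changes between editions, which is one reason to treat this as a sketch rather than a proposal.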

pletcher commented 2 months ago

Another approach is to allow duplicate locations and order the blocks by offset:

[...,
{ offset: n,
  subtype: "paragraph",
  location: ["1", "13", "2"],
  text: "μετὰ δὲ τὴν ἐν Ἰταλίᾳ πληγὴν ..." },
{ offset: n + 1,
  location: ["1", "13", "2"],
  subtype: "quote",
  text: ""
},
{ offset: n + 2,
  location: ["1", "13", "2"],
  subtype: "line",
  text: "τοὺς θυρεοὺς ὁ Μολοσσὸς Ἰτωνίδι δῶρον Ἀθάνᾳ"
},
// whenever the location changes, close all open tags?
{ offset: n + 3,
  subtype: "paragraph",
  location: ["1", "13", "3"],
  text: "",
},
{ offset: n + 4,
  location: ["1", "13", "3"],
  subtype: "line", 
  text: "Πύρρος ἀπὸ θρασέων ἐκρέμασεν Γαλατᾶν,"
},
{ offset: n + 5,
  location: ["1", "13", "3"],
  subtype: "line", 
  text: "πάντα τὸν Ἀντιγόνου καθελὼν στρατόν. οὐ μέγα θαῦμα·"
},
...
]

But this presents problems as well: it's not always clear when a block-level element should be closed.
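A minimal sketch of rebuilding the nesting from the flat list follows, using the "close all open tags when the location changes" heuristic from the comment in the listing. The `RANK` table, the `TreeBlock` type, and the `rebuild` function are assumptions made for illustration, and the offsets take n = 0.

```typescript
interface FlatBlock {
  offset: number;
  location: string[];
  subtype: string;
  text: string;
}

interface TreeBlock extends FlatBlock {
  children: TreeBlock[];
}

// Hypothetical nesting ranks: a block may only contain blocks of higher rank.
const RANK: Record<string, number> = { paragraph: 0, quote: 1, line: 2 };

function rebuild(blocks: FlatBlock[]): TreeBlock[] {
  const roots: TreeBlock[] = [];
  const open: TreeBlock[] = []; // stack of currently open blocks
  const sorted = [...blocks].sort((a, b) => a.offset - b.offset);
  for (const block of sorted) {
    const node: TreeBlock = { ...block, children: [] };
    // Heuristic: a change of location closes every open block.
    const top = open[open.length - 1];
    if (top !== undefined && top.location.join(".") !== block.location.join(".")) {
      open.length = 0;
    }
    // Close open blocks whose rank means they cannot contain the new one.
    while (open.length > 0 && RANK[open[open.length - 1].subtype] >= RANK[block.subtype]) {
      open.pop();
    }
    const parent: TreeBlock | undefined = open[open.length - 1];
    (parent ? parent.children : roots).push(node);
    open.push(node);
  }
  return roots;
}

// The six blocks from the listing above, with n = 0.
const blocks: FlatBlock[] = [
  { offset: 0, subtype: "paragraph", location: ["1", "13", "2"], text: "μετὰ δὲ τὴν ἐν Ἰταλίᾳ πληγὴν ..." },
  { offset: 1, subtype: "quote", location: ["1", "13", "2"], text: "" },
  { offset: 2, subtype: "line", location: ["1", "13", "2"], text: "τοὺς θυρεοὺς ὁ Μολοσσὸς Ἰτωνίδι δῶρον Ἀθάνᾳ" },
  { offset: 3, subtype: "paragraph", location: ["1", "13", "3"], text: "" },
  { offset: 4, subtype: "line", location: ["1", "13", "3"], text: "Πύρρος ἀπὸ θρασέων ἐκρέμασεν Γαλατᾶν," },
  { offset: 5, subtype: "line", location: ["1", "13", "3"], text: "πάντα τὸν Ἀντιγόνου καθελὼν στρατόν. οὐ μέγα θαῦμα·" },
];

const tree = rebuild(blocks);
```

The sketch also shows where the heuristic runs out. In 1.13.3 the prose that follows the first inscription belongs to the same `<p>` as the quote, but a flat "paragraph" block at that point would be rebuilt as a new sibling paragraph, because nothing in the list says whether the earlier paragraph is still open.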