chchch / upama

A PHP library for comparing two or more Sanskrit TEI XML files and generating an apparatus with variants
GNU General Public License v2.0

pāda boundaries in IAST and Devanāgarī #7

Open ppasedach opened 3 years ago

ppasedach commented 3 years ago

I wonder what is the best way of dealing with those kinds of pāda endings in which you would have, in Devanāgarī typography, the closing consonant of the last word of the preceding pāda at the beginning of the compound, but which in IAST would be displayed at the end of the preceding line. For example (with spaces enclosed in <pc> for compound separators):

<lg type="verse" xml:id="HV_47_1">
<l>saṅgrāma<pc> </pc>mūrdhni dalitāsura<pc> </pc>cakra<pc> </pc>vālām</l>
<l>ālokya tatra vikasat<pc> </pc>pulaka<pc> </pc>prabandhāḥ |</l>
<l>ābaddha<pc> </pc>gocara<pc> </pc>pariṣṭhita<pc> </pc>vāk<pc> </pc>prapañcaṃ</l>
<l>saṃtuṣṭuvur bhagavatīm iti siddha<pc> </pc>sādhyāḥ || 1 ||</l>
</lg>

vs

<lg type="verse" xml:id="HV_47_1">
<l>saṅgrāma<pc> </pc>mūrdhni dalitāsura<pc> </pc>cakravālā-</l>
<l>m ālokya tatra vikasat<pc> </pc>pulaka<pc> </pc>prabandhāḥ|</l>
<l>ābaddha<pc> </pc>gocara<pc> </pc>pariṣṭhita<pc> </pc>vāk<pc> </pc>prapañcaṃ</l>
<l>saṃtuṣṭuvur bhagavatīm iti siddhasādhyāḥ||1||</l>
</lg>

If displayed in the other script you get the following undesired effects at the boundaries of pāda a and b:

[Screenshot: Selection_144]

[Screenshot: Selection_143]

Would it be very hard to have a pāda-initial single consonant, i.e. one followed by a space, attached to the last word of the preceding line or, the other way round, a pāda-closing vowel, if no daṇḍa or double daṇḍa follows, pulled to the beginning of the next line, with a hyphen added to the end of the preceding line, depending on the output script?
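A minimal sketch of the first direction described above (Devanāgarī-style input, IAST-style output), working on plain pāda strings rather than on TEI. It is not upama's code: the function name `pull_back_consonants`, the consonant list, and the extra condition that the preceding line must end in a hyphen are all assumptions made for illustration.

```python
import re

# Assumption: a "single consonant" is one IAST consonant letter, optionally
# aspirated (e.g. "m", "t", "dh"), immediately followed by a space.
INITIAL_CONSONANT = re.compile(r"^([kgcjṭḍtdpbnmṅñṇyrlvśṣsh]h?) ")


def pull_back_consonants(padas):
    """Move a pāda-initial lone consonant to the end of the preceding pāda.

    Only fires when the preceding pāda ends in a hyphen (an assumption about
    how the enjambment is marked in the input); the hyphen is then dropped.
    """
    out = []
    for pada in padas:
        match = INITIAL_CONSONANT.match(pada)
        if match and out and out[-1].endswith("-"):
            out[-1] = out[-1][:-1] + match.group(1)  # "cakravālā-" + "m"
            pada = pada[match.end():]                # drop "m " from this pāda
        out.append(pada)
    return out


print(pull_back_consonants(["cakravālā-", "m ālokya tatra"]))
```

The reverse direction (pulling a final consonant forward for Devanāgarī output) is not covered here, since deciding which word-final consonants belong to the next pāda needs more than a pattern match.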

wujastyk commented 3 years ago

Perhaps use <caesura/> instead of <l>? I haven't tested this.

ppasedach commented 3 years ago

I have now tried

<lg type="verse" xml:id="HV_47_1">
<l>saṅgrāma<pc> </pc>mūrdhni dalitāsura<pc> </pc>cakravālā<caesura/>m ālokya tatra vikasat<pc> </pc>pulaka<pc> </pc>prabandhāḥ|</l>
<l>ābaddha<pc> </pc>gocara<pc> </pc>pariṣṭhita<pc> </pc>vāk<pc> </pc>prapañcaṃ <caesura/>saṃtuṣṭuvur bhagavatīm iti siddhasādhyāḥ||1||</l>
</lg>

and the other way round (cakravālām <caesura/>ālokya). The result seems to be the same.

chchch commented 3 years ago

I've been using cakravālā<caesura/>m ālokya. It displays properly in Devanāgarī etc., and it also treats "cakravālām" as a lemma, since you don't split the "m" off. In this solution, is it the IAST display that you find ugly?

ppasedach commented 3 years ago

Yes, the IAST is ugly here. The m gets displayed as the first character of pāda b, with a space after that, just as in the IAST example above. I'd like to have something that for IAST pulls this m to the end of pāda a. But I can imagine that that's not so easy to implement.

chchch commented 3 years ago

Hmm, yeah it's not impossible, but it's a little tricky to implement. Especially since in cases where you have a caesura inside a long compound, e.g., dalitāsura<caesura/>cakravālām, you would want it simply to display a hyphen and line break as usual.

The easiest way to implement this is probably to have something like cakravālā<caesura enjamb="yes">m</caesura> ālokya. Then it would be trivial to display it differently in IAST and other scripts. TEI doesn't support this though, but we could just make it up or something.
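A rough sketch of how such an attribute could drive two different displays. This is not upama's implementation: `render_line` and its `pull_back` flag are hypothetical names, `enjamb="yes"` is the made-up attribute proposed above, and the contents of <pc> are simply kept inline.

```python
import xml.etree.ElementTree as ET


def render_line(l_xml, pull_back):
    """Split one <l> element into display lines at each <caesura>.

    pull_back=True  : the enjambed consonant closes the first pāda.
    pull_back=False : hyphen, then the consonant opens the second pāda.
    A plain <caesura/> inside a word always gets hyphen + line break.
    """
    root = ET.fromstring(l_xml)
    lines = [root.text or ""]
    for child in root:
        text, tail = child.text or "", child.tail or ""
        if child.tag != "caesura":
            # e.g. <pc> </pc>: keep its contents in the running line
            lines[-1] += text + tail
        elif child.get("enjamb") == "yes" and pull_back:
            lines[-1] += text
            lines.append(tail.lstrip())
        else:
            # Hyphenate only if the break falls inside a word
            mid_word = bool(lines[-1]) and not lines[-1][-1].isspace()
            lines[-1] = lines[-1].rstrip() + ("-" if mid_word else "")
            lines.append(text + tail)
    return lines


verse = '<l>dalitāsura<pc> </pc>cakravālā<caesura enjamb="yes">m</caesura> ālokya tatra</l>'
print(render_line(verse, pull_back=True))
print(render_line(verse, pull_back=False))
```

Because the consonant stays inside the same <l>, "cakravālām" can still be read as one lemma in either display.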

I actually think I got used to reading the hyphen and then the final consonant on the next line in IAST; in some ways it makes more sense because it lays bare the metrical structure of the verse. It's also interesting to see if/when scribes write cakravālāṃ ālokya to consciously separate the pādas, in which case I guess you would transcribe it as cakravālāṃ <caesura/> ālokya. Here's an interesting example from McComas Taylor's The Joy of Sanskrit where the defective sandhi around the pāda break makes the verse metrical: https://press-files.anu.edu.au/downloads/press/p276561/html/section7.html?referer=&page=7#

chchch commented 3 years ago

Just to pile on about pāda breaks — here's a beautiful example of a pāda break I looked at recently: https://tst.hypotheses.org/1833 The writer has actually filled the pāda break with unpronounced punctuation characters, although syntactically, I guess the caesura would go just before those characters.

ppasedach commented 3 years ago

Yes, having an attribute/value pair which one manually inserts only in the places where one wants this feature is probably the best. Can you think of a solution where, instead of using <caesura/>, one could stick to my current setup of encoding verses in longer metre as four <l>s? Otherwise this would amount to a huge change to my files. If necessary it can be done of course.
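If the files did have to change, the conversion could probably be scripted rather than done by hand. The sketch below is a hypothetical helper, not part of upama: it assumes bare <l> elements without attributes, an even number of pādas per verse, and a trailing hyphen as the only marker of an enjambed pāda.

```python
import re

# Assumption: pādas are bare <l>...</l> elements with no attributes.
L_PATTERN = re.compile(r"<l>(.*?)</l>", re.DOTALL)


def merge_pair(first, second):
    """Join two pādas into one <l>, marking the break with <caesura/>."""
    first, second = first.strip(), second.strip()
    if first.endswith("-"):
        # Enjambed: drop the hyphen so the split word is contiguous again
        return f"<l>{first[:-1]}<caesura/>{second}</l>"
    return f"<l>{first} <caesura/>{second}</l>"


def merge_padas(lg_body):
    """Rewrite the <l> children of one <lg> as half-verses (a+b, c+d)."""
    padas = L_PATTERN.findall(lg_body)
    if len(padas) % 2:
        raise ValueError("expected an even number of <l> elements")
    return "\n".join(merge_pair(a, b) for a, b in zip(padas[0::2], padas[1::2]))


four_lines = (
    "<l>cakravālā-</l>\n<l>m ālokya|</l>\n"
    "<l>prapañcaṃ</l>\n<l>saṃtuṣṭuvur||1||</l>"
)
print(merge_padas(four_lines))
```

A regular expression is enough for a sketch like this, but a real migration would be safer with an XML parser and a backup of the original files.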

wujastyk commented 3 years ago

Just a footnote to your, "TEI doesn't support this though, but we could just make it up or something." I've heard Lou Burnard, granddaddy of the TEI, at many talks and conferences berate the audience on this issue. The TEI Guidelines, in his view, are merely one particular set of suggestions. Well-considered, etc., but never meant to be prescriptive or circumscribed. He has always recommended that people tailor particular TEI DTDs to their own needs, formally defining entities and attributes as required. Best, Dominik

chchch commented 3 years ago

Peter, what's your current setup? Is it with IAST-style <l> lines (i.e., your first example)?

Re: TEI, I guess I wouldn't want to diverge too much from the published TEI guidelines, for compatibility with other projects. Actually, I think currently the biggest issue with different uses of TEI is that people seem to have different "ontologies"; e.g., people don't agree on exactly what a "colophon" is, so even if we're using the same tag, the contents aren't exactly comparable.

ppasedach commented 3 years ago

I have both, but the Devanāgarī-style (example 2) is clearly in the majority.

wujastyk commented 3 years ago

On Tue, 13 Jul 2021 at 12:26, chchch wrote:

Re: TEI, I guess I wouldn't want to diverge too much from the published TEI guidelines, for compatibility with other projects. Actually, I think currently the biggest issue with different uses of TEI is that people seem to have different "ontologies"; e.g., people don't agree on exactly what a "colophon" is, so even if we're using the same tag, the contents aren't exactly comparable.

Yes, I agree. But validation is central to the TEI idea (along with good documentation), and that works fine as long as the DTD/Schema is written properly.

The second point about how "colophon" etc. are understood: I do think there's room for improving TEI documentation, especially as it applies to Indian documents. There is a process for submitting suggestions for the next edition, though I don't exactly know what it is. Maybe the SARIT and DHARMA guides could form the basis for some input to the main edition of the guidelines.

Best, Dominik

kellner commented 3 years ago

There is a Special Interest Group on Indic Texts within the TEI (McAllister, Ollett) that I believe would be the best forum to raise such issues. They have a Wiki: https://wiki.tei-c.org/index.php/SIG:IndicTexts.

The IKGA in Vienna is striving to devise guidelines for critical editions of Sanskrit texts based on the critical mass of projects we have here; the tentative title is VEGEST (Vienna Encoding Guidelines for Editions of Sanskrit Texts). This is to build on the SARIT guidelines that McAllister, Olalde and Ollett produced several years ago.

wujastyk commented 3 years ago

VEGEST sounds an exciting development. I look forward to the results.

One of the issues that VEGEST will need to consider, and probably already does, is that encodings don't exist in a vacuum. There is no free-floating, ideal encoding. Different encoding methods have their own strengths and weaknesses. For example, parallel segmentation is great until you have overlapping lemmata. End-point variant encoding solves that problem, but leads to less portable files. Further, encodings should be minimally complex; only encode for what you actually need (hence the TEI P4 "pizza" system, https://www.tei-c.org/Vault/P4/pizza.html). But more importantly, encodings need to be seen as part of the larger process of collation and rendering. The encoding of a file for Saktumiva will be different from one for, say, EVT (http://evt.labcd.unipi.it/) or ekdosis, because the target systems and their goals are different. And this brings one rapidly to deep questions about who an edition is for and what it should offer. I imagine VEGEST will address all this, and I very much look forward to it.

Best, Dominik