Open arlogriffiths opened 6 months ago
We have certainly not foreseen the quotation of stanzas, so there's no solution for this in the EGD and thus, I suppose, no proper way of doing it within the schema's constraints. It's a typesetting detail, which we should strive to keep to a minimum in our files. That said, I can see how doing this sort of thing would be desirable in some cases, so I'm tagging @michaelnmmeyer on this and asking for his suggestions.
I think TEI permits using <l> without being wrapped in an <lg>, but this seems uncommon and probably not good practice. So instead of using l elements in your <q rend="block">, my suggestion is to permit using <lg> without any attributes in the commentary and translation div. (Actually, <lg> is already permitted in these cases; it's just that the mandatoriness of n and met needs to be lifted when in one of those divs.) There is also no problem with using an <lg> inside a <p>, so my recommendation is to use <lg> instead of (rather than within) <q> for stanzas. In this case, I would make @xml:lang mandatory for stanzas in a language other than that of the commentary (i.e. other than English, normally). For the prose translation of the stanza I would then use a <q rend="block">, while if someone happens to want a verse translation, a second <lg> (without xml:lang) would work. Thus, your example would look like this:
<p>In further support ... crucial elements that we also see in ours:
<lg xml:lang="san-Latn">
<l>āsādya <hi rend="bold">śaktiṁ</hi>...</l>
<l><hi rend="bold">kumāra</hi>bhāve...</l>
</lg>
<q rend="block">After attaining the Power (or: weapon) of <persName type="god">Maheśvara</persName>...</q>
Mahendra is both the name of King Rājendravarman's father, Mahendravarman...</p>
@michaelnmmeyer , do you approve of this suggestion? If yes, please revise the schema to permit it, and I'll add a stub to the next EGD for this. If not, please suggest something better.
@arlogriffiths , incidentally while copying and editing your code, I noticed that you seem to use a (single) quote mark to open and close your block quote of the translation. I think this should not be done; if we want quote marks around block quotes, then these should be coded into the display to appear automatically. But normal typography practice is not to use quote marks in block quotes.
I would prefer we wrap the <lg> within <q>, as in:
<p>
In further support ... crucial elements that we also see in ours:
<q rend="block">
<lg xml:lang="san-Latn">
<l>āsādya <hi rend="bold">śaktiṁ</hi>...</l>
<l><hi rend="bold">kumāra</hi>bhāve...</l>
</lg>
</q>
<q rend="block">
After attaining the Power (or: weapon) of <persName type="god">Maheśvara</persName>...
</q>
Mahendra is both the name of King Rājendravarman's father, Mahendravarman...
</p>
The main rationale is that a verse is not supposed to be a quote. Verses are indented a bit, but not as much as block quotes. I would rather keep the two dimensions separate. It also looks weird to me not to have matching <q>s (or maybe a single <q>) for the text and its translation.
I am fine with everything else.
Fine by me, except: in that case why start a second <q> element for the translation? I.e. why not the following?
<p>
In further support ... crucial elements that we also see in ours:
<q rend="block">
<lg xml:lang="san-Latn">
<l>āsādya <hi rend="bold">śaktiṁ</hi>...</l>
<l><hi rend="bold">kumāra</hi>bhāve...</l>
</lg>
After attaining the Power (or: weapon) of <persName type="god">Maheśvara</persName>...
</q>
Mahendra is both the name of King Rājendravarman's father, Mahendravarman...
</p>
This is better indeed. I used two <q> out of habit with LaTeX.
Then let's stick to single <q> and <lg> containing <l> within the <q>.
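For anyone wanting to spot the original problem (an <l> that is not wrapped in an <lg>) before the schema complains, here is a rough sketch in Python. It is only an illustration, not part of the DHARMA tooling: the function name is made up, and it assumes a namespace-free fragment like the examples in this thread rather than a full TEI file.

```python
import xml.etree.ElementTree as ET
from typing import List


def lines_outside_lg(fragment: str) -> List[str]:
    """Return the text of every <l> whose parent is not an <lg>.

    Assumption: the fragment uses no XML namespace, as in the
    examples quoted in this thread.
    """
    root = ET.fromstring(fragment)
    orphans = []
    for parent in root.iter():
        if parent.tag == "lg":
            continue  # <l> children of <lg> follow the agreed encoding
        for child in parent:
            if child.tag == "l":
                orphans.append("".join(child.itertext()))
    return orphans


# Agreed encoding: a single <q> holding the <lg> and the translation.
agreed = (
    '<p>intro <q rend="block"><lg xml:lang="san-Latn">'
    "<l>a</l><l>b</l></lg>translation</q> more</p>"
)
# Original attempt: <l> placed directly inside <q>.
original = '<p><q rend="block"><l>a</l><l>b</l></q></p>'

print(lines_outside_lg(agreed))
print(lines_outside_lg(original))
```

The first call reports nothing, the second reports both lines as lacking an <lg> wrapper.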
@arlogriffiths : if you agree, please let us know and edit your file accordingly. Then @michaelnmmeyer can modify the schema to permit this, and I'll jot it down for the EGD.
Thanks both of you. I have updated the file DHARMA_INSCIC00137.xml and await Michael's changes to the schema for the relevant validation problem to disappear.
I've added a comment to self in the EGD revision draft. @michaelnmmeyer, please close the thread if the schema is done.
Done.
While we are at it, two observations for the EGD:
lg/@n must be > 0 and < 4000 (since we are converting it to a Roman numeral). I have seen a 0 somewhere.
lg/@met and l/@met, even though some values are only valid for one of them. If we want to have different lookup tables for them, this should be encoded in the prosody file.
Thanks.
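To make the range constraint on lg/@n concrete, here is a small Python sketch of a Roman numeral conversion that rejects anything outside 1..3999. It is not the project's actual conversion code; the function name is hypothetical and only the bounds come from the observation above.

```python
def stanza_number_to_roman(n: int) -> str:
    """Convert a stanza number to a Roman numeral.

    Values outside 1..3999 are rejected, mirroring the constraint
    that lg/@n must be > 0 and < 4000.
    """
    if not 0 < n < 4000:
        raise ValueError(f"lg/@n must be > 0 and < 4000, got {n}")
    numerals = [
        (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
        (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
        (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
    ]
    parts = []
    for value, symbol in numerals:
        count, n = divmod(n, value)  # how many times this symbol is used
        parts.append(symbol * count)
    return "".join(parts)


print(stanza_number_to_roman(14))
```

A value of 0 raises ValueError here, which is the case that should be corrected by the encoder.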
The EGD already says that "stanza numbers shall always be Arabic numerals starting from 1", so 0 is an error that should be corrected by the encoder. I'll add an explicit remark in the next EGD that 0 must not be used.
The metre table will indeed be removed from the next EGD now that we have the external prosody file.
As far as I am aware, the only values permitted for l/@met that are not also permitted for lg/@met are the names of the anuṣṭubh vipulās (always containing the string vipulā). There are, afaik, no values permitted in lg that are not permitted in l.
If there are other things in met (i.e. not in the prosodic patterns list, nor a sequence of XML prosodic code, nor a string ending with "vipulā"), then they are probably errors, but I'd need to know more about them.
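The one asymmetry described above (vipulā names are fine on l but not on lg) could be checked roughly as follows. This is a Python sketch with a hypothetical function name; "ma-vipulā" is used only as an example value, and the check deliberately says nothing about whether a name is actually in the prosodic patterns list.

```python
import unicodedata


def met_allowed_on(element: str, met: str) -> bool:
    """Return False only for a vipulā name used on lg.

    The test is the substring "vipulā", as described in the thread;
    both strings are NFC-normalised so that a decomposed "ā" still matches.
    """
    if element not in ("lg", "l"):
        raise ValueError(f"expected 'lg' or 'l', got {element!r}")
    marker = unicodedata.normalize("NFC", "vipulā")
    is_vipula = marker in unicodedata.normalize("NFC", met)
    return not (element == "lg" and is_vipula)


print(met_allowed_on("l", "ma-vipulā"))
print(met_allowed_on("lg", "ma-vipulā"))
```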
What is your suggestion for dealing with vipulās? (You probably know this, but these are legitimate variant anuṣṭubh lines; while anuṣṭubh is viewed as an ardhasama metre, so the prosodic code determines two quarters, the vipulā variation is in one specific quarter, either the first or the third.) Shall I add them to the prosodic patterns list as 8-syllable templates? This is not quite correct, since it would imply that each quarter of such a verse should follow that pattern. But I think anything else would be too complicated to implement. There are also a couple of other details in the prosody file that need to be sorted out, because some of the non-syllabic or not fully syllabic metres cannot be described with simple formulae. So I think what we need is the option to add free text to any metre entry, where a human-readable note or explanation can be shown.
I have no better idea for vipulās.
For complicated metres that cannot be described by a pattern, we could decide, say, that the first <note> is displayed instead of the pattern. Currently, tooltips show the prosodic pattern corresponding to a given metre; if there is none, they show the XML pattern (as a code block); if there is none, they show the gaṇa pattern.
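The fallback order just described can be sketched in Python as below. This is one possible reading, not the actual display code: the function and parameter names are hypothetical, and placing the first <note> last in the chain is an assumption about how the proposal would be implemented.

```python
from typing import Optional


def tooltip_text(
    prosody: Optional[str],
    xml: Optional[str],
    gana: Optional[str],
    note: Optional[str] = None,
) -> Optional[str]:
    """Pick what a metre tooltip shows.

    Current order: prosodic pattern, then XML pattern, then gaṇa pattern.
    Assumed extension: the first <note> is used when no pattern exists.
    """
    for candidate in (prosody, xml, gana, note):
        if candidate:  # skip None and empty strings
            return candidate
    return None


print(tooltip_text(None, None, None, "free-text explanation of the metre"))
```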
Displaying the first <note> when there is no template sounds good to me. As for the other cases you list, thanks, I was not aware of this. Unless I'm missing something, there should be no cases where an XML pattern or a gaṇa pattern is present, but the traditional prosodic notation is not. The conventions file is of course a bit untidy, but these cases should be rectified in the file rather than catered for in the display algorithm. So I think the tooltip should just contain:
However, this does not solve the case of vipulās, since they do have a (more or less) codeable prosodic template. So we'll just have to live with the fact that they are in the list next to regular verse templates, and it's up to the encoders to follow the EGD and not use a vipulā name for stanzas, only for lines.
I have pushed a revised prosody file with the vipulās recorded after the regular anuṣṭubh, and some new sections for the non-syllabo-quantitative metres, to which I have added note elements with the explanation of the metre where I could.
We have a file with test cases for the display of @met: https://dharmalekha.info/texts/INSTestProsody. There are a few cases where an XML pattern is given without a prosodic one, but none where a gaṇa pattern is given without other patterns.
I will add something for displaying <note>.
I've checked through the file, done a bit more tidying (removing XML comments and strings like "no data available" from the XML and/or prosody items), and pushed. It seems to me that the only ones with an XML code but no prosodic code are the moraic metres (āryā and co), which cannot be described fully with prosodic symbols. To be honest, the XML notation with numerals also doesn't do justice to these metres, since in many of the feet the requisite 4 morae cannot be composed any which way. So in the long run it may be best to remove the XML notation too, and just write a note. For the time being, I'm OK with displaying the XML code in these cases.
For the long run, how complicated would it be to introduce an attribute for the <seg type="xml"> and the <seg type="prosody"> that would mean "Approximate pattern", i.e. that the pattern given in the XML notation / prosodic symbols is not a fully accurate description of the metre? If this attribute (e.g. @ana="approximate" or whatever) is present on a seg that is pulled in for the tooltip, then the tooltip should display "Approximate pattern" or suchlike before the notation. If this is too complicated to be feasible, then let's forget about it; in this case I think the solution for the long run would be to delete both the xml and the prosody codes for the problematic metres, and just use a note that describes the pattern.
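A rough Python sketch of what this would mean on the display side, assuming the attribute ends up being @ana="approximate" (which is only floated as an example above). The function name and the seg content are placeholders, not taken from the prosody file.

```python
import xml.etree.ElementTree as ET


def tooltip_for_seg(seg_xml: str) -> str:
    """Return the tooltip text for one <seg>.

    Assumption: an approximate pattern is flagged with ana="approximate";
    the final attribute name has not been decided in this thread.
    """
    seg = ET.fromstring(seg_xml)
    pattern = (seg.text or "").strip()
    if seg.get("ana") == "approximate":
        return f"Approximate pattern: {pattern}"
    return pattern


# "PLACEHOLDER" stands in for a real XML or prosodic pattern.
print(tooltip_for_seg('<seg type="xml" ana="approximate">PLACEHOLDER</seg>'))
print(tooltip_for_seg('<seg type="prosody">PLACEHOLDER</seg>'))
```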
Adding some attribute to indicate that the pattern is approximate is OK. That said, it might also be useful to make this part of the notation, by using some special symbols, etc. This would allow people to represent uncertainty in @met and @real when needed, with, say, some kind of regular expression. I do not know how "standard" the notation is, nor do I know what deviations would look like, so I cannot give good input on this matter.
This is not about uncertainty, but about specific rules. E.g. I suppose you are familiar with the pattern of the odd lines of pathyā anuṣṭubh: ⏓⏓⏓⏓⏑−−⏓. But actually, this is not all there is to it, since the second and third syllables must never both be short. They can be both long, or one can be short, or the other can be short. There is no way to describe this constraint with conventional prosodic symbols that I know of, except by showing the alternatives e.g. in table form, with a merged first cell containing ⏓, three separate second cells one below the other containing respectively ⏑− / −⏑ / −−, and then the rest of the formula with ⏓⏑−−⏓ again in a merged cell. Similarly, āryā and its ilk are based on tetramoraic feet, i.e. in principle a foot can be any combination of shorts and longs whose total moraic length is 4. But in fact, the combination ⏑−⏑ is forbidden in most feet, although it is required (with one even more specific exception) in the sixth foot of some metres, and permitted (even preferred) in some other feet. Again, these are rules that cannot be represented using prosodic symbols except in the form of a table with alternatives.
While a way to encode uncertainty regarding @met and @real may be desirable, I think developing one would not be worth the trouble: too much complication of our rules (with increased chances of human error) for very little gain. The issue at hand is just that our prosody reference file could be made more accurate if it had the analogue of a tick box for cases where an overall pattern can be codified for a metre, but that pattern does not communicate all the constraints of the metre.
How about adding some attribute to the <item> elements in the prosody patterns file, to signify that it is a metre that is applicable to lines only? If that attribute is missing, then by default the metre can (and almost always does) apply to stanzas, but can (rarely) also apply to lines. If it is present, then the metre cannot apply to stanzas and may only be used for lines.
What would be a suitable attribute? Apparently, <item> is not typed, so perhaps @ana="line"?
We should also make a decision on what attribute should be added to the <seg> elements to indicate that the formula in that seg is only an approximation of the metre in question, i.e. that there are further nuances that cannot be represented in that kind of code. See above.
Following up on my last comment, I now suggest that we classify patterns further. The attribute distinguishing "stanza" and "line" patterns, whatever attribute is selected, could also be used to distinguish sama, ardhasama and viṣama metres, so that its permitted values would be sama, ardhasama, viṣama and line; the default would be sama.
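As a sketch of how such a classification could be read from the patterns file, here is a short Python example. It assumes the attribute is @ana on <item>, which is only the tentative suggestion made above; the function names are hypothetical, and the permitted values and the default are the ones proposed in this comment.

```python
import unicodedata
import xml.etree.ElementTree as ET

# Values proposed in this comment; "sama" is the default.
PERMITTED = {"sama", "ardhasama", "viṣama", "line"}
DEFAULT = "sama"


def _nfc(text: str) -> str:
    return unicodedata.normalize("NFC", text)


def metre_kind(item_xml: str) -> str:
    """Read the classification of one <item>, defaulting to sama.

    Assumption: the classification is stored in @ana.
    """
    item = ET.fromstring(item_xml)
    kind = _nfc(item.get("ana", DEFAULT))
    if kind not in {_nfc(value) for value in PERMITTED}:
        raise ValueError(f"unexpected classification: {kind!r}")
    return kind


def usable_on_stanza(item_xml: str) -> bool:
    """A pattern marked "line" may not be used on a stanza (lg)."""
    return metre_kind(item_xml) != "line"


print(usable_on_stanza("<item/>"))
print(usable_on_stanza('<item ana="line"/>'))
```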
And perhaps the same attribute could be used with the value "tamil" for Tamil metres. If we discard the metre list from the EGD appendix, then the Tamil metres will also need to move to the prosodic patterns file.
@danbalogh
Do you thus suggest, whatever attribute is chosen, allowing five values: sama, ardhasama, viṣama, tamil and line? Maybe "dravidian" instead of "tamil"? (Although I have no precise idea, we might encounter verse in Telugu prosody, if such a thing exists.)
As for "moving the Tamil metre to the prosodic patterns file", do you mean in order to be displayed here: https://dharmalekha.info/prosody? In which case, yes, it would be good. But I am mostly incompetent in this matter.
@manufrancis basically yes to all. For the values, I have not (yet) really thought this through; perhaps in addition to the five you list, we should also add "moraic" and a seventh label for the kind of Sanskrit verse where a set number of morae are followed by a set pattern of syllables. Telugu prosody does exist, and it's quite opaque to me even though there are a couple of (Sanskrit) verses in my Eastern Cālukya inscriptions that probably follow a Telugu pattern. If and when we or our successors get to codifying Telugu metrics along the same line, I think they may be better off introducing a separate "Telugu" label than to lump these together with Tamil under "Dravidian". But if you do know that the Tamil metres are also used in other languages, then I'm also OK with Dravidian. Moving the Tamil metre definitions to the prosodic patterns file (source https://github.com/erc-dharma/project-documentation/blob/86b2e161b5f0e98bc17bf1941b26e6abb391e4fe/DHARMA_prosodicPatterns_v01.xml display https://dharmalekha.info/prosody) can be left to me, once we have worked out how things should be encoded in that file.
As for the encoding details, I and Michaël will need to think and talk about the best way to describe metres that cannot be matched to a set pattern of long/short syllables, and to work out how these types could be encoded. Perhaps instead of an attribute, we should simply create separate lists, with headings, for the various kinds of metres. Will need more thought.
Thanks, Dan. Fine with having only "tamil" and leave codifying other Dravidian metrics for the future. Thanks for planned move of Tamil metre definitions to the prosodic patterns file.
@danbalogh — I am revising an old Campā file where I attempted to quote a full Sanskrit stanza in two lines using block quote:
The use of <l> here is not allowed by our schema. Does our schema allow any means to do what I am trying to do, or should I just try to do it differently?