erc-dharma / tfc-nusantara-epigraphy

DHARMA project task force C, Nusantara epigraphic corpus
https://dharma.hypotheses.org/
Creative Commons Attribution 4.0 International

revise <abbr>mā</abbr> etc. #68

Open arlogriffiths opened 2 months ago

arlogriffiths commented 2 months ago

The following passage in https://dharmalekha.info/texts/INSIDENKHering rather clearly shows that simply encoding <abbr>mā</abbr> may not be sufficient:

(A22) rgaṇuṁ pu daṅhil· Irikaṁ lmaḥniṁ samaṅkana mā kā 1 su 13 mā 14 kunaṁ pamahli pu

<lb break="no" n="A22"/>rgaṇuṁ pu daṅhil· Irikaṁ lmaḥniṁ samaṅkana <abbr>mā</abbr> <abbr>kā</abbr> <num value="1">1</num> <abbr>su</abbr> <num value="13">13</num> <abbr>mā</abbr> <num value="14">14</num><supplied reason="subaudible">,</supplied> kunaṁ pamahli pu

This is because in the first case mā means mās and in the second case it means māṣa. It also shows that we should not consider the numerous collocations mā su as mere spelling equivalents for mās su (which I think we have implied, in some inscriptions encoded by Eko and me, by using <choice> or by using <supplied> for the s).

I am presently inclined to think we should enrich our encoding of <abbr>, in due course, as per EGC 7.3.1. I also suppose the machine will be able to help us because

  1. <expan><abbr>mā</abbr><ex>ṣa</ex></expan> will normally be followed by <num>, whereas
  2. <expan><abbr>mā</abbr><ex>s</ex></expan> will normally be followed by another <abbr> (<abbr>kā</abbr> or, more commonly, <abbr>su</abbr>) and the latter cases of <abbr> can be changed to <expan><abbr>kā</abbr><ex>ṭi</ex></expan> etc. by a massive search and replace operation.
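The two-way heuristic above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions, not an agreed procedure: the function name `expand_ma` is hypothetical, the files are assumed to encode mā with precomposed (NFC) characters, and only whitespace is assumed to stand between the abbreviation and the following <num> or <abbr>.

```python
import re

# Hypothetical helper, not part of any DHARMA tooling.
# Assumption: "mā" is stored with precomposed (NFC) characters in the files.
MASA = "<expan><abbr>mā</abbr><ex>ṣa</ex></expan>"  # mā = māṣa
MAS = "<expan><abbr>mā</abbr><ex>s</ex></expan>"    # mā = mās


def expand_ma(xml: str) -> str:
    """Expand bare <abbr>mā</abbr> according to the element that follows it."""
    # Case 1: mā followed (after optional whitespace) by <num> is read as māṣa.
    xml = re.sub(r"<abbr>mā</abbr>(?=\s*<num\b)", MASA, xml)
    # Case 2: mā followed by another <abbr> (kā, su) is read as mās.
    # An <abbr> already wrapped in <expan> is followed by <ex>, so it is skipped.
    xml = re.sub(r"<abbr>mā</abbr>(?=\s*<abbr>)", MAS, xml)
    return xml


if __name__ == "__main__":
    line = (
        '<abbr>mā</abbr> <abbr>kā</abbr> <num value="1">1</num> '
        '<abbr>su</abbr> <num value="13">13</num> '
        '<abbr>mā</abbr> <num value="14">14</num>'
    )
    print(expand_ma(line))
```

Any <abbr>mā</abbr> followed by neither <num> nor <abbr> is left untouched, so such cases would still need to be reviewed by hand.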

I vaguely recall that the new version of EGD that Daniel is preparing has some new rules for use of <abbr> so we should in any case wait until that new version has been released. In due course, I hope @danbalogh and @michaelnmmeyer can advise us and help carry out any multi-file modification.

wayanjarrah commented 2 months ago

Thanks, acknowledged. I agree with the need for a more detailed encoding of these two cases of <abbr>mā</abbr> and will await advice from Daniel and Michaël on the best way forward.

danbalogh commented 2 months ago

Hello, the encoding rules for abbreviations have not changed since the last revision in 2022, which is reflected in the Google doc, though of course not in the PDF of the EGD v1. The format used by Arlo above, with expan containing abbr and ex, is the way to go. I'm not planning further revision for the v2 release. The only difference I found between the working document and the GDoc was that māsu is specifically mentioned in the introductory section for abbreviations. I have now copied that to the GDoc (see 7.3, before the subsection 7.3.1); let me know if you wish to change that.

arlogriffiths commented 2 months ago

Thanks Dan. Re. the bit in 7.3, I am bothered by treating bahula-divase as "more than one word", since Sanskrit compounds are by definition single words.

I'd forgotten about the solution proposed for māsu and the gist of my message above was that we'd treat it as follows:

<expan><abbr>mā</abbr><ex>s</ex></expan> <expan><abbr>su</abbr><ex>varṇa</ex></expan> or, if we only bother about expanding ambiguous cases, <expan><abbr>mā</abbr><ex>s</ex></expan> <abbr>su</abbr>

I can live with any of these solutions.

danbalogh commented 2 months ago

I've added a note to revise "more than one word". Please confirm that you want the bit on tagging māsu as a single abbreviation deleted; my impression from what you say is that you will never want to tag <abbr>māsu</abbr> and instead always tag mā and su as two separate abbreviations, regardless of which (if any) you then resolve. The choice to resolve or not is up to you guys who know the texts, and not something that should or could be dictated by the guide.