erc-dharma / tfc-khmer-epigraphy

This repository assembles data produced by the project Corpus des inscriptions khmères (before and during the DHARMA project).
https://dharma.hypotheses.org/
Creative Commons Attribution 4.0 International

encoding script class and maturity #38

Closed arlogriffiths closed 5 months ago

arlogriffiths commented 6 months ago

Dear @chhomkunthea, @chloechollet and @salomepichon !

I have only just become aware that the specification in the revised EGD under "7.5.5. Script" does not seem to have been implemented in our xml files yet.

I suppose in almost all cases a single script class and maturity can be defined for an entire inscription and hence be encoded in <div type="edition">. So I can ask @michaelnmmeyer to make a global change to all our files. (I still need to tell him which values to use.) But I will need from you a list of any biscript inscriptions and of any inscriptions that use a foreign script (Siddhamatrika for instance).

Thanks.

Arlo

chhomkunthea commented 6 months ago

Dear Arlo,

In the case of Cambodian inscriptions, the inscriptions encoded so far are single-script, written in the Kambujākṣara. I hope that Chloé and Salomé can doublecheck and confirm.

Best, Kunthea

arlogriffiths commented 6 months ago

Thanks. Please do.

salomepichon commented 6 months ago

After double checking, I confirm what Kunthea commented.

Best, Salomé

arlogriffiths commented 6 months ago

Thanks @chhomkunthea and @salomepichon. See the gdoc of EGD under "7.5.5. Script" and the following link for where we get the number codes: https://opentheso.huma-num.fr/opentheso/?idt=th347

@michaelnmmeyer: please insert @rendition="class:83231 maturity:00000" into the <div type="edition"> of all the xml files in tfc-khmer-epigraphy.
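For readers who want to reproduce this kind of global change locally, here is a minimal Python sketch. It is not the tool that was actually used on the repository; the function name `add_rendition` and the regex-based approach are assumptions made for illustration. It edits the start tag as plain text, so it leaves the rest of the file's formatting alone, and it skips any edition div that already carries @rendition.

```python
import re

# Placeholder value requested in the comment above.
RENDITION = "class:83231 maturity:00000"

# Start tag of a <div> that has type="edition" and no rendition attribute yet.
# Both lookaheads are confined to the tag itself because [^>]* cannot cross ">".
EDITION_DIV = re.compile(
    r'<div\b(?![^>]*\brendition=)(?=[^>]*\btype="edition")[^>]*[^/>]>'
)

def add_rendition(xml_text: str, value: str = RENDITION) -> str:
    """Append rendition="..." to every <div type="edition"> start tag (hypothetical helper)."""
    def patch(match: re.Match) -> str:
        tag = match.group(0)
        # Drop the closing ">" and re-add it after the new attribute.
        return f'{tag[:-1]} rendition="{value}">'
    return EDITION_DIV.sub(patch, xml_text)
```

Running the function a second time on the same text changes nothing, so it can be applied to files that have already been processed.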

I think for maturity, our encoders will have to decide case-by-case whether 83213 ("Usually 7th to 10th century, many characters would be difficult or impossible to recognise for a person familiar only with the script of a distant region") or 83215 ("Vernacular Brāhmī-derived script: usually 10th century to modernity, the script typically used for writing a vernacular language (though it may also be used in the relevant area to write a superregional language).") is more applicable to be filled in instead of 00000.

There is no objective way to distinguish between the two values so I suggest using the former for items datable up to 1000 CE and the latter for items datable after 1000.

michaelnmmeyer commented 6 months ago

Done in 699d031ef8e2c3e59eba8b3a0828f31ccfe2d5b1.

Note however that this reformatted the start node of every tag that did not fit on a single line, so

<lb 
n="5" break="no"/>

becomes

<lb n="5" break="no"/>

There is no way to preserve the initial formatting for now.

arlogriffiths commented 6 months ago

Thanks a lot @michaelnmmeyer. The reformatting of those tags whose start nodes don't fit on a single line is no problem.

@chhomkunthea, @salomepichon and @chloechollet:

please now proceed gradually to replace the part "maturity:00000" by one of the following options (along with the definitions furnished on OpenTheso):

  1. "maturity:83211" — Usually 4th to 6th century CE, pronounced regional differences, distinct character bodies, headmarks are ubiquitous, forming an integral part of characters, conjuncts and final consonant forms are common.

  2. "maturity:83213" — Usually 7th to 10th century, many characters would be difficult or impossible to recognise for a person familiar only with the script of a distant region.

  3. "maturity:83215" — Vernacular Brāhmī-derived script: usually 10th century to modernity, the script typically used for writing a vernacular language (though it may also be used in the relevant area to write a superregional language).

The dating brackets proposed on OpenTheso don't seem to work very well for Cambodia and Campā, since the modern vernacular Cham and Khmer scripts are quite distinct in graphic shape and in various structural aspects from even the latest epigraphically attested script forms we are working with in DHARMA. I would suggest that for these two corpora, the dating brackets be revised as follows:

  1. "maturity:83211" — for the earliest inscriptions of Cambodia and Campā, between 400 and 700 CE
  2. "maturity:83213" — for the whole of the rest of the two corpora up to 1500
  3. "maturity:83215" — for script forms found in manuscripts and in modern usage, and hence this value would never be relevant in our inscriptions
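The revised brackets can be sketched as a small Python helper that picks a maturity code from an approximate date and swaps it in for the placeholder. This is an illustration only: the function names are hypothetical, treating the year 700 itself as belonging to the first bracket is an assumption (the proposal does not say which side the boundary falls on), and encoders remain free to override the result on palaeographic grounds.

```python
import re

def maturity_for_year(year: int) -> str:
    """Map an approximate CE date to a maturity code (hypothetical helper).

    Assumption: boundary years 700 and 1500 are counted in the earlier bracket.
    """
    if year <= 700:
        return "83211"  # earliest inscriptions of Cambodia and Campā
    if year <= 1500:
        return "83213"  # the rest of the two corpora up to 1500
    return "83215"      # manuscripts and modern usage

# The placeholder inserted by the global change.
PLACEHOLDER = re.compile(r"\bmaturity:00000\b")

def set_maturity(xml_text: str, year: int) -> str:
    """Replace the maturity:00000 placeholder with the code for the given year."""
    return PLACEHOLDER.sub(f"maturity:{maturity_for_year(year)}", xml_text)
```

Files whose placeholder has already been replaced are left unchanged, because the pattern only matches the literal value 00000.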

I'd like to have @danbalogh's opinion on this proposal if he feels he is able to give one.

danbalogh commented 6 months ago

On auto-adding script class and maturity, I'd like to note that if any of your editions involve textpart divs with different scripts, then script data should be encoded only on those divs and not on the edition div. Not sure this is relevant, but in case it is, see the point starting with "when an inscription is encoded as two or more textpart divisions" in EGD §7.5.5.
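A quick way to spot files affected by this point is to look for editions where script data appears both on the edition div and on one of its textpart divs. The Python sketch below is a rough check under stated assumptions: the files are TEI in the standard namespace, script data lives in @rendition, and the helper name `has_script_conflict` is invented for the example. It does not verify that the codes themselves are valid.

```python
import xml.etree.ElementTree as ET

# Standard TEI namespace, in the {uri}tag form that ElementTree expects.
TEI = "{http://www.tei-c.org/ns/1.0}"

def has_script_conflict(xml_text: str) -> bool:
    """Return True if an edition div and one of its textpart divs both carry @rendition."""
    root = ET.fromstring(xml_text)
    for div in root.iter(f"{TEI}div"):
        if div.get("type") != "edition" or div.get("rendition") is None:
            continue
        # iter() also yields the edition div itself, but its type is not "textpart".
        for part in div.iter(f"{TEI}div"):
            if part.get("type") == "textpart" and part.get("rendition") is not None:
                return True
    return False
```

A file flagged by this check would need the script data removed from the edition div and kept only on the textpart divs.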

On the choice of maturity, I have absolutely no problem with setting different age brackets than those suggested in the definition. In my own narrow slice of Vengi Calukya copper plates, I'm fortunate to have a very conspicuous shift at some point in the 11th century (I haven't encoded enough of these late inscriptions to know if this is actually time-dependent; I suspect not, i.e. that the earlier script stays in use parallel to the newer one for some time yet). So some inscriptions from that century are in a distinctively Telugu-like script, and these alone are the ones for which I used the Vernacular maturity; all the rest of my inscriptions are encoded as the second level. I'm totally unfamiliar with the SE Asian scripts; all I know is that when Arlo gives me a clipping to use as an example in one of the guides, I struggle to identify most characters even when he has provided the transliteration. What I'm trying to say here is that to my eye, the characters Arlo proposes here to encode as maturity 2 are much further removed from "basic standard" Brahmi than the characters I'm encoding as maturity 2. However, this does not mean that some sort of absolute metric for "removedness from standard Brahmi" must be the criterion for judging maturity levels. Even if such a metric could be computed somehow, I think your intuition based on knowledge of the corpus and its palaeography is the best guide. If it is your feeling that the script of the manuscripts is much the same as that in modern use and both are quite different from the typical epigraphic script used between 700-1500, then I'm perfectly happy to accept that.

salomepichon commented 5 months ago

Dear all, as a first step, I have changed the script class and maturity of all the encoded Cham inscriptions of the Indrapura dynasty, and of all the encoded inscriptions in my Khmer corpus.

chhomkunthea commented 5 months ago

Dear all,

As for me, I'm adding the maturity category to my encoded files. It will take me some time to finish the task.

Best, Kunthea

chloechollet commented 5 months ago

Dear all,

I also started on this a few weeks ago, in the course of taking over the encoding of the inscriptions!

Best,

Chloé


arlogriffiths commented 5 months ago

Thanks for your reactions, @chhomkunthea, @chloechollet and @salomepichon. I trust you will accomplish the work in due course and am closing this issue now. If you need it, you can always find it again among the closed issues on tfc-khmer-epigraphy.

chhomkunthea commented 5 months ago

Dear Arlo,

I am working on this script maturity category issue along with other issues such as “Ddanda”. I am taking this opportunity to double-check the dates of the inscriptions mentioned in our articles “The Mekong Delta inscriptions” (especially those in the tables) and “Epigraphy of Cambodia and Campā”. So I may need quite some time to finish all the encoded inscriptions.

Best, Kunthea
