Open ajaniak opened 4 years ago
Dear Axelle,
I think we need to keep the encoding. But let’s hear from Chloe and Arlo too.
Have a nice evening, Kunthea
On 13 Feb 2020 at 17:12, ajaniak notifications@github.com wrote:
Dear @chloechollet https://github.com/chloechollet and @chhomkunthea https://github.com/chhomkunthea,
In some of the XML files, you have used the tag <num> to encode numbers written in letters. The EG states that only numbers written with arabic numbers or symbols should be encoded. If you need to keep such an encoding, let me know, I will figure something out for you.
Best, Axelle
Dear Axelle, I also think that we have to keep the encoding, as we decided to distinguish between numbers written with figures (which will appear as arabic numbers in our editions) and those written with "sticks", like "III" for the number 3. Shall we modify something to make this clearer?
if that is what we are talking about (numbers like I, II, III encoded with
when Axelle told me about ’numbers written in letters’ marked up with
Axelle: please give some concrete example of the phenomenon of ‘
I am not talking about symbols such as III. However, some of the contents of <num> are neither numbers nor symbols. If you want to keep encoding them, you will have to ask Daniel to change his encoding guide.
So please point us to a few such cases, Axelle, so that we all know what we are talking about.
@ajaniak : shall we try to do what is necessary to close this issue? Please give us a few examples of the cases you have seen.
Dear all,
<num value="8">praṁpiya</num> and <num value="4">pvān</num> (6 cases identified)
<num value="3">piy∙</num> (3 cases)
<num value="1">moy·</num> (1 case)
<num value="557">slik· I 100 40 10 7</num> (combining both)
<num value="616">ṣodaśottaraṣaṭśata</num> (1 case)
<num value="20">bhaiḥ</num> (1 case)
Thanks a lot Axelle. You are right that all these examples ignore the rules explicitly formulated in EG §7.1/Numbers expressed in words. However, @danbalogh, I think these examples force us to reconsider and refine our rules.
All of these examples, except the one from K. 1240, come from quantified lists of items owned by or given to a certain person or institution. The case from K. 1238 shows that the calculation of @value needs to take words into account (slik means 400).
Do you, @danbalogh, see any way we could allow cases in quantified lists, especially composite numbers expressed partly in words and partly in number signs? We could still maintain the prohibition on applying <num> to chronograms.
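
To make the arithmetic behind the K. 1238 value explicit, here is a minimal Python sketch of how @value="557" can be derived from "slik· I 100 40 10 7". Only "slik = 400" comes from this thread. The `UNIT_WORDS` table, the function name and the reading of the tally "I" after slik as a multiplier (1 × 400) are assumptions made for illustration, not a rule from the EG.

```python
# Hypothetical lookup of unit words; only "slik" = 400 is stated in the thread.
UNIT_WORDS = {"slik": 400}

def composite_value(text):
    """Sum a mixed numeral expression such as 'slik· I 100 40 10 7'."""
    # Treat the dots that follow a word as plain separators.
    for dot in ("·", "∙"):
        text = text.replace(dot, " ")
    total = 0
    pending_unit = None  # a unit word still waiting for its tally count
    for token in text.split():
        if token in UNIT_WORDS:
            if pending_unit is not None:
                total += pending_unit  # previous unit had no explicit count
            pending_unit = UNIT_WORDS[token]
        elif set(token) == {"I"}:
            # Tally strokes: number of strokes, times the unit if one precedes
            # (assumption: the tally counts units of the preceding word).
            total += len(token) * (pending_unit or 1)
            pending_unit = None
        elif token.isdigit():
            if pending_unit is not None:
                total += pending_unit  # unit word without an explicit count
                pending_unit = None
            total += int(token)
        else:
            raise ValueError(f"unknown numeral token: {token!r}")
    if pending_unit is not None:
        total += pending_unit
    return total

print(composite_value("slik· I 100 40 10 7"))  # 400 + 100 + 40 + 10 + 7 = 557
```
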
I have no objection to people putting <num> around words and I'm happy to permit it in the EG if you want to do it. My aversion to doing so is based on the following considerations:
<l> interrupting a numeral expression
Some of the above concerns are in fact there in the EG text. If you say we need this option, and can accept that we cannot plan for every case that may occur and will probably encounter situations where encoding will not be possible or will have to be done in an arbitrary and ad hoc fashion, then I'll revise the EG text and say this can be done optionally.
Here's a good example of a more complex one, from CalE05-Aihole-Pulakesin2 in Badami Calukya epigraphy:
<lg n="34" met="anuṣṭubh">
<l n="a">pañcāśatsu kalau kāle <space/></l>
<l n="b">ṣaṭsu pañca-śatāsu ca</l>
<l n="c">samāsu samatītāsu <space/></l>
<l n="d">śakānām api bhū-bhujāM<g type="symbol" subtype="dash"/></l>
</lg>
@danbalogh and @ajaniak: has the situation evolved at all since last year? It still seems to me it makes sense to allow use of <num> for such cases as <num value="557">slik· I 100 40 10 7</num>, without making it mandatory on any numeral expression that wholly or partly consists of words.
My stance is the same as it was: I don't think it is a good idea, but if you want it, I don't mind putting it in the EG. But we will not be able to come up with objective rules for handling all sorts of complex cases (see e.g. my verse example above), and if this sort of thing will be optional and to be handled on an ad hoc basis, then I don't really see what advantage it might serve (e.g. research, display?). I have written up a possible alternative to EGD §7.1.4 - Arlo, please have a look there and see if you like it. Also please reply to my comment there. We could limit it to allow this encoding only for combinations of numerals and words (which may be what you have in mind now), and thereby reduce the twilight zone, but that seems like a very arbitrary restriction to me.
Thanks Dan.
I will now look at your stub in EGD 7.1.4.
Yes, I did mean combinations of numeral signs and numeral words.
For that kind of research question, using <measure> (EGD §7.4.4) is much better suited, as it can work independently of num, specify "ghee" and use "kg".
At any rate, I do agree that once we accept <num> for number-words, then we should allow it in general, and not only when combined with numeral signs. We could even go the whole hog and use <num> within <num> for bhūtasaṁkhyā, to encode the value of each word separately in addition to encoding the value of the whole, e.g.
śākeṣv abdeṣu yāteṣv atha <num value="814"><num value="14">manu</num>-<num value="8">vasu</num></num>-saṁprāpta-saṁkhyeṣu meṣe
My only problem remains that encoding multiple numeral words together is complicated, and it will run into problems like that in my example above, which I see no way of solving apart from using a linking mechanism involving @xml:id, which takes the encoding to a whole new level of complexity.
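
The nested encoding in the manu-vasu example can be sanity-checked mechanically. The Python sketch below assumes the usual bhūtasaṁkhyā convention that the words are listed from the lowest position upwards, so the inner values are joined last word first. The function name is hypothetical, and the snippet is parsed without a TEI namespace for brevity.

```python
import xml.etree.ElementTree as ET

def bhutasamkhya_value(outer):
    """Join the @value of each child <num>, last word first."""
    inner = [child.get("value") for child in outer if child.tag == "num"]
    return "".join(reversed(inner))

# The nested example from the comment above, without its surrounding text.
snippet = '<num value="814"><num value="14">manu</num>-<num value="8">vasu</num></num>'
outer = ET.fromstring(snippet)

# Compare the value derived from the parts with the declared outer @value.
print(bhutasamkhya_value(outer), outer.get("value"))
```

A check like this only covers contiguous expressions. It does nothing for cases such as the verse quoted earlier, where the numeral words are spread over several <l> elements.
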
@michaelnmmeyer @danbalogh: it would probably be good to bring this old discussion to a close. Maybe one way to start would be to ask Michael to generate a list of all cases where we have non-number (and non-Roman-numeral) contents of <num>.
@arlogriffiths, it is not clear what you are asking us to do or why you think such a list generated by Michaël may help. I've read through the issue from the beginning, and the way I see it is that this discussion was brought to a close back then, only nobody pressed the Close button.
The EGD (§7.1.4) has already been revised (on 17 May 2021 according to my comment there) to permit text within <num>. I don't recall whether you had offered comments on my stub back then (can check if it's really important to you, but would rather not try to find this among the thousands of completed comments), but at any rate, the revision was finalised almost two years ago, and I probably would not have done that unless you had in some way affirmed that this was what you wanted.
My concerns remain what I stated above. I can live with these, but you need to be aware that you and everyone else will need to live with them too, and living with them includes not calling on me to devise new ad-hoc solutions every time a complication turns up. The concerns are:
<num> element will include words that have nothing to do with numbers. There is nothing we can do about that, unless we want to start using multiple num tags and linking them with xml:id. I want to avoid that, so we'll have to live with either having non-numeral meanings within the num tag, or not tagging the numeral expression unless it is spatially contiguous.
So our options at the moment are:
A. leave things as they are now, accept that there will be fuzzy cases, and close this thread; or
B. give up encoding <num> on words and revert to what we had in the EGD before May 2021 (perhaps suggesting that measure could be used for quantities and commodities); or
C. reopen the discussion and spend dozens to hundreds of hours working out something that does better justice to the contents, but will come at the cost of immensely complicating our markup.
The above order of options is my order of preference.
Thanks Dan. I am sorry that some such discussions go through such a weird process before being concluded. The fact that we all have a lot of work on our plate has something to do with it. I don't remember what I may have commented on EGD 7.1.4 at an earlier stage and there's certainly no need to dive into the version history of the gdoc to find out.
Having re-read the discussion above, as well as the intro to EGD 7.1.4, I am struck by the absence of a clear definition of the purpose of our use of <num>. I haven't checked what TEI and EpiDoc say about it. I would, before being reconfronted with this discussion, instinctively have responded that we use <num> in connection with the history of scripts, i.e. making it possible to assemble data for the study of the number systems that were in use (decimal place value or not) and on the graphic shapes used to express positions in the respective systems. If that answer is at least partly correct, then our EGD rule "when a glyph that would normally be a numeral sign is used in a function other than to represent a number (such as the glyph normally meaning 1, occasionally used as an auspicious opening mark), then the <num> tag must not be added to it (§4.2.7)" might not make perfect sense. Again, if the above answer is at least partly correct, then my preference expressed a few times in earlier iterations of this discussion (though never as a hard imperative) to allow people putting <num> around words might not have been well considered.
I'd like to know how you view the rationale for our use of <num>. Depending on your answer, I might prefer A or B among the options you give above.
My request for a list from @michaelnmmeyer was intended to allow us to determine how many instances of this use of <num> we actually have in the inscriptions encoded so far. Surely, if we have just a few handfuls, we will more easily opt for B than if we have thousands.
We have:
8979 I
497 II
231 III
93 IIII
31 IIIII
21
14 sa
13 IIIIII
10 tluṁ
9 X
7 vyara
7 mvāya
7 dvaya
5 IIIIIII
4 ruA
4 ½
3 rla
3 pataṁ
3 panneraḍu
3 pañca
3 mūṟu
3 daśa
2 XII
2 vyar·
2 ṣoḍaś
2 sārddha
2 ruAṁ puluḥ
2 praṁpiya
2 pataṁ puluḥ
2 mvāy·
2 mūvattu
2 mūru
2 IIIIIIIII
2 gra
2 eṁṭu
2 dvayā
2 daśamĭ
2 catuḥ-sahasra
1 XI
1 vvalu puluḥ
1 vuAluṁ puluḥ
1 tri
1 trayo
1 tai rat· III
1 ṣodaśottaraṣaṭśata
1 sī rat· III
1 sāyiradanūraṁ
1 sāyira
1 ṣaṣṭi
1 sā rutuḥ limā pluḥ sā
1 sārddha-nava
1 sapta-pañcāśad-anvita-catuś-śata
1 rvaṁ
1 raḍu
1 radiḻnūṟu
1 pvāna
1 pvān
1 pratipāda
1 praṁvyal·
1 prāṁm·
1 ppannircchāsiram
1 piy·
1 panneraḍumann
1 panne
1 pañcadaśi
1 pādona-ṣaṭṣśata
1 pādā
1 ondu
1 nūrayvatt’
1 nava
1 mvaya
1 mūvatteraḍum
1 mūvattaṁ
1 mūnūṟu
1 mūnūṟayvattu
1 kulya
1 katlu
1 katiga
1 Isī
1 irppatan
1 irpattu nālku
1 IIIIIIII
1 I ½
1 eṇchāsiraṁ
1 eṇchāsiram
1 eṁṭunūṟu
1 eṁṭunūru
1 Eḻunūṟayvattu
1 eḻpattumaṁ
1 Ekădaśi
1 dvāviṁśa
1 droṇa
1 daśami
1 caturtha
1 bpataṁ
1 bhai mvāya
1 bhai mvāy·
1 bhaiḥ
1 ayvadiṁbaruṁ
1 aynūṟuvaṁ
1 aynūṟu
1 aynūru
1 Aṣṭami
1 asṭami
1 aṟuvattu
1 āṟu
1 a hundred
1 āḍhavāpa
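
For reference, a tally of this kind can be reproduced with a few lines of Python. This is only a sketch under stated assumptions, not the script that produced the list above. The function names are hypothetical, the input is assumed to be a list of XML strings, and element names are compared without their namespace.

```python
from collections import Counter
import xml.etree.ElementTree as ET

def local_name(tag):
    """Strip a '{namespace}' prefix from an element tag, if present."""
    return tag.rsplit("}", 1)[-1]

def tally_num_contents(xml_documents):
    """Count the text content of every <num> element in the given XML strings."""
    counts = Counter()
    for doc in xml_documents:
        root = ET.fromstring(doc)
        for element in root.iter():
            if local_name(element.tag) == "num":
                # itertext() also collects text inside child elements such as <g>.
                counts["".join(element.itertext()).strip()] += 1
    return counts

sample = '<ab><num value="1">I</num> x <num value="1">I</num> <num value="8">praṁpiya</num></ab>'
for content, count in tally_num_contents([sample]).most_common():
    print(count, content)
```

Nested <num> elements are counted once for the outer element and once for each inner one, which may or may not be what is wanted.
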
In my view, our encoding of <num> is primarily semantic, not palaeographic. This is also why it is in section 7 "Additional information" in the EGD, and not in section 4 "The originally inscribed text". The TEI definition is: "contains a number, written in any form". In that respect, permitting it around words does make sense, and is TEI-approved. For palaeographic studies, <g type="numeral"> is more appropriate, but we don't use that around digits 1-10. However, I think doing a simple search for a particular number (or a wildcard search for any number) should be feasible for palaeographic studies of numeral signs, since I assume that the search will (or can be made to) ignore editorial numerals such as line and stanza numbers.
All in all, to my mind, the principal usefulness of <num> is in additive numbers where e.g. 100 + 20 can be tagged with the number 120. But to be honest, I haven't pondered the purpose, and I've always just accepted this encoding as a given, since it's part of the EpiDoc guidelines (which I've just checked, and which also say nothing about why it's good to do this). The potential to use it for studying whether numbers are written additively or in decimal place value is I think present whether we allow the tag around text or not; it would have to be used in combination (e.g. "do we have numeral characters [0-9] in a <num> tag without also being in a <g> tag?"; "do we have a <g> tag inside a <num> tag?").
Given this and the large number of cases in Michaël's list, I continue to think we should keep things as they are now.
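
The two combined checks mentioned above (digits in <num> outside any <g>, and <g> inside <num>) could be run along the following lines. This is a hedged Python sketch: the function names and the returned dictionary are hypothetical, and element names are again compared without namespace.

```python
import re
import xml.etree.ElementTree as ET

def local_name(tag):
    """Strip a '{namespace}' prefix from an element tag, if present."""
    return tag.rsplit("}", 1)[-1]

def text_outside_g(element):
    """Collect the text of an element, skipping anything wrapped in <g>."""
    parts = [element.text or ""]
    for child in element:
        if local_name(child.tag) != "g":
            parts.append(text_outside_g(child))
        parts.append(child.tail or "")  # the tail belongs to the parent
    return "".join(parts)

def classify_num(num):
    """Answer both questions for a single <num> element."""
    has_g = any(local_name(d.tag) == "g" for d in num.iter() if d is not num)
    bare_digits = bool(re.search(r"[0-9]", text_outside_g(num)))
    return {"has_g": has_g, "bare_digits": bare_digits}

mixed = ET.fromstring('<num value="11"><g type="numeral">10</g> 1</num>')
print(classify_num(mixed))
```
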
Thanks Dan. Reading your response, I think there might be potential use in considering doing without <num> altogether. This would significantly lighten the burden on encoders. Before taking any such radical step, it would be necessary to consult MARKUP on what benefit the community sees in this element's use.
Comments on the above list:
<num> is supposed to be applicable to ordinals; does our guide say anything about it?
<num> wherever it may occur
The TEI guidelines explicitly say that ordinals can be encoded with <num>. Our guide says nothing about this. Thinking about it, they certainly should be included: "in the year fifty-eight" and "in the fifty-eighth year" mean the same after all, so it would be really bizarre to tag only the former. I agree with the rest of your observations.
As for my practice with <num>, I use it in the following cases:
<num value="10"><g type="numeral">10</g></num> = "10"
<num value="11"><g type="numeral">10</g> 1</num> = "11"
<num value="10"><g type="numeral">10</g></num>-Āvatu = "10th"
<num value="11"><g type="numeral">10</g> 1</num>-Āvatu = "11th"
There is never text/word inside my <num>.
As for the options suggested by Daniel above, excluding C, as I fully agree that we do not need to "spend dozens to hundreds of hours working out something that does better justice to the contents, but will come at the cost of immensely complicating our markup", I would opt for A (leave things as they are now, accept that there will be fuzzy cases, and close this thread).
I have however no objection to B (give up encoding <num> on words and revert to what we had in the EGD before May 2021, perhaps suggesting that measure could be used for quantities and commodities), as I do not foresee that I will encode words with <num>, provided it is not too much extra work.
Dear @chloechollet and @chhomkunthea,
In some of the XML files, you have used the tag <num> to encode numbers written in letters. The EG states that only numbers written with arabic numbers or symbols should be encoded. If you need to keep such an encoding, let me know, I will figure something out for you.
Best, Axelle