Open arlogriffiths opened 1 month ago
Dear Arlo,
As far as I know, numerals in the Khmer corpus are not written in the decimal system, except in dates. Salomé and Chloé may confirm this. Thank you for finalising the encoding of numerals, especially the number I. I will check the EG again before encoding the next inscriptions with numerals.
Best, Kunthea
Yes, the above notes conform to our encoding guidelines.
In that case, @michaelnmmeyer, please wrap in <g type="numeral"> all contents of <num> other than strings of 3 or 4 Arabic numerals (as such strings are liable to be dates in the first or second millennium of the Śaka era and, as Kunthea comments, Śaka dates are normally expressed with decimal digits).
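The rule requested above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `wrap_numerals` is hypothetical, the regular expression only touches <num> elements whose content is plain text (anything with child elements is left for manual treatment), and the whole content is wrapped in a single <g> without splitting multi-part numbers.

```python
import re

# <num> elements whose content is plain text only (no child elements).
NUM_PLAIN = re.compile(r'(<num\b[^>]*>)([^<]+)(</num>)')
# Exactly 3 or 4 ASCII digits: treated here as a probable Śaka date.
PROBABLE_DATE = re.compile(r'[0-9]{3,4}')

def wrap_numerals(xml_text: str) -> str:
    """Wrap plain-text <num> contents in <g type="numeral">,
    skipping strings of 3 or 4 decimal digits (probable dates)."""
    def repl(m: re.Match) -> str:
        open_tag, content, close_tag = m.groups()
        if PROBABLE_DATE.fullmatch(content.strip()):
            return m.group(0)  # leave probable dates untouched
        return f'{open_tag}<g type="numeral">{content}</g>{close_tag}'
    return NUM_PLAIN.sub(repl, xml_text)

print(wrap_numerals('<num value="4">IIII</num> <num value="801">801</num>'))
```

Elements that are already wrapped, or that contain <unclear>, <supplied>, <choice> and the like, do not match the pattern and are returned unchanged.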
This is addressed in e71eaed30c889b390d14eb796ffe56c442b72fc3. There remain a number of occurrences to check and correct manually, to wit:
<num atLeast="11" atMost="19">10<gap reason="lost" quantity="1" unit="character"/></num>
<num atLeast="2" atMost="3"><choice><unclear>2</unclear><unclear>3</unclear></choice></num>
<num atLeast="880" atMost="888">88<gap reason="lost" quantity="1" unit="character"/></num>
<num atLeast="900" atMost="909">90<gap reason="lost" quantity="1" unit="character"/></num>
<num><choice><unclear>2</unclear><unclear>3</unclear></choice></num>
<num value="10"><g type="numeral">10</g></num>
<num value="14">10 1 <unclear>III</unclear></num>
<num value="17"><g type="numeral">10</g> <unclear><g type="numeral">7</g></unclear></num>
<num value="1"><unclear>I</unclear></num>
<num value="2"><unclear>II</unclear></num>
<num value="4"><unclear>4</unclear></num>
<num value="546"><supplied reason="lost">54</supplied>6</num>
<num value="60"><g type="numeral">60</g></num>
<num value="665" cert="low">66<supplied reason="lost" cert="low">5</supplied></num>
<num value="801">8<unclear>0</unclear>1</num>
<num value="860"><choice><sic>9</sic><corr>8</corr></choice>60</num>
<num value="902">90<choice><sic><num value="2">2</num></sic><corr><num value="3">3</num></corr></choice></num>
<num value="9"><supplied reason="lost">9</supplied></num>
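One possible way to locate such occurrences is to flag every <num> that contains a descendant element other than <g>. This is a sketch only, and not necessarily the criterion that produced the list above (which also includes some elements containing only <g>). The function names are hypothetical, and the example ignores the XML namespace that real TEI files would carry.

```python
import xml.etree.ElementTree as ET

def needs_manual_check(num: ET.Element) -> bool:
    """True if <num> has any descendant element other than <g>."""
    return any(el.tag != "g" for el in num.iter() if el is not num)

def find_manual_cases(xml_text: str) -> list:
    """Return the <num> elements that an automatic pass should not touch."""
    root = ET.fromstring(xml_text)
    return [n for n in root.iter("num") if needs_manual_check(n)]

sample = (
    '<ab>'
    '<num value="4"><unclear>4</unclear></num>'
    '<num value="60"><g type="numeral">60</g></num>'
    '<num value="801">801</num>'
    '</ab>'
)
for n in find_manual_cases(sample):
    print(ET.tostring(n, encoding="unicode"))
```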
Thanks. I have converted the above into a task list and will take care of it.
@chhomkunthea: I don't understand the following cases.
<num value="14">10 1 <unclear>III</unclear></num>
Should the 1 be I, and should we have <num value="14"><g type="numeral">10</g> I<unclear>III</unclear></num>?
<num value="17"><g type="numeral">10</g> <unclear><g type="numeral">7</g></unclear></num>
Should it be <num value="17"><g type="numeral">10</g> <unclear>7</unclear></num>
or <num value="17"><g type="numeral">10</g> <unclear><g type="numeral">IIIIIII</g></unclear></num>?
@michaelnmmeyer: <num value="1"><unclear>I</unclear></num>
should be changed to <num value="1"><g type="numeral"><unclear>I</unclear></g></num>.
Dear Arlo,
In the case of K. 915, I would like to propose the following:
<num value="14"><g type="numeral">10</g> <g type="numeral">I</g><unclear><g type="numeral">III</g></unclear></num>
And for K. 1017, it should be:
<num value="17"><g type="numeral">10</g> <unclear>7</unclear></num>
@chhomkunthea : thanks. I have implemented your suggestion in K. 915 (or rather cleaned up the file which had some conflicts after you had implemented your suggestions).
@danbalogh: do you approve of Kunthea's solution to avoid the problem that <unclear> cannot be used inside <g>?
I think I would prefer <num value="14"><g type="numeral">10</g> <unclear><g type="numeral">IIII</g></unclear></num>
because if it were clear, the encoding of the latter part would be a single <g type="numeral">IIII</g>. If we removed <unclear>
from <num value="14"><g type="numeral">10</g> <g type="numeral">I</g><unclear><g type="numeral">III</g></unclear></num>,
we'd be left with <num value="14"><g type="numeral">10</g> <g type="numeral">I</g><g type="numeral">III</g></num>,
which I believe does not really make sense.
That said, I do understand that Kunthea's rationale in choosing the above encoding was to show that the first bar is clear and the other three are not, and I don't have a strong objection to that. So if you are happy with that solution, I think it can stay. I don't suppose we want, at this stage, to revise the encoding of these numeral bars to say that only one I can ever be wrapped in g, and
@michaelnmmeyer: in tfc-khmer-epigraphy, there is a massive number of <num> elements whose contents are made up of symbols other than decimal digits that have not been wrapped in <g type="numeral"> by the responsible encoder(s) as EGD 4.2.2 prescribes. Examples:
<num value="1">I</num> should be <num value="1"><g type="numeral">I</g></num>
<num value="4">IIII</num> should be <num value="4"><g type="numeral">IIII</g></num>
<num value="123">100 20III</num> should be <num value="123"><g type="numeral">100</g><g type="numeral">20</g><g type="numeral">III</g></num>
There are also cases like <num value="80">80</num> which look like they contain decimal digits but where the transliteration is probably a representation of a non-decimal notation system, and so ought to be <num value="80"><g type="numeral">80</g></num> (as in the 123 example above). But there is no way for a machine to tell that these are not decimal units.
@chhomkunthea: do we ever have numbers noted with the decimal system outside of dates in the Khmer corpus? If we do not, then all such cases can automatically be converted to the encoding with <g>. You seem to have ignored EGD 4.2.2 so far. Please re-read it carefully.
Can you process the XML files and apply <g> wherever an algorithm can determine that the contents of <num> are not (exclusively) a series of decimal digits?
@danbalogh: please correct me if I have made any mistake in my representation of our encoding rules.
@chloechollet and @salomepichon: please take note of the above if you weren't aware of the rules yet.
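For the unambiguous cases described above (contents that are not exclusively decimal digits), a minimal sketch of the per-token wrapping might look like this. The function name `wrap_non_decimal` is hypothetical. The code assumes the only symbols involved are ASCII digits, the bar sign I, and whitespace. It keeps any whitespace between tokens as found, whereas the 123 example above shows none, so that detail would need to be settled against the EGD.

```python
import re
from typing import Optional

TOKEN = re.compile(r'[0-9]+|I+')   # a run of digits or a run of bars
SAFE = re.compile(r'[0-9I\s]+')    # only digits, bars and whitespace

def wrap_non_decimal(content: str) -> Optional[str]:
    """Wrap each numeral token in <g type="numeral">.
    Returns None when the content is purely decimal digits (ambiguous,
    a machine cannot tell) or contains unexpected characters."""
    stripped = content.strip()
    if not stripped or re.fullmatch(r'[0-9]+', stripped):
        return None
    if not SAFE.fullmatch(content):
        return None
    return TOKEN.sub(lambda m: f'<g type="numeral">{m.group(0)}</g>', content)

for text in ("I", "IIII", "100 20III", "80"):
    print(repr(text), "->", wrap_non_decimal(text))
```

Contents such as 80 come back as None here, which reflects the point made above: whether they may be converted automatically depends on the answer about decimal notation outside dates.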