arlogriffiths closed 4 years ago
On the pāñca<choice><orig>varṣa<choice><orig>I</orig><reg>yi</reg></choice>kā</orig><reg>varṣikā</reg></choice> matter: I've seen your comment on the same in the EG too and I've thought a bit about it since. But I'm not sure, since before Dharma I did not distinguish normalisation from correction. So let me think about it aloud. I would definitely prefer avoiding normalisation within normalisation as I don't think it serves a useful purpose. It may also be difficult to display meaningfully. If we can agree on that, then I think all of the following may still be OK:
1. What you've done, i.e. encode the basic normalisation and mention the advanced one in a note. I don't have a problem with the redundancy of that note; the point would be much harder to grasp without that redundancy. This would be in line with the policy for correction, that we don't aim at correcting to textbook standards, but to what seems to be the actual standard of the text.
2. The reverse (or whatever) of what you have done: encode the ultimate normalisation and mention the basic one in the note. Since we distinguish normalisation from correction, we could perhaps say that this is better than 1, since by wrapping something in <reg> we make a claim that that form has been regularised. But shouldn't that be the fully standard form then?
3. View the basic normalisation as a correction, and encode that nested within a normalisation. After all, your assumption is that the non-standard morpheme pāñcavarṣayikā was rendered strangely as pāñcavarṣaIkā. Doesn't that strangeness qualify as a scribal error? I think it might.
Having outlined these and thought about them a bit, I think my preference would be for 2, because I am now quite convinced that pāñcavarṣayikā should not be tagged as a regularised form. If you want to be meticulous about encoding, then I would encourage 3, but I don't think that is really necessary, as it creates complicated encoding and a complicated display without any tangible advantages. I think the essence of our aims is to: a) create a searchable diplomatic text; b) create a searchable curated text that includes, among other things, some normalised forms; and c) explain to human readers any other complexities. Nesting a normalisation within a normalisation would create a third layer of text, partway between diplomatic and curated, and I don't think we want to create tools to work on that third layer, i.e. to display it, search it or data-mine it. (The same could be said of nesting a correction within a normalisation, but here at least the display could be handled, and machine-actionable stuff would still be able to work on either of the two layers, so I don't see that as a problem.)
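To make the point about layers concrete, here is a minimal Python sketch, not anything the project actually uses. It assumes that the diplomatic layer keeps <orig> and <sic>, that the curated layer keeps <reg> and <corr>, and it wraps the snippet in a <w> element purely so that it parses as a document. The function name layer_text is made up for the illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: which children of <choice> each layer keeps.
KEEP = {
    "diplomatic": {"orig", "sic"},
    "curated": {"reg", "corr"},
}

def layer_text(elem, layer):
    """Return the text of elem for one layer, resolving <choice> recursively."""
    if elem.tag == "choice":
        # Keep only the alternative(s) belonging to the requested layer.
        return "".join(
            layer_text(child, layer) for child in elem if child.tag in KEEP[layer]
        )
    parts = [elem.text or ""]
    for child in elem:
        parts.append(layer_text(child, layer))
        parts.append(child.tail or "")  # text following the child element
    return "".join(parts)

# The nested normalisation quoted at the start of this comment,
# wrapped in <w> only so that it has a single root element.
NESTED = (
    "<w>pāñca<choice><orig>varṣa<choice><orig>I</orig><reg>yi</reg></choice>kā</orig>"
    "<reg>varṣikā</reg></choice></w>"
)

root = ET.fromstring(NESTED)
diplomatic = layer_text(root, "diplomatic")
curated = layer_text(root, "curated")
print(diplomatic)
print(curated)
```

With only two layers, the intermediate form with ayi never surfaces in either output; reaching it would need a third selection rule, which is the extra tooling I would rather avoid.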
On <gap reason="illegible"> in a translation: the question is, do we need that complexity? Does it matter with regard to the translation whether a gap in the translated text is because something is altogether lost or just illegible? I think not. But if you think it does, I'm OK with allowing that in the EG.
(See also some recent comments in the EG translation section: I'm actually proposing to simplify some of the code further. You may have responded to some of those in the last two days; I'll check and answer your latest comments next week.)
Thanks a lot @danbalogh !
I wasn't exactly sure how to implement your advice on pāñcavarṣaIkā, but have in the end opted for this:
<app loc="17">
<lem>pāñcavarṣ<choice><sic>aI</sic><corr>i</corr></choice>kā</lem>
<note>The scribe intended to write the word <foreign>pāñcavarṣayikā</foreign>, but <foreign>pāñcavarṣikā</foreign> is the correct form.</note>
</app>
Let me know if you would advise otherwise.
As for <gap reason="illegible">, I think I am not the only encoder who will find it bothersome to edit with @illegible and then be obliged to translate the same passage with @lost. But if the simplifications you propose make it possible to avoid this choice having to be made, it will be all the better.
That is also fine by me, but it doesn't correspond to any of my suggestions. Those would be, as numbered above:
1. pāñcavarṣ<choice><orig>aI</orig><reg>ayi</reg></choice>kā (plus the earlier note)
2. pāñcavarṣ<choice><orig>aI</orig><reg>i</reg></choice>kā (plus a note explaining that aI is probably a non-standard way of writing the non-standard form with ayi)
3. pāñca<choice><orig>varṣ<choice><sic>aI</sic><corr>ayi</corr>kā</orig><reg>varṣikā</reg></choice>
What you opted for tells the reader, in my perception, that you consider aI to be an unintentional error, and the intent of the composer had been pāñcavarṣikā. That's OK, but rather different from the earlier attitude, which set this up as non-standard usage.
Thanks Dan. Your choice 3 would have my preference, but it doesn't validate due to overlapping of the <choice> elements. Do you see a way to modify it in order to avoid that problem?
@danbalogh : could you respond to my last question?
Ah, that is just a silly mistake of my own. There's no overlap, I simply forgot to close the first choice. The correct encoding is: pāñca<choice><orig>varṣ<choice><sic>aI</sic><corr>ayi</corr></choice>kā</orig><reg>varṣikā</reg></choice>
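For what it's worth, a slip like this can be caught with a quick well-formedness check. The following is only a sketch in Python: it wraps each fragment in a dummy <root> element (my own addition) and tries to parse it. It tests well-formedness only, not validity against the project schema, and the helper name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET

def is_well_formed(fragment):
    """Wrap the fragment in a dummy root and report whether it parses as XML."""
    try:
        ET.fromstring(f"<root>{fragment}</root>")
        return True
    except ET.ParseError:
        return False

# Option 3 as first posted: the inner <choice> is never closed.
UNCLOSED = (
    "pāñca<choice><orig>varṣ<choice><sic>aI</sic><corr>ayi</corr>kā</orig>"
    "<reg>varṣikā</reg></choice>"
)

# Option 3 with the inner <choice> closed before "kā".
CLOSED = (
    "pāñca<choice><orig>varṣ<choice><sic>aI</sic><corr>ayi</corr></choice>kā</orig>"
    "<reg>varṣikā</reg></choice>"
)

for label, fragment in (("as first posted", UNCLOSED), ("corrected", CLOSED)):
    print(label, is_well_formed(fragment))
```

The first fragment fails because </orig> arrives while the inner <choice> is still open, which a validator may well report as overlapping elements.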
Thanks! (And apologies for my feeble inability to solve that problem myself!)
I have now finished a first complete draft.
@danbalogh : could you take a look at the file, esp. at the argr comments, and most especially the one or two addressed to you?
@ryosukefurui : please take a look — I hope it gives you inspiration/motivation to resume encoding. Note that I have changed the file structure of our folder. Do a "git pull" to get the current structure and contents of the tfb-bengalcharters-epigraphy repository.