erc-dharma / tfc-nusantara-epigraphy

DHARMA project task force C, Nusantara epigraphic corpus
https://dharma.hypotheses.org/
Creative Commons Attribution 4.0 International

questions re. encoding of Linggawangi.xml #10

Open arlogriffiths opened 4 years ago

arlogriffiths commented 4 years ago

@danbalogh I have just pushed a version of the file with some questions addressed to you. Please answer here.

danbalogh commented 4 years ago
  1. supplying anusvāras for standardisation: no conclusion reached; we should return to this at some point, hopefully after more input from the Markup list. Provisionally I suggest either A. no markup (just keep the original reading); or B. choice with orig and reg on the whole word. For a long-term solution, I am now inclined toward <supplied reason="subaudible">, analogous to supplied avagrahas. But my inclination keeps changing. Gabby on Markup seems set against introducing a new value for supplied.
danbalogh commented 4 years ago
  2. inscribed susuku for susuk· ku. I'm not sure we need a special rule for the non-writing of gemination. While a large number of special rules makes for more consistency, it also makes the EG that much harder to digest and keep in mind. I'd suggest this be dealt with as per A or B under 1 above: just ignore it, or orig and reg the whole pair of words. If you definitely prefer to supply just the implied , and to supply it as (rather than k), then I agree it should come after susu. But then again, perhaps the inscribed ku is in fact "shorthand" for kku ...
danbalogh commented 4 years ago
  3. add "entry" to the list of permitted values for @unit in <citedRange>. Wouldn't "item" be suitable? It is now suggested for "a number in an anthology", but if we discard the idea of displaying that as №, it could be applied to such entries. But on second thoughts, perhaps it is indeed better to keep "entry" distinct. Display as "s. v."? Give me a final word and I'll add it to the EG.
danbalogh commented 4 years ago
  4. Markup for quotes that are not citations from a publication. I have no preference and no previous experience. It may be best to get @ajaniak's opinion. I think <quote> may be fine, but by the TEI Guidelines, that is normally for "Quotations from other works". Perhaps <q> is better suited for this purpose, but that would introduce yet another markup element that we do not use otherwise (unless we adopt it for italicisation, which I hope we don't). At any rate, we should indeed agree on some form of markup, which has the advantage of automatically producing the correct quotation marks.
arlogriffiths commented 4 years ago
  1. let's go back to option B then, which is what Aditia had in his initial encoding of Linggawangi.xml. (But, @danbalogh : I remain in favor of a long-term solution allowing mark-up to be limited to the character or characters supplied. Could you add a stub to the EG to mark where any future rule regarding such cases will be presented, and what options are on the table?)
  2. @aditiagunawan : please mark up like this <choice><orig>susuku</orig><reg>susuk ku</reg></choice>. (@danbalogh : agreed? this means enclosing two words in a single <choice>.)
  3. Yes, let's add @unit="entry" to EG and state explicitly that it is intended to be displayed with s.v.
  4. @danbalogh : Let us provisionally add to the EG that <quote> is to be used also to achieve quotation marks around translations of words or phrases, but that if the encoder insists he can override our transformation by typing precisely the unicode signs for the desired quotation marks („...” “...” ‘...’ «...»). Then please add a comment to the new bit of EG contents asking @ajaniak to express her opinion.
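
For illustration only, here is a minimal sketch of what option B in point 2 yields when a script extracts either reading. It assumes Python's standard `xml.etree.ElementTree` and a bare, namespace-free fragment; real DHARMA files are namespaced TEI, and the `<ab>` wrapper and the function name `reading` are hypothetical, not part of the EG or of any project stylesheet.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the <choice> from point 2, wrapped in <ab> and
# stripped of the TEI namespace to keep the sketch short.
FRAGMENT = "<ab><choice><orig>susuku</orig><reg>susuk ku</reg></choice></ab>"


def reading(elem, prefer):
    """Flatten elem to text, keeping only the `prefer` child of each <choice>."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "choice":
            chosen = child.find(prefer)
            if chosen is not None:
                parts.append(reading(chosen, prefer))
        else:
            parts.append(reading(child, prefer))
        # Text following the child element belongs to the parent's content.
        parts.append(child.tail or "")
    return "".join(parts)


root = ET.fromstring(FRAGMENT)
print(reading(root, "orig"))  # susuku
print(reading(root, "reg"))   # susuk ku
```

Because the whole pair of words sits inside one `<choice>`, the two readings can be swapped without touching character-level markup, which is the trade-off against the long-term solution mentioned in point 1.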
danbalogh commented 4 years ago
  1. The EG stubs are done. See added text in red under §6.1/Good practice in normalisation, §6.2/Editorial deletion, and §6.2/Editorial addition. Note that there is also the option of just flagging on the entire word.

  2. I think your suggestion is preferable even if we adopt a method for adding individual characters as normalisation. There will need to be some indication of this in the EG. In your suggestion to Aditia, is <reg>susuk ku</reg> deliberate or is it a typo for <reg>susuk· ku</reg>? I cannot give you a qualified opinion on which would be better, but it seems to me that the latter would be more apt.

Given what Gabby said on Markup about how most people view normalisation, we should still consider how extensively we want to normalise. Flagging may be preferable in most cases, and it is in fact what the EG says at the moment under Good practice in normalisation, which does not explicitly recommend normalisation for any phenomenon, and instead groups non-standard features into three classes, with the following recommended strategies:

  1. ignore unless deemed important on a case-by-case basis
  2. ignore or flag depending on corpus, but do not normalise without good reason
  3. flag and optionally normalise

Arlo, please add some thoughts in comments to the EG, and we should probably have a Skype discussion one of these days.

danbalogh commented 4 years ago

3 and 4 are done.
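
As a rough illustration of the display behaviour agreed in points 3 and 4, the sketch below uses Python's standard `xml.etree.ElementTree` on namespace-free fragments. It is not the project's actual transformation: the function names, the default quotation marks “ ”, and the sample words `word`, `gloss` and `susuk` are placeholders assumed for the example. Only the mapping of `entry` to "s.v." comes from this thread.

```python
import xml.etree.ElementTree as ET

# Only this mapping is taken from the discussion; other units fall back
# to their own name in this sketch.
UNIT_LABELS = {"entry": "s.v."}


def render_cited_range(elem):
    """Display a <citedRange> as '<label> <content>'."""
    unit = elem.get("unit", "")
    label = UNIT_LABELS.get(unit, unit)
    return f"{label} {elem.text or ''}".strip()


def render(elem, marks=("“", "”")):
    """Flatten elem to text, wrapping each <quote> in quotation marks.

    The default marks are an assumption. Quotation marks typed directly
    in the text (outside <quote>) pass through unchanged.
    """
    inner = (elem.text or "") + "".join(
        render(child, marks) + (child.tail or "") for child in elem
    )
    return marks[0] + inner + marks[1] if elem.tag == "quote" else inner


print(render_cited_range(ET.fromstring('<citedRange unit="entry">susuk</citedRange>')))
print(render(ET.fromstring("<p>word <quote>gloss</quote></p>")))
print(render(ET.fromstring("<p>word «gloss»</p>")))
```

The last call shows the override from point 4: an encoder who types the marks by hand simply does not use `<quote>`, and the text is left as written.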

danbalogh commented 4 years ago

Arlo, is there anything in this issue that still needs action from me?