contents of https://dharmalekha.info/editorial-conventions

erc-dharma / project-documentation

DHARMA Project Documentation

Creative Commons Attribution 4.0 International


contents of https://dharmalekha.info/editorial-conventions #273

Open arlogriffiths opened 8 months ago

arlogriffiths commented 8 months ago

@michaelnmmeyer and @danbalogh —

This page is now in place but there are some gaps in some columns and unformatted bits at the end. Can you work together on fixing or adding what needs to be fixed/added?

As for dha<supplied reason="undefined" evidence="previouseditor">rma</supplied> and dha<supplied reason="lost" evidence="parallel">abc</supplied> I presume we want display with square brackets and a mouseover recording the value of @evidence.
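A minimal sketch of the display proposed here, assuming the output is an HTML `<span>` whose `title` attribute provides the mouseover. The function name `render_supplied`, the wrapper element `<w>` and the wording of the tooltip are illustrative assumptions, not the actual DHARMA display code.

```python
import html
import xml.etree.ElementTree as ET


def render_supplied(xml_fragment: str) -> str:
    """Render <supplied> as bracketed text with @evidence in a tooltip.

    Hypothetical helper: the <span title="..."> output is an assumption.
    """
    # Wrap the fragment so that mixed text and elements parse as one tree.
    root = ET.fromstring(f"<w>{xml_fragment}</w>")
    parts = [html.escape(root.text or "")]
    for child in root:
        text = html.escape("".join(child.itertext()))
        if child.tag == "supplied":
            evidence = child.get("evidence")
            # Only add a tooltip when @evidence is actually present.
            title = f' title="evidence: {html.escape(evidence)}"' if evidence else ""
            parts.append(f"<span{title}>[{text}]</span>")
        else:
            parts.append(text)
        parts.append(html.escape(child.tail or ""))
    return "".join(parts)


print(render_supplied('dha<supplied reason="lost" evidence="parallel">abc</supplied>'))
```

The value of @reason is ignored in this sketch, since the suggestion above treats both examples the same way.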

danbalogh commented 7 months ago

As far as I know, this is a copy of the github editorial conventions page, which in turn is a copy of the DHARMA cheatsheet, which was last edited in October 2020, and which has never progressed beyond draft stage, with a number of undecided details, many headings which are inaccurate, and some that are bad English. Someone does need to revise it at some point, and I don't mind if it's me and now, but in that case I will probably also reopen some of the solutions which I don't find agreeable.

To this end, I've made a copy of the first google document mentioned above, and started rewriting it quite thoroughly:

https://docs.google.com/document/d/1jn0OLwtDxhxtuESBIpripwvrgbgyeDCK_tXfW0pKHw4/edit?usp=sharing

Comments are welcome and I would appreciate it if Michaël could check and correct/fill the DHARMA display column.

michaelnmmeyer commented 7 months ago

OK, looking at it.

michaelnmmeyer commented 7 months ago

I set up a new basic display with Dániel's tables: https://dharmalekha.info/editorial-conventions2. XML examples are processed live, so the output reflects what the display system is currently doing.

danbalogh commented 7 months ago

Thanks, this is great. A couple of first impressions: It's a bit difficult to parse at a glance, because the entries do not stand out. Part of the reason for this is the alternating lighter and darker grey background of the units as a whole, mixed with the white and grey backgrounds of the display and XML parts. I haven't paid attention to the design of the website, so don't know if the classes used here have been devised specifically for this page or if they recur in many other places. (The same alternation of light and dark grey in e.g. the list of texts looks good and helps parsing.) At any rate, if it does not interfere too much with the overall design of the site, I think it would be better not to have this alternation here; or not to give different background colours to the XML and the display sample, in which case the alternating greys could be retained for the items as a whole. Also, the item headings (the first column of the original table; now the items with class="catalog-card-heading") could be made to stand out more, perhaps with font styling or, probably better, by indenting the XML and display so that the item heads look like a hanging indent. We'll need some alterations in the XML samples in general to improve this further, and in particular because if you generate the display live from the XML (which I think is excellent!), then we'll need better XML in some places. What would be the best way to make such changes? Can I have edit access to a raw HTML (or other) snippet that you use in the generation of this page? Or would it be best if I proposed every change here on github and you implemented them? Some changes that occur to me (I'll have more later on, when I've looked more closely):

the verse line examples don't look right, perhaps because we'd need <lg> wrappers for the lines to render them correctly in display
in the example <lb n="1"/>svasti śrī<lb n="2"/>kōpparakēcari the spacing needs to be changed to <lb n="1"/>svasti śrī <lb n="2"/>kōpparakēcari
we'll need to add some text (e.g. a ... b) around the examples of space

arlogriffiths commented 7 months ago

A small request from my side: for display of metrical structure, can we use pretty symbols for long/guru rather than what look like plain hyphens?

- - - - - ⏑ - - ⏑ - ⏓

Brill apparently recommends the en-dash for this purpose. See https://brill.com/fileasset/downloads_static/static_fonts_metricalunicode.pdf. I'd like to align with their recommendations.

danbalogh commented 7 months ago

I agree with Arlo.

michaelnmmeyer commented 7 months ago

@danbalogh

OK, I changed styling to try to make things more readable.

For now, the page is generated directly from an HTML file manually exported from Google Docs, so you can keep modifying the table and I will re-generate the output. For the display to be automatically updated, however, the source file should be stored in project-documentation. I can clean up the HTML exported from Google Docs and store it there, if you are OK with that.

@arlogriffiths

This is corrected. By the way, a lot of meters in prosodic patterns have XML and prosodic representations that do not match. See e.g. campakamālā. It does not seem necessary to encode both representations, if the idea is to perform an automatic conversion, so we could just keep the XML representation.

The use of two pipe symbols || for representing double daṇḍas is also apparently ambiguous. In daṇḍaka, we have ------|‖+-+‖. If this is represented as ------|||+-+||, I do not see how to determine whether |‖ or ‖| is the correct prosodic representation.
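The automatic conversion and the bar ambiguity mentioned above can be illustrated with a short sketch. The symbol mapping (`+` for long, `-` for short, `=` for anceps) is inferred from the paired examples in this thread, and the function name `to_prosodic` is hypothetical; this is not the site's actual conversion routine.

```python
# Assumed mapping, inferred from the paired examples in this thread.
SYMBOLS = {
    "+": "–",  # long (guru), shown as an en dash
    "-": "⏑",  # short (laghu)
    "=": "⏓",  # anceps
}


def to_prosodic(xml_pattern: str) -> str:
    """Replace each XML metrical character by its prosodic symbol.

    Bars, slashes and any other characters are passed through unchanged.
    """
    return "".join(SYMBOLS.get(ch, ch) for ch in xml_pattern)


print(to_prosodic("-+-||+----+-+-="))

# The ambiguity: collapsing "||" into a double bar from left to right
# turns "|||" into "‖|", although "|‖" was intended for daṇḍaka.
print("------|||+-+||".replace("||", "‖"))
```

A left-to-right replacement is only one possible policy; the point is that the string `|||` alone does not say which reading was meant.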

arlogriffiths commented 7 months ago

See https://dharmalekha.info/prosody. If @danbalogh approves of this manner of presenting Prosodic Conventions, then #278 can be closed. I'll let him answer Michaël's remarks addressed to me just above.

danbalogh commented 7 months ago

@michaelnmmeyer : either way of editing the legend/cheatsheet is fine for me, and I suppose it's much less trouble for you if I can work directly on the cleaned HTML stored in project documentation. So let's go that way if you agree.

About the XML vs prosodic symbol templates: do all the mismatches involve the presence of | signs in one and their absence in the other, as in campakamālā? This may be by design, but if so, I do not recall what the underlying idea was; @arlogriffiths might. However, looking at the EGD appendix (the line with "caesurae are indicated in conventional notation for the sake of accuracy"), it seems to me that at some point the idea was that we would not show caesurae (double bars) in the XML code, since they are not to be used in @met for lacunae, while in the prosodic notation, caesurae would be shown. (However, looking at the same EGD appendix, the table that was the source of our current prosody file, I see that bars are in fact present in many of the cells with XML code, so we or I did this inconsistently). At any rate, I think maintaining this distinction is not essential, so I have no objection to keeping just the XML notation in the file and displaying the prosodic notation auto-generated from that - but in this case the bars have to be added to the XML characters where they are not present. HOWEVER, this seems to work only for the fully syllabo-quantitative metres. The system would break down for the moraic metres (where I see there is no prosodic notation shown in the table, and the XML notation is also inaccurate; see Table 7. Specifics of moraic metres in the EGD for the accurate representation). It would also break down for some of the miscellaneous metres, such as the vaitālīya group, where the XML notation's "6" is not entirely equivalent to the prosodic notation's "⏕ ⏕ ⏕" [as the former would permit the pattern ⏑–⏑ while the latter does not; I don't know off the top of my head which is correct].
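The difference between "6" and "⏕ ⏕ ⏕" can be made concrete by enumeration. The sketch below assumes that "6" means any sequence of shorts (1 mora) and longs (2 morae) adding up to six morae, and that each ⏕ is a two-mora slot filled by either one long or two shorts; both readings are assumptions drawn from the remark above, and the function names are hypothetical.

```python
def mora_sequences(total: int) -> list[str]:
    """All sequences of shorts (1 mora) and longs (2 morae) adding up to `total`."""
    if total == 0:
        return [""]
    result = ["⏑" + rest for rest in mora_sequences(total - 1)]
    if total >= 2:
        result += ["–" + rest for rest in mora_sequences(total - 2)]
    return result


def paired_sequences(slots: int) -> list[str]:
    """All sequences in which each two-mora slot is one long or two shorts."""
    if slots == 0:
        return [""]
    return [slot + rest
            for slot in ("–", "⏑⏑")
            for rest in paired_sequences(slots - 1)]


free = set(mora_sequences(6))      # reading of the XML notation "6"
paired = set(paired_sequences(3))  # reading of the prosodic "⏕⏕⏕"

print(len(free), len(paired))
# Sequences allowed by "6" only: each has a long straddling a slot boundary.
print(sorted(free - paired))
```

Under these assumptions every ⏕⏕⏕ sequence is also a valid "6" sequence, but not the other way round; sequences beginning ⏑–⏑ are among the extra ones.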

Three bars should never occur. It seems to me that for daṇḍa, daṇḍaka and malāhati, all under "Additions in need of processing", double bars ‖ have been used as a sort of parenthesis meaning "any number of iterations of the sequence between these". We do not have a "standard" notation for this kind of thing and I don't think we need one.

So unfortunately the prosody file cannot do justice to metres other than the purely syllabo-quantitative ones. I think that the daṇḍa and other repetitive patterns at the end of the prosody file will need to be written up in words. I am not sure where that (and the other complex stuff in the EGD appendix) should live: either it can become a part of the prosody file with text and tables, or it can remain as an EGD appendix, to which the prosody file just refers.

In commit 1d74718 I have restored the double bars to the XML notation of syllabo-quantitative metres where it was missing. There isn't really anything I can do at the moment about the ones in need of processing.

Bottom line: I suggest NOT discarding the prosodic notation for the time being, until a considered decision can be made about what we keep in this file and where we write up the stuff that cannot be expressed with simple formulae. If the double bars are not wanted after all in the XML notation, they should be simple to remove.

michaelnmmeyer commented 7 months ago

OK for editorial conventions.

For prosodic patterns, the following have a mismatching xml and prosodic representation:

śāpantika
  ++-----=/++--+--=
  ––⏑⏑⏑⏑⏑⏓/––⏑–⏑⏑⏑⏓

pādānuṣṭubh
  ========
  no data available

rucirā
  -+-||+----+-+-=
  ⏑–⏑–||⏑⏑⏑⏑–⏑–⏑⏓

kr̥ti
  --+-+---+------+-+-=
  ⏑⏑–⏑–⏑⏑⏑–⏑⏑⏑⏑⏑⏑⏑–⏑–⏑⏓

vikr̥ti
  ++------+-+---+------=
  ⏑⏑⏑⏑–⏑⏑–⏑⏑⏑⏑⏓

madraka
  +--+-+---+||-+----+----=
  –⏑⏑–⏑–⏑⏑⏑–||⏑–⏑⏑⏑–⏑–⏑⏑⏑⏓

āryā
  4|4|4||4|4|4|4|2/4|4|4||4|4|1|4|2
  no data available

gīti
  4|4|4||4|4|4|4|2/4|4|4||4|4|4|4|2
  no data available

upagīti
  4|4|4||4|4|1|4|2/4|4|4||4|4|1|4|2
  no data available

udgīti
  4|4|4||4|4|1|4|2/4|4|4||4|4|4|4|2
  no data available

āryāgīti
  4|4|4||4|4|4|4|4/4|4|4||4|4|4|4|4
  no data available

sugīti
  4|4|4||4|4|4|4|4/4|4|4||4|4|1|4|2
  no data available

anugīti
  4|4|4||4|4|1|4|2/4|4|4||4|4|4|4|4
  no data available

vallarī
  4|4|4||4|4|4|4|4/4|4|4||4|4|4|4|2
  no data available

lalitā
  4|4|4||4|4|4|4|2/4|4|4||4|4|4|4|4
  no data available

vaitālīya
  6|+-+-=/8|+-+-=
  ⏕⏕⏕–⏑–⏑⏓/⏕⏕⏕⏕–⏑–⏑⏓

aupacchandasika
  6|+-+-+=/8|+-+-+=
  ⏕⏕⏕–⏑–⏑–⏓/⏕⏕⏕⏕–⏑–⏑–⏓

āpātalikā
  6|+--+=/8|+--+=
  ⏕⏕⏕–⏑⏑–⏓/⏕⏕⏕⏕–⏑⏑–⏓

daṇḍa
  ‖+-+‖|+--+-+-=
  ‖–⏑⏑‖|–⏑⏑–⏑–⏑⏓

paṅkti
  --------+=
  ⏑⏑⏑⏑⏑⏑⏑⏑–⏓[need to be checked]

atijagatī
  ----+--+-----=
  ⏑⏑⏑⏑–⏑⏑–⏑⏑⏑⏑⏓

danbalogh commented 7 months ago

Thanks for the list. I've done another push correcting what I could, but it was not much.

śāpantika is incorrectly recorded. I have written an XML comment about the errors in it (it isn't just the mismatch), but this should be corrected by whoever created this record in the first place.
pādānuṣṭubh: I don't know this metre. I have inserted the prosodic notation corresponding to the XML notation, but this entry should be rechecked by whoever created it, and also completed with bibliography. At any rate, if it is really free octosyllabic verse, then its place is definitely NOT among syllabo-quantitative metres.
rucirā: corrected. Was probably my mistake.
kr̥ti: corrected (hopefully).
vikr̥ti: cannot correct. Incorrectly recorded, no reference, problematic name, I don't know this metre.
madraka: corrected.
metres on the list from āryā onward: are either not purely syllabo-quantitative (and so cannot be represented accurately in prosodic notation), as noted above, or incompletely recorded and "in need of processing" as per my remarks above.

michaelnmmeyer commented 7 months ago

The source file for https://dharmalekha.info/editorial-conventions is now here. It can be edited freely, the display will update itself.

danbalogh commented 7 months ago

Thanks, I'll take a look and make some revisions.

danbalogh commented 7 months ago

I have pushed a revised version of the editorial conventions file. I guess it will take some time for https://dharmalekha.info/editorial-conventions to update accordingly.

Technical notes:

Word divided across verse lines (enjambement)
- the display does not seem right: due to the presence of enjamb, there ought to be a hyphen at the end of the first line (after sakalārāti). This hyphen should also be shown in logical display, but not in physical.
the file has cuvikaṇṭhi-, but the displayed code is cuvikaṇṭhi-
Tentative explanatory insertion: <supplied reason="explanation" cert="low"> should be displayed with a question mark, e.g. "(Amoghavarṣa I?)".
Likewise for Words tentatively inserted for clarity or syntactical correctness, <supplied reason="subaudible" cert="low">, e.g. "[The donee?]".

Issues to discuss with @arlogriffiths and any other PIs who are ready to be involved:

I have arbitrarily deleted the section involving sub-akṣara markup, since there is no display for it. If anybody desperately wants it back on our editorial conventions page, please scream and I'll put it back.
We currently have headings with "intrinsic structure" and "extrinsic structure". I wonder if I should change that term to "logical structure" and "physical structure" respectively, both in the Conventions and in the next EGD, the better to harmonise with "logical/physical" display on the website.
Line numbers, pb and milestone elements are currently displayed in grey text and wrapped in parentheses (). If the text is copied into another application without formatting, this becomes indistinguishable from unclear; and even with the formatting, parentheses should be reserved for unclear readings. I therefore suggest, as I always have, that line (etc.) breaks be displayed in ⟨⟩ (which otherwise stands for “editorial addition”, so even without formatting it makes for a pretty clear message). I agree with keeping the grey colour and suggest, but do not insist, that superscript be used for these.
Kinds of lacunae (lost/illegible/undefined) are distinguished in display when their size is recorded in quantity of characters (using × + *) or in quantity of lines (using the words lost/illegible/lost or illegible), but are not distinguished when the size of a lacuna is encoded as an unknown number of characters (in which case the display is always [...]). They are also not distinguished if the lacuna has a prosodic pattern to it. I think that this distinction adds very little value to an edition while being confusing to the reader, and suggest that we get rid of the distinction in all lacuna displays measured in characters (except that the distinction would be spelled out in the tooltips, as it is now), and keep it only in those measured in lines, where they are spelled out to begin with. In tandem with this, I suggest that we harmonise our lacuna display with the Leiden+ convention. Thus, regardless of the value of @reason
- <gap extent="unknown" unit="character"/> would be displayed as [.?],
- <gap unit="character" quantity="3"/> would be displayed as [.3], and
- <gap unit="character" quantity="3" precision="low"/> would be displayed as [ca. 3]
- I am also open to other specific displays so long as the distinction based on @reason is eliminated. For instance, if Leiden harmonisation is not important, we could keep [...] for extent unknown, and use [3] and [ca. 3] for exact and approximate number of characters. Or whatever.
- I could also imagine going the whole hog and discarding the distinction for line-level lacunae too. In that case, the display could simply be [? lines], [3 lines] and [ca. 3 lines]. Leiden+ doesn't really work here, because they include the word "lost" (and only that, apparently not recognising other kinds of lacuna).
I have arbitrarily deleted the section on lacunae of lines possibly lost (confidently estimated / tentatively estimated / unknown number). This encoding is rare and the display, I think, spells it out, so I see no need to complicate our Editorial Conventions page with it. If anybody wants it back, scream.

manufrancis commented 7 months ago

Thanks both of you for this!! Already awesome on the website.

michaelnmmeyer commented 7 months ago

@danbalogh OK, fixed.

arlogriffiths commented 6 months ago

Three weeks ago, Dan listed some "Issues to discuss with @arlogriffiths and any other PIs who are ready to be involved". Sorry for being slow to react.

I have arbitrarily deleted the section involving **sub-akṣara markup**, since there is no display for it. If anybody desperately wants it back on our editorial conventions page, please scream and I'll put it back.

Where was this section previously? It seems to me we presently have [.] in display for a last component of an akṣara. Is this the kind of stuff you're talking about? E.g. <seg type="aksara"><seg type="component" subtype="body"><gap reason="illegible" quantity="1" unit="component"/></seg>u</seg> displayed like this:

That display is perfectly acceptable for me but is not recorded for this markup in the list of editorial conventions. I think it should be. It is of course unacceptable, on the other hand, that this display is exactly the same as what we presently have for <supplied reason="subaudible">.</supplied>, which does figure in the list of conventions. (We have discussed the need to change display of <supplied reason="subaudible"> also in #303 and it may already be implemented as I write this.)

We currently have headings with **"intrinsic structure" and "extrinsic structure"**. I wonder if I should change that term to "logical structure" and "physical structure" respectively, both in the Conventions and in the next EGD, the better to harmonise with "logical/physical" display on the website.

I am rather attached to the terms "intrinsic structure" and "extrinsic structure" and somewhat less fond of "logical" in the alternative pair of terms, but it is true that "logical/physical" has the virtue of simplicity. So I don't mind your implementing the changes you have in mind in Conventions and EGD.

* **Line numbers, pb and milestone elements** are currently displayed in grey text and wrapped in parentheses (). If the text is copied into another application without formatting, this becomes indistinguishable from unclear; and even with the formatting, parentheses should be reserved for unclear readings. I therefore suggest, as I always have, that line (etc.) breaks be displayed in ⟨⟩ (which otherwise stands for “editorial addition”, so even without formatting it makes for a pretty clear message). I agree with keeping the grey colour and suggest, but do not insist, that superscript be used for these.

I am happy to follow Dan on these points almost integrally. But I think I'd vote for keeping grey color and no superscript.

* **Kinds of lacunae (lost/illegible/undefined)** are distinguished in display when their size is recorded in quantity of characters (using × + *) or in quantity of lines (using the words lost/illegible/lost or illegible), but are not distinguished when the size of a lacuna is encoded as an unknown number of characters (in which case the display is always [...]). They are also not distinguished if the lacuna has a prosodic pattern to it. I think that this distinction adds very little value to an edition while being confusing to the reader, and suggest that we get rid of the distinction in all lacuna displays measured in characters (except that the distinction would be spelled out in the tooltips, as it is now), and keep it only in those measured in lines, where they are spelled out to begin with.

So far I agree.

In tandem with this, I suggest that we harmonise our lacuna display with the Leiden+ convention. Thus, regardless of the value of @reason

  `<gap extent="unknown" unit="character"/>` would be displayed as [.?],
  `<gap unit="character" quantity="3"/>` would be displayed as [.3], and
  `<gap unit="character" quantity="3" precision="low"/>` would be displayed as [ca. 3]
  * I am also open to other specific displays so long as the distinction based on `@reason` is eliminated. For instance, if Leiden harmonisation is not important, we could keep [...] for extent unknown, and use [3] and [ca. 3] for exact and approximate number of characters. Or whatever.

I am not particularly attached to the distinction × + * so also agree with this part of Dan's proposal.

  * I could also imagine going the whole hog and discarding the distinction for line-level lacunae too. In that case, the display could simply be [? lines], [3 lines] and [ca. 3 lines]. Leiden+ doesn't really work here, because they include the word "lost" (and only that, apparently not recognising other kinds of lacuna).

Agreed on this suggestion too.

* I have arbitrarily deleted the section on lacunae of **lines possibly lost** (confidently estimated / tentatively estimated / unknown number). This encoding is rare and the display, I think, spells it out, so I see no need to complicate our Editorial Conventions page with it. If anybody wants it back, scream.

I don't feel any need to scream.

danbalogh commented 6 months ago

Thanks, @arlogriffiths , we are making good progress here.

1. The section on sub-akṣara markup in the old editorial conventions looked like this: i.e. it was in the section on palaeographic features, and included only the <seg type="component"> markup, which does not appear in display, so there's no point illustrating it here. There has been no illustration of sub-akṣara lacunae in the conventions. I don't mind adding one, since as you showed above, we do have display for sub-akṣara lacunae. Since we seem to agree on showing supplied punctuation in angle brackets, showing this one in square brackets does not cause a conflict, so all's well. I am pushing an updated conventions file now, with examples for a lost vowel and a lost body. Does this sound OK to you?
2. I'm also happy to keep extrinsic and intrinsic. In fact I've always thought of these two as my terms, and logical/physical as yours (my earlier terms for those two kinds of display were "diplomatic" and "curated" edition). So for the time being at least, let's stick to extrinsic/intrinsic in both the Conventions and the EGD. OK?
3. Line numbers, page numbers and milestones are to be shown in grey, with angle brackets, no superscript. I'm fine with that. @michaelnmmeyer , this answers your email sent a short while ago.
4. Lacuna display. So Arlo and I agree that distinguishing between the kinds of @reason (viz. lost/illegible/undefined) in display is not really useful and should be eliminated. As I recall, @manufrancis played an active role when we first sorted out our present display solutions, so I'd be happy to have his confirmation that we can proceed with this. If we have that, then @michaelnmmeyer will need to change the display of <gap> elements so that they show as follows, with any value of @reason (but see below):
- <gap extent="unknown" unit="character"/> - [.?]
- <gap unit="character" quantity="3"/> - [.3]
- <gap unit="character" quantity="3" precision="low"/> - [ca. 3]
- <gap extent="unknown" unit="line"/> - [? lines]
- <gap unit="line" quantity="3"/> - [3 lines]
- <gap unit="line" quantity="3" precision="low"/> - [ca. 3 lines]
- gap elements within <seg> with @met display the metrical pattern in square brackets, as before

I see one detail that still needs to be sorted out here. Our encoding permits <gap reason="omitted"/> for scribal omissions not restored by the editor (EGD §6.4). Are we OK to display this in the same way as lacunae? (At present, the display is, I believe, the same.) I'm not very happy about this, since the reader would not know whether it's a lacuna or an omission, but I don't have a solution that would be intuitively more transparent. Perhaps in this case use angle brackets around the content instead of square brackets? The rationale for that would be that (single) angle brackets are used for editorial insertion, and in this case we want to show that something needs to be inserted here, but we don't know what.
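The six character-level and line-level displays listed in this comment could be sketched as follows. This is only an illustration of the proposed rules: the function name `display_gap` is hypothetical, @reason is deliberately ignored, and gaps with a metrical pattern as well as the open question about reason="omitted" are left out.

```python
import xml.etree.ElementTree as ET


def display_gap(gap_xml: str) -> str:
    """Return the proposed display string for a single <gap> element.

    Hypothetical helper; @reason is ignored, as proposed in the thread.
    """
    gap = ET.fromstring(gap_xml)
    unit = gap.get("unit")
    quantity = gap.get("quantity")
    approximate = gap.get("precision") == "low"

    if gap.get("extent") == "unknown":
        amount = "?"
    elif quantity is not None:
        amount = f"ca. {quantity}" if approximate else quantity
    else:
        raise ValueError("gap needs either @quantity or extent='unknown'")

    if unit == "line":
        return f"[{amount} lines]"
    if unit == "character":
        # Exact and unknown counts take a leading dot; approximate ones do not.
        return f"[{amount}]" if approximate and quantity else f"[.{amount}]"
    raise ValueError(f"unit not covered by this sketch: {unit!r}")


for sample in (
    '<gap extent="unknown" unit="character"/>',
    '<gap unit="character" quantity="3"/>',
    '<gap unit="character" quantity="3" precision="low"/>',
    '<gap extent="unknown" unit="line"/>',
    '<gap unit="line" quantity="3"/>',
    '<gap unit="line" quantity="3" precision="low"/>',
):
    print(sample, "->", display_gap(sample))
```

The sketch always writes "lines", since the list gives no singular form for a gap of one line.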

michaelnmmeyer commented 6 months ago

3. Line numbers, page numbers and milestones are to be shown in grey, with angle brackets, no superscript. I'm fine with that. @michaelnmmeyer , this answers your email sent a short while ago.

This is done.