erc-dharma / project-documentation

DHARMA Project Documentation
Creative Commons Attribution 4.0 International

Cheatsheet: <gap/> #19

Closed ajaniak closed 3 months ago

ajaniak commented 4 years ago

Dear all,

Could you tell me whether you have already made decisions regarding <gap/>?

Thanks, Best

arlogriffiths commented 4 years ago

Not as far as I know. Here are some proposals. Others may please add the scenarios that I am forgetting, and propose other manners for displaying gaps.

manufrancis commented 4 years ago

Decisions not yet made, indeed.

My suggestions:

For <gap unit="character"> When the quantity of characters is known:

When the quantity of characters is unknown: [...]

For <gap unit="line"> superscript, bold, between parentheses with ? when precision is low.

(1 line lost) (1 line lost?) (line/lines lost) (line/lines lost?)

arlogriffiths commented 4 years ago

In the system Manu suggests, we could use + and ? to differentiate @reason="lost" and @reason="illegible" — I believe this is what is done in EIAD display (inspired by Buddhologists' conventions).

danbalogh commented 4 years ago

I would prefer if we always showed <gap> in square brackets, restored text likewise, and the metre of a gap likewise. This would be consistent internally, and would also match the Leiden conventions, where square brackets always mean a lacuna. My specific suggestions:

manufrancis commented 4 years ago

I am fine with square brackets throughout.

I am fine (provisionally, for the purpose of proofreading our encodings) with Dan's proposal of using *, #, . depending on the nature of the gap. But for me, too much display makes editions somewhat difficult to read. The nature of the gap is not a concern for me when first reading an inscription (and if I need the information, I will go to the XML file).

I prefer [ ] to [*8] as it visually gives an idea of the extent of the gap.

I find ⏓ for lost "vowel" misleading. For me, it refers to a short or long lost syllable.

arlogriffiths commented 4 years ago

I agree with Manu's views, notably on finding ⏓ for lost "vowel" misleading.

I'd be happy if we could make some gesture to the conventions of EIAD (and Vincent T. who is especially attached to them) by using + and ? rather than * and #.

Dan: could you attempt a new proposal taking Manu's and my reactions into account?

manufrancis commented 4 years ago

I am happy to make a gesture and vote for + and ? rather than * and #.

danbalogh commented 4 years ago

I can't harmonise the above suggestions with the ones made by me, because the changes you want break other parts of the scheme. I don't insist on * and # (though I personally am attached to them), and I also think Manu may have a point that we don't necessarily have to display the distinction between illegible and lost (and undefined). A tooltip can be added for that, so the user doesn't have to look up the XML. I'll list the specific problems and suggest some possible solutions. Please comment away, let's come closer to an agreement, and then I'll write up a full list of code/display pairs again.

arlogriffiths commented 4 years ago

Thanks a lot for these considerations. The problem re. "?" is indeed a significant obstacle. I have spoken with Vincent T. and he gives us carte blanche to come up with a coherent system, ideally one that has the greatest chance of being adopted by the greatest number of colleagues in our field(s).

I am inclined then to accept the core of what Dan has proposed, with the modification requested by Manu (the number of signs +/ corresponds to the value of @n), and possibly with × instead of .

I don't have an alternative to offer for ⏓ and the other prosodic symbols for lost vowels, so I suggest we retain them at least provisionally.

danbalogh commented 4 years ago

I'll wait a little to see if Manu and Annette want to offer more thoughts, then create a list.

Meanwhile, while working on the EG, I've realised that we'll also have <gap> in the translation div, and those will need to be displayed differently. According to the Guide (and based on Arlo's suggestion), all lacunae in translations will be displayed as text in square brackets, e.g. [3 characters illegible], [3 characters lost], [3 lines lost]. Thus, in the translation div, a gap without attributes should be displayed as [...], and one with attributes should be composed on the basis of attribute values. If possible, line/lines and character/characters should be used depending on the @quantity. Though the EG doesn't say so, perhaps some people will also put @precision in such gaps, in which case we need to add ca., e.g. [ca. 3 characters lost].
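To make the translation-div rule above concrete, here is a minimal Python sketch of how the bracketed text could be composed from attribute values. The function name `translation_gap_text` and its keyword arguments are hypothetical, and how partially attributed gaps should be handled is an assumption (the sketch simply joins whatever is present); the actual transformation may well differ.

```python
def translation_gap_text(reason=None, quantity=None, unit=None, precision=None):
    """Compose the display text for a <gap> inside the translation div.

    Hypothetical helper: argument names mirror the XML attributes.
    """
    # A gap without attributes is displayed as a bare ellipsis in brackets.
    if reason is None and quantity is None and unit is None:
        return "[...]"

    parts = []
    # @precision="low" adds "ca." in front of the number.
    if precision == "low":
        parts.append("ca.")
    if quantity is not None:
        parts.append(str(quantity))
    if unit is not None:
        # line/lines and character/characters depend on @quantity.
        parts.append(unit if quantity == 1 else unit + "s")
    if reason is not None:
        parts.append(reason)
    return "[" + " ".join(parts) + "]"


print(translation_gap_text("illegible", 3, "character"))
print(translation_gap_text("lost", 3, "character", precision="low"))
print(translation_gap_text())
```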

AnneSchmiedchen commented 4 years ago

Dear colleagues, I do not have to offer any specific thoughts. Just two very minor questions: In the 3rd entry from top, Manu had written: [when the number of missing characters is known] "preceded and followed by space (even if we know the gap is inside a single word)". Has this been decided? I would prefer not to put space if the gap is inside a single word. Regarding Dániel's suggestion, lost syllable of unknown length is [⏓], while lost vowel of unknown length is ⏓ without brackets: Would it not make more sense the other way round?

danbalogh commented 4 years ago

Annette, I assume it was not your intention to close this issue, so I'm reopening it.

Good point about spacing around gap. I did not spot that in Manu's comments. As per the present EG, encoders are specifically instructed (§8.1/Editorial spaces and markup) to explicitly add spaces around <gap> elements except where they meet a partially preserved word, in which case no space should be used. If the encoders can manage that, then indeed, the display of the <gap> should not create any spaces, just preserve any space present in the XML around the element. However, if any of you think encoders can't be expected to be mindful of such things, it may simplify matters if we automatically displayed spaces around gap display - in that case there would be no way of indicating whether the adjacent word is complete or partial, but we can perhaps live with that. So please add your votes. My preference is to keep things as they are and not add space in gap display.

As to ⏓ with or without brackets, I think it makes good sense to display all "proper" lacunae (i.e. those affecting at least a full aksara) in square brackets. A single-syllable lacuna encoded with @met to show that its prosodic length is determined by verse but it happens to be an anceps will be very rare, perhaps as rare as a lost vowel of unknown length attached to a preserved consonant. Most of the time, ⏓ in square brackets will occur as part of a longer sequence, e.g. [––⏑––⏑⏓]. Are you suggesting that lacunae with known metre should be shown without square brackets, e.g. ––⏑––⏑⏓? We could go that way, but I find that inconsistent; to my mind the following comprise a good class of display and should be handled similarly, in square brackets:

  • [+++++++] seven lost characters, no information about them (note, I'm not explicitly endorsing the + sign here; my point is about the brackets)
  • [––⏑––⏑⏓] seven lost characters to a given prosodic pattern
  • [śārdūlavikrīḍitam] text lost and restored

danbalogh commented 4 years ago

Incidentally, my listing of bracketed stuff above has given rise to another display issue we should consider. When <gap> and/or <supplied> elements of different kinds occur side by side, do we want to display them in a single set of brackets - and if yes, how easy is that to implement in the transformation? E.g. an extreme example:

śā<supplied reason="illegible">rdū</supplied><supplied reason="lost">la</supplied><gap reason="illegible" quantity="1" unit="character"/><gap reason="lost" quantity="2" unit="character"/>

display: śā[rdū][la][+][××] or śā[rdūla+××]?

arlogriffiths commented 4 years ago

yes, definitely, they need to be collapsed into a single set of brackets. from previous collaborations with Tom E. and Emmanuelle M., I have the impression that there are some complications, but it can be done.


manufrancis commented 4 years ago

I am fine with:

AnneSchmiedchen commented 4 years ago

I also agree to the points summarised by Manu.

danbalogh commented 4 years ago

I'll try now to summarise all points.

Points that still await a decision are:

OR, with apologies for being obstinate, we could go back to what I suggested above, changing * to + and # to ×. That way, we would have to relinquish showing as many lacuna markers as the number of characters lost, but everything else could be displayed in a consistent manner (except for the @reason of loss in sub-akṣara lacunae, which is hardly an important point). Displaying the size of lacuna with a numeral (instead of iterated signs) conforms to the Leiden conventions.

manufrancis commented 4 years ago

Dear Dan, Thanks for the summary.

On pending issues:

arlogriffiths commented 4 years ago

I approve of all of Manu’s responses and favor the * over the # (because, at least for English speakers, the sign # is intimately associated with the meaning ‘number’, which is not relevant in our context).

Arlo


danbalogh commented 4 years ago

OK. Comments on Manu's response:

And so, to paraphrase you and summarise the answers to the issues I noted above as still open:

I'm essentially fine with that. I don't really like the "ca. +++++" style of display, but let's go ahead with that, and maybe we'll have a better idea by the time we come to website display. Let me also point out that by the present scheme, the @reason of a lacuna is distinguished in display when the length of the lacuna is known precisely or approximately, but it is not distinguished when the size is unknown, nor when it is smaller than one akṣara. That again is something I'm essentially fine with.

Sometime soon (probably tomorrow) I'll distil all this into a final summary, unless someone vetoes it in the meantime.

danbalogh commented 4 years ago

Righty-ho, onward to the <seg cert="low">last</seg> summary.

What's new in the above:

arlogriffiths commented 4 years ago

sorry, I hadn’t had time to write yet: I am tempted to revert to a system (like that originally proposed by Dan, I think) of using a number (value of @quantity) plus a symbol for the type of gap, rather than the number of missing characters as such. The advantages of doing so, as requested by Manu, seem to me rather unimportant, esp. compared to the disadvantage of having enormous strings of + or * in the case of long gaps.


danbalogh commented 4 years ago

OK, I'll wait then until this is settled. If I have a vote, mine is on displaying numbers instead of strings of characters. However, one thing we could do - perhaps not right now, but later on - is to include a parameter in the transformation that would let us toggle these alternatives.
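As a rough illustration of such a toggle, the Python sketch below switches between a numeral and iterated signs for a gap of known size. The names `SIGNS`, `render_known_gap` and `iterate` are hypothetical; the signs and the numeral-before-sign order follow what is used later in this thread and were not yet settled at this point.

```python
# Sign per @reason, as used later in this thread (an assumption at this stage).
SIGNS = {"lost": "+", "illegible": "×", "undefined": "*"}


def render_known_gap(reason, quantity, iterate=False):
    """Display a character gap of known size.

    iterate=False: numeral plus sign, e.g. [5+]
    iterate=True:  one sign per lost character, e.g. [+++++]
    """
    sign = SIGNS[reason]
    if iterate:
        return "[" + sign * quantity + "]"
    return "[" + str(quantity) + sign + "]"


print(render_known_gap("lost", 5))
print(render_known_gap("lost", 5, iterate=True))
```

With a parameter like this, long gaps stay compact by default while the iterated display remains available for those who prefer it.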

ajaniak commented 4 years ago

Dear All,

Have you been able to make a final decision for the display of <gap/> with symbols or numbers?

Thanks.

arlogriffiths commented 4 years ago

I am still in agreement with Daniel on displaying the value of @quantity instead of strings of characters. If that is the last point of disagreement, and Manu & Annette can bring themselves to agree with Daniel and me, I would suggest we ask Daniel to formulate once again the results of this discussion.

manufrancis commented 4 years ago

I can live ... for the moment ... with displaying the value of @quantity instead of strings of characters ... but might raise the issue again when the time comes to discuss the display on the website.

danbalogh commented 4 years ago

So, here is my attempt at a final recap. Actually, now that I think about it I'm afraid I can't distill a definitive consensus from the above, since all we've agreed on is that we incorporate some of my last summary into something along the lines of my first summary, using numbers instead of iterations of signs. But the level of incorporation is not certain, as for example in the "ca." suggested above by Arlo. So in the list below, I'll try to suggest alternatives. In all cases, bold face marks my preference, and the word QUESTION highlights items where we still need some discussion.

Can I have votes from the PIs for one of the options under each numbered item, and opinions on the questions?

  1. <gap reason="lost" extent="unknown" unit="character"/>
    • A: [+++] (different sign depending on reason, as in my first summary, but using the signs we have agreed on)
    • B: [...] (no distinction per reason, as in last summary, but perhaps clearer as there is no implication of "three characters lost")
    • C. [?+] (this is a new suggestion by me, to retain the facility of distinction by reason, yet avoid the implication of 3 characters lost)
  2. <gap reason="lost" quantity="5" unit="character"/>
    • A: [+5] (same structure as in my first summary)
    • B: [5+] (inverted structure, as currently implemented I think)
  3. <gap reason="lost" quantity="5" unit="character" precision="low"/>
    • A: [+5?] (same structure as in my first summary)
    • B: [?5+] (inverted structure)
    • C: [ca. 5+] (as suggested by Arlo for iterated signs but perhaps also desired here)
    • D: any other arrangement, so long as it harmonises with the display for 2 above
  4. <gap reason="illegible" extent="unknown" unit="character"/>
    • A: [×××] (different sign depending on reason)
    • B: [...] (no distinction per reason, but perhaps clearer as in 1B)
    • C. [?×] (new suggestion by me, to keep the cake and eat it)
  5. <gap reason="illegible" quantity="5" unit="character"/>
    • A: [×5] (same structure as in my first summary)
    • B: [5×] (inverted structure, as currently implemented I think)
  6. <gap reason="illegible" quantity="5" unit="character" precision="low"/>
    • A: [×5?] (same structure as in my first summary)
    • B: [?5×] (inverted structure)
    • C: [ca. 5×] (as suggested by Arlo for iterated signs but perhaps also desired here)
    • D: any other arrangement, so long as it harmonises with the display for 5 (and 2, 3) above
  7. <gap reason="undefined" extent="unknown" unit="character"/>
    • A: [***] (different sign depending on reason)
    • B: [...] (no distinction per reason, but perhaps clearer as in 1B)
    • C. [?*] (new suggestion by me, to keep the cake and eat it)
  8. <gap reason="undefined" quantity="5" unit="character"/>
    • A: [*5] (same structure as in my first summary)
    • B: [5*] (inverted structure, as currently implemented I think)
  9. <gap reason="undefined" quantity="5" unit="character" precision="low"/>
    • A: [*5?] (same structure as in my first summary)
    • B: [?5*] (inverted structure)
    • C: [ca. 5*] (as suggested by Arlo for iterated signs but perhaps also desired here)
    • D: any other arrangement, so long as it harmonises with the display for 5 (and 2, 3) above
  10. when @unit="line", display as text in square brackets (I think we have consensus here, so no alternatives for this item)
    • <gap reason="lost" quantity="1" unit="line" precision="low"/> ---> [ca. 1 line lost]
    • <gap reason="lost" quantity="2" unit="line" precision="low"/> ---> [ca. 2 lines lost]
    • <gap reason="lost" extent="unknown" unit="line"/> ---> [unknown number of lines lost]
    • same, mutatis mutandis, for @reason="illegible"
    • for @reason="undefined" I suggest the text "lost or illegible" added after the number of lines
    • if the <gap> is not empty and contains a <certainty/> element, add "possibly" to the text, e.g. [ca. 2 lines possibly lost]
    • note that as per the EG, we will never have an exact number of lines lost: if the number is known precisely, then the encoding is with iterated <lb/> followed by an inline lacuna; however, an exact number of lines lost may in principle be encoded with <certainty> inside (e.g. "3 lines possibly lost"). It may be a good idea to generate an error message if a <gap> has @quantity (any value) and @unit="line" but neither has @precision nor contains <certainty/>
  11. when <gap> (with a @unit other than "component") is within <seg> with @met, then instead of the above, display the value of @met converted to prosodic notation, in square brackets
    • e.g. <seg met="+-+"><gap reason="lost" quantity="3" unit="character" /></seg> displayed as [–⏑–]
    • I think we have consensus here, so no alternatives for this item; however, QUESTION: shall we reintroduce distinction by @reason here? We could use e.g. [+ –⏑–], [× –⏑–] and [* –⏑–] or [–⏑– +], [–⏑– ×] and [–⏑– *], but I'm afraid the +×* signs would create confusion next to the prosodic notation, so perhaps best not to.
  12. fusing sets of brackets: we need a new consensus on this, choosing one of the following options
    • A. no fusion for any of the above
    • B. fuse brackets for different reasons of gap, but no fusion for anything else
    • C. fuse brackets for different reasons of gap and for gap in <seg> with @met, but not for <supplied>
    • D. fuse brackets for all of the above
  13. <gap> with @unit="component": display as follows, without square brackets (I think we have consensus here, so no alternatives for this item, but see the questions below)
    • note that this will only occur inside <seg type="component"> (generate error message if it occurs anywhere else?)
    • if the enclosing <seg> has @met, then display as ⏑ if @met="-" and display as – if @met="+" [no other value of @met is permitted for a <seg type="component">]
    • if the enclosing <seg> has NO @met, and the <gap> has @subtype="vowel", display as ⏓
    • in all other cases display as *
    • QUESTION: perhaps change this to [.] so as not to create confusion with * used for @reason="undefined" and to make it clear that this display is for a lost segment
    • note: no distinction between values of @reason in this case
    • ONE MORE QUESTION: perhaps display all cases of <gap> with @unit="component" as [.], since some of us are averse to prosodic notation in this case
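
Item 10 above (gaps with @unit="line") can be sketched in Python as follows, including the suggested error for a @quantity that has neither @precision nor <certainty/>. The function name `line_gap_text` and the `has_certainty` flag are hypothetical, and placing "possibly" before "lost or illegible" for @reason="undefined" is an assumption, since the thread only gives the example for "lost".

```python
def line_gap_text(reason, quantity=None, extent=None, precision=None,
                  has_certainty=False):
    """Display text for <gap unit="line">, following item 10 above."""
    # @reason="undefined" is worded as "lost or illegible".
    wording = "lost or illegible" if reason == "undefined" else reason
    # A <certainty/> child adds "possibly" (assumed position: before the wording).
    if has_certainty:
        wording = "possibly " + wording

    if extent == "unknown":
        return f"[unknown number of lines {wording}]"
    if quantity is None:
        raise ValueError("expected @quantity or @extent='unknown'")
    # Per the EG, an exact number of lost lines is encoded with <lb/>,
    # so @quantity without @precision or <certainty/> is flagged.
    if precision is None and not has_certainty:
        raise ValueError("@quantity with @unit='line' needs @precision "
                         "or a <certainty/> child")

    noun = "line" if quantity == 1 else "lines"
    prefix = "ca. " if precision == "low" else ""
    return f"[{prefix}{quantity} {noun} {wording}]"


print(line_gap_text("lost", quantity=1, precision="low"))
print(line_gap_text("lost", extent="unknown"))
```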

manufrancis commented 4 years ago

Dear Dan, thanks for this recap. Let me ponder a little more before I answer and vote.

manufrancis commented 4 years ago

So here is my vote, based on the following considerations :

Thus :

  1. <gap reason="lost" extent="unknown" unit="character"/> [...]

  2. <gap reason="lost" quantity="5" unit="character"/> [5+] (my favourite is still [+++++])

  3. <gap reason="lost" quantity="5" unit="character" precision="low"/> [ca. 5+]

  4. <gap reason="illegible" extent="unknown" unit="character"/> [...]

  5. <gap reason="illegible" quantity="5" unit="character"/> [5×]

  6. <gap reason="illegible" quantity="5" unit="character" precision="low"/> [ca. 5×]

  7. <gap reason="undefined" extent="unknown" unit="character"/> [...]

  8. <gap reason="undefined" quantity="5" unit="character"/> [5*]

  9. <gap reason="undefined" quantity="5" unit="character" precision="low"/> [ca. 5*]

  10. when @unit="line" Your proposition is fine with me. One remark on "if the <gap> is not empty and contains a <certainty/> element, add "possibly" to the text, e.g. [ca. 2 lines possibly lost]": what does the certainty bear on? (1) The number of lines lost or (2) the existence of the lacuna? The display "[ca. 2 lines possibly lost]" seems to correspond to (2).

  11. when <gap> (with a @unit other than "component") is within <seg> with @met Fine with me. And OK for not reintroducing @reason

  12. fusing sets of brackets. I would say option D (fuse brackets always). But I must admit that I do not see clearly what B and C imply.

  13. <gap> with @unit="component" I would like to have square brackets here too. Thus: [⏑], [-], [⏓], [ ] OR [.] Notes: [⏓] might be confused with short or long syllable. I like [.] instead of [ ], and would thus generalise it when there is no @met. Thus: [.] (whatever the component concerned; the reader will understand if it is a vowel part or a consonant and will refer to the XML for more details) except when there is @met, in which case [⏑] or [-].

danbalogh commented 4 years ago

Thanks, Manu. Some responses/clarification:

  10. <certainty> bears on (2). For an estimated number of lines you use @precision on the <gap>. I believe hardly anyone will ever use <certainty>, but the facility is described in the EpiDoc guidelines and back last summer, Arlo thought it a good idea to include it in our guide just in case.
  12. I listed the earlier options because Arlo mentioned that he would not want to include supplied text in the same set of brackets and I wanted to be clear about exactly what should or should not be fused.
  13. If we want square brackets, then none of [⏑], [-], [⏓] may be used for lost vowel, since all of those would be identical-looking to one syllable lost. So either we go without brackets: ⏑, -, ⏓ for lost vowel and * for lost consonant; or, if we want the brackets, then [.] for all lost segments, regardless of whether it is a short vowel, a long one, a vowel of unknown length, or a consonant. Or, if we want the distinction between vowels and consonants (and/or between vowels of short/long/unknown length), then we can do some more brainstorming (or do it later). E.g. how about [C] for lost consonant and [V], [V̄] and [V̆] for lost unknown/long/short vowel?

AnneSchmiedchen commented 4 years ago

  13. Just a very minor comment on this point. I appreciate that you, Dániel, come up with suggestions all the time. But I must say that I would not be in favour of "[C] for lost consonant and [V], [V̄] and [V̆]". I find this less readable.

manufrancis commented 4 years ago

Thanks, Dan! @ 10. <certainty> . Thanks for the clarification. 10 as you propose is fine with me. @ 12. Noted. Let us see what Arlo had in mind. @ 13. Noted. Thus I vote for [.] for all lost segments, regardless of whether it is a short vowel, a long one, a vowel of unknown length, or a consonant.

manufrancis commented 4 years ago

Further considerations:

  12. fusing sets of brackets. I am no longer in favour of option D (fuse brackets always). "[...]" (if retained for any display) should never be fused, so as to avoid, e.g. [ca. 5+ ...] (better retain [ca. 5+] [...]).
  13. <gap> with @unit="component" I still vote for [.] for all lost segments, but like also Daniel's proposition: "[C] for lost consonant and [V], [V̄] and [V̆]". In any case I am in favour of using square brackets.

danbalogh commented 4 years ago

It seems that [.] will be the best for lost segments, if Annette is set against C and V with markers. They will be rarely encoded, anyway, and . is traditional for them. I don't see a problem with [ca. 5+ ...] but I have no objection to separating brackets for extent="unknown". Manu, would you then also want to separate brackets for the segment notation [.]? But the main question about fusing is whether supplied text should be in the same set of brackets as lacunae, or separate from them.

manufrancis commented 4 years ago

OK, let us go with:

  • [.] for <gap> with @unit="component" (lost segments)
  • [...] when @extent="unknown"

These should never merge with any other similar brackets close by.

As for the main question about fusing: "whether supplied text should be in the same set of brackets as lacunae, or separate from them". I would say separate.

danbalogh commented 4 years ago

That is fine by me. Arlo, could you speak up if this corresponds to your ideas? Annette, I assume that apart from the C/V notation, which we are now inclined to reject, you are OK with the system described above?

arlogriffiths commented 4 years ago

I will try to answer in the course of the day. A.


danbalogh commented 4 years ago

Arlo, no particular hurry about this one. Meanwhile, Annette says:

As I am facing a Github problem right now, I am answering directly to you: Yes, I am OK with the system as it has been described now.

arlogriffiths commented 4 years ago

Sorry for having let this slip for so long.

In diagonally re-reading the thread, I appreciated Manu's comment "When extent="unknown", the display (whatever the reason) = “[...]”. The info for the reason of gap is in the XML. It is too much detail for me to display. The interested reader will check the XML." I think this applies also to other situations where we may want to avoid overkill in the use of differentiated symbols and modes of display. I am thinking of #54, where @ajaniak has asked us to settle the issues concerning <gap>.

I am too much out of it to have any opinion at this stage, and am certain I can live with the upshot of the exchange under the present issue mainly led by Dan and Manu from 19/05/2020 onward. @danbalogh: could you recap one last time, so @ajaniak can get to work?

danbalogh commented 4 years ago

OK, here's take n. It's actually pretty close to finished. Let us have a final yea or nay from all the PIs if possible. If any of you object to any of the solutions below, I would prefer that you A) suggested alternatives present in my previous list above, not brand new ones; and B) made sure that if you suggest a different solution for one of these, you check that the same modification of the method can be implemented in all related cases whilst not interfering with unrelated cases, so that the system as a whole remains coherent.

  1. <gap extent="unknown" unit="character"/> with any value of @reason
    • [...]
  2. <gap reason="lost" quantity="5" unit="character"/>
    • [5+]
  3. <gap reason="lost" quantity="5" unit="character" precision="low"/>
    • [ca. 5+] (but see my comment below on fusion, perhaps revert to [?5+])
  4. <gap reason="illegible" quantity="5" unit="character"/>
    • [5×]
  5. <gap reason="illegible" quantity="5" unit="character" precision="low"/>
    • [ca. 5×] (but see my comment below on fusion, perhaps revert to [?5×])
  6. <gap reason="undefined" quantity="5" unit="character"/>
    • [5*]
  7. <gap reason="undefined" quantity="5" unit="character" precision="low"/>
    • [ca. 5*] (but see my comment below on fusion, perhaps revert to [?5*])
  8. when @unit="line", display as text in square brackets:
    • <gap reason="lost" quantity="1" unit="line" precision="low"/> ---> [ca. 1 line lost]
    • <gap reason="lost" quantity="2" unit="line" precision="low"/> ---> [ca. 2 lines lost]
    • <gap reason="lost" extent="unknown" unit="line"/> ---> [unknown number of lines lost]
    • same, mutatis mutandis, for @reason="illegible"
    • for @reason="undefined" I suggest the text "lost or illegible" added after the number of lines
    • if the <gap> is not empty and contains a <certainty/> element, add "possibly" to the text, e.g. [ca. 2 lines possibly lost]
    • note that as per the EG, we will never have an exact number of lines lost: if the number is known precisely, then the encoding is with iterated <lb/> followed by an inline lacuna; however, an exact number of lines lost may in principle be encoded with <certainty> inside (e.g. "3 lines possibly lost"). It may be a good idea to generate an error message if a <gap> has @quantity (any value) and @unit="line" but neither has @precision nor contains <certainty/>
  9. when <gap> (with a @unit other than "component") is enclosed in <seg> with @met, then instead of the above, display the value of @met converted to prosodic notation, in square brackets
    • e.g. <seg met="+-+"><gap reason="lost" quantity="3" unit="character" /></seg> displayed as [–⏑–]
    • no distinction by @reason in this case
  10. <gap> with @unit="component": always display as [.] regardless of @reason and any other factors
    • the notation [.] will be understood to mean "one vowel, consonant or conjunct consonant lost or illegible"
    • note that this will only occur inside <seg type="component"> (generate error message if it occurs anywhere else?)
    • the enclosing <seg> may or may not have @met, and unlike 9 above, the presence of @met shall not affect the display of this <gap>
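
For implementation purposes, items 1 to 7, 9 and 10 of this list might be sketched in Python roughly as below (item 8, gaps with @unit="line", is left out). The names `SIGNS`, `PROSODY` and `render_gap` are hypothetical, the `met` argument stands for the @met of an enclosing <seg>, and only the two @met symbols attested in this thread ("+" and "-") are mapped; any other symbol is an open point the sketch does not cover.

```python
SIGNS = {"lost": "+", "illegible": "×", "undefined": "*"}
# Only the @met symbols mentioned in this thread: "+" is long, "-" is short.
PROSODY = {"+": "–", "-": "⏑"}


def render_gap(unit, reason="undefined", quantity=None, extent=None,
               precision=None, met=None):
    """Display a <gap> in the edition, following the list above."""
    # Item 10: sub-akṣara gaps are always [.], whatever @reason or @met.
    if unit == "component":
        return "[.]"
    # Item 9: inside <seg met="...">, show the prosodic pattern instead.
    if met is not None:
        return "[" + "".join(PROSODY[symbol] for symbol in met) + "]"
    if unit == "character":
        # Item 1: unknown extent, no distinction by @reason.
        if extent == "unknown":
            return "[...]"
        # Items 2 to 7: numeral plus sign, with "ca." for @precision="low".
        prefix = "ca. " if precision == "low" else ""
        return f"[{prefix}{quantity}{SIGNS[reason]}]"
    raise ValueError(f"unit {unit!r} is not handled in this sketch")


print(render_gap("character", "lost", quantity=5))
print(render_gap("character", "illegible", quantity=5, precision="low"))
print(render_gap("character", "lost", quantity=3, met="+-+"))
```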

I'm following up with another comment on the fusing of brackets, which is still a tangle.

danbalogh commented 4 years ago

So: fusing sets of brackets - we need to think about this. Initially we had consensus. Arlo (27 March) said "yes, definitely, they need to be collapsed into a single set of brackets." Manu (27 March) said "I am fine" [with this] and Annette (28 March) said "I also agree to the points summarised by Manu". Then it seems I stirred up the soup by claiming (21 May) that "Arlo mentioned that he would not want to include supplied text in the same set of brackets". Frankly, I see no such statement by Arlo in this thread; he may have said that over Skype to me, or it may be a figment of my imagination, for which I apologise. In any case, hearing this, Manu (3 June) retracted his earlier opinion and added that [...] (for a gap of unknown length) should never be fused, then slightly later added that [.] (for sub-akṣara sized gaps) should also not be fused to anything else. So it seems that the idea of not fusing supplied to illegible may have been instigated by me, but the question of whether we want to fuse [...] and [.] to anything else is still a question. Thus, to illustrate with a hypothetical case ad absurdum:

śā<supplied reason="illegible">rdū</supplied><supplied reason="lost">l</supplied><seg type="component" subtype="vowel"><gap reason="lost" quantity="1" unit="component"/></seg><gap reason="illegible" quantity="1" unit="character"/><gap reason="lost" quantity="2" unit="character" precision="low"/><gap reason="illegible" quantity="3" unit="character"/><gap reason="lost" extent="unknown" unit="character"/><supplied reason="lost">brā</supplied>hmaṇasya

What display do we prefer?

  1. śā[rdū][l][.][1×][ca. 2+][3×][...][brā]hmaṇasya - no fusion anywhere. Looks pretty ugly to me.
  2. śā[rdūl][.][1×][ca. 2+][3×][...][brā]hmaṇasya - fused different @reasons of <supplied> ("rdūl") but nothing else
  3. śā[rdūl][. 1× ca. 2+ 3× ... brā]hmaṇasya - fused different @reasons of <supplied> and all kinds of <gap>, but not these two to each other
  4. śā[rdūl. 1× ca. 2+ 3× ... brā]hmaṇasya - fused everything
  5. śā[rdūl][.][1× ca. 2+ 3×][...][brā]hmaṇasya - fused everything except [.] and [...]

...or something else?

My preference definitely seems to be for 4, i.e. to fuse everything (provided that we don't make a distinction in the display of <supplied> depending on @reason), except that the notation with "ca." looks bad when displayed like this. So how about reverting, after all, to my earlier suggestion of [?5+] instead of [ca. 5+] (and likewise for illegible and undefined)? That would give us the display śā[rdūl. 1× ?2+ 3× ... brā]hmaṇasya, which I think is as clear as we can get. Notice that in the fused display I'm using spaces in place of the brackets, which should definitely be done to keep the items separate. However, if we do fuse [.] with other stuff in brackets, then this should not be spaced when it is next to a <supplied> element, only when it is next to <gap> (to avoid the display śā[rdūl . 1× ?2+ 3× ... brā]hmaṇasya). If you are dead set against the ?, then perhaps the "ca." could be shown without spacing to keep the items together: śā[rdūl. 1× ca.2+ 3× ... brā], but this doesn't look very good to me. And of course, the entire bracket fusion issue rests on whether Axelle can do the wizardry required to implement it. It may be best, for the time being, to forget about fusion altogether and just stick to śā[rdū][l][.][1×][ca. 2+][3×][...][brā]hmaṇasya in our transformations, and to come back to it when we come to website display.
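
The spacing rules for full fusion (option 4) can be sketched as follows. This is only an illustration of the rules as described in this comment, not the project's XSLT: the token kinds (`text`, `supplied`, `dot`, `gap`) and the function `fuse` are hypothetical names, and the input is assumed to be already flattened into (kind, display string) pairs.

```python
def fuse(tokens):
    """Fuse adjacent bracketed items into a single pair of square brackets.

    Items are separated by a space, except that adjacent <supplied>
    strings are joined directly and the sub-aksara gap "." is not
    spaced when it stands next to <supplied> text.
    """
    out = []
    run = []  # bracketed items collected since the last plain text

    def flush():
        if not run:
            return
        pieces = [run[0][1]]
        for prev, cur in zip(run, run[1:]):
            kinds = {prev[0], cur[0]}
            tight = kinds == {"supplied"} or kinds == {"supplied", "dot"}
            pieces.append(("" if tight else " ") + cur[1])
        out.append("[" + "".join(pieces) + "]")
        run.clear()

    for kind, text in tokens:
        if kind == "text":
            flush()
            out.append(text)
        else:
            run.append((kind, text))
    flush()
    return "".join(out)


example = [
    ("text", "śā"),
    ("supplied", "rdū"),
    ("supplied", "l"),
    ("dot", "."),
    ("gap", "1×"),
    ("gap", "?2+"),
    ("gap", "3×"),
    ("gap", "..."),
    ("supplied", "brā"),
    ("text", "hmaṇasya"),
]
print(fuse(example))  # śā[rdūl. 1× ?2+ 3× ... brā]hmaṇasya
```

Any plain text between two bracketed items closes the current bracket, so only truly adjacent items are fused.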

arlogriffiths commented 4 years ago

Thanks Dan. I am in agreement with all your proposals in these last two messages, including preference for number 4 (fusing all brackets) and reverting to "?" to reflect @precision="low".

danbalogh commented 4 years ago

Thanks, that is great. I understand Manu is on holiday; can we expect him to give a final OK nonetheless or should we assume that he'll accept whatever we agree on? @AnneSchmiedchen - please confirm if you agree. I'll be happy to write one more recap to include all these things, but I'd like to wait until we have consensus before I do that.

AnneSchmiedchen commented 4 years ago

I agree. And many thanks for all this.

manufrancis commented 4 years ago

I agree (provisionally ;-) with Dan's recap for <gap>. (I am not in favour of the distinction between ×, + and *, and prefer to let the user check the XML rather than obscure the edition; I might put this on the table again later.) As for the fusion of brackets, I also agree provisionally. Thus option 4, śā[rdūl. 1× ca. 2+ 3× ... brā]hmaṇasya, with ? instead of ca., is OK for me for the moment. Again, I might put this on the table again later. So, Axelle, could you please implement the final recap that Dan will prepare?

danbalogh commented 4 years ago

Just one more thing. Given that we seem to be going for simplifying the display in other areas as well (e.g. space), I'm perfectly OK with using just a + sign instead of × and * for gaps of all reasons. We've already discarded distinction by reason in supplied text, so why not? If @arlogriffiths and @AnneSchmiedchen agree to that, then Manu will not need to put this on the table later and we can just go ahead with the way he prefers.

danbalogh commented 4 years ago

Oops, slight problem with this. What do we do if two different kinds of gap are next to each other, as in my example above? For things like <gap reason="illegible" quantity="1" unit="character"/><gap reason="lost" quantity="2" unit="character" precision="low"/><gap reason="illegible" quantity="3" unit="character"/> do we want to display [1+ ?2+ 3+ ...]? If not, then what, given that one of the items is imprecise but the other two are precise? And even if we don't have the imprecision, as in <gap reason="illegible" quantity="1" unit="character"/><gap reason="lost" quantity="2" unit="character"/><gap reason="illegible" quantity="3" unit="character"/>, we would then probably want [6+] instead of [1+ 2+ 3+]. Perhaps best to stick to what I have above and, indeed, put this on the table later if at all.

AnneSchmiedchen commented 4 years ago

I agree with Manu's (repeated) intervention to use just a + sign for gaps of all reasons. And I am still for the fusion of brackets and for the use of a ? sign instead of "ca." If we do not have any imprecision, I would also prefer [6+] instead of [1+ 2+ 3+].

ajaniak commented 4 years ago

For now, the above has been applied. Please note that I had to delete any HTML elements used to structure the gap in order to allow the fusion (remember that it will work with <supplied> only with @reason="lost" and @reason="subaudible"; as with the fusion of <unclear>, the grantha rendering will mostly mess it up). I also had to delete the provision made by EpiDoc that if two <gap> elements with the same @reason follow each other, only the first one is displayed. I will await your final decisions on the other matters under discussion.

danbalogh commented 4 years ago

To sum up the issues that remain, before I write a final recap:

  1. distinguish @reason in gaps of known/estimated size or not? Manu and Annette are for not doing so, I'm OK with it if we can solve the issue of merging display of side-by-side gaps (e.g. 3 characters lost and 3 characters illegible should not display as [3 3] but as [6]); @arlogriffiths has not yet expressed an opinion.

depending on the outcome of that,

A. if we keep the distinction by @reason, then fusion can be implemented between all things displayed in square brackets with a little trickery concerning spaces, which I'll summarise again if we come to this decision. Everybody has said they prefer full fusion (i.e. śā[rdūl. 1× ca. 2+ 3× ... brā]hmaṇasya), so we have consensus here IF the distinction by reason is kept.

B. if we discard distinction by @reason, we need to know if it is technically possible to add the @quantity numbers of successive gaps for display, and we need to come to a solution for cases where some of those successive gaps have @precision="low": do we then add the quantities and put a ? before the sum, or do we keep the numbers separate, with ? only in front of the imprecise ones?
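
To make the two candidate displays under B concrete, here is a small sketch. It only illustrates the open question and does not settle it; the `Gap` class and both function names are hypothetical, and gaps of unknown extent are left out.

```python
from dataclasses import dataclass


@dataclass
class Gap:
    quantity: int
    precise: bool = True  # False stands for precision="low"


def display_summed(gaps):
    """Add the quantities; put ? before the sum if any gap is imprecise."""
    total = sum(g.quantity for g in gaps)
    mark = "" if all(g.precise for g in gaps) else "?"
    return f"[{mark}{total}+]"


def display_separate(gaps):
    """Keep the numbers separate, with ? only before the imprecise ones."""
    parts = [f"{'' if g.precise else '?'}{g.quantity}+" for g in gaps]
    return "[" + " ".join(parts) + "]"


gaps = [Gap(1), Gap(2, precise=False), Gap(3)]
print(display_summed(gaps))    # [?6+]
print(display_separate(gaps))  # [1+ ?2+ 3+]
```

When every gap is precise, `display_summed` gives the [6+] form preferred above over [1+ 2+ 3+].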

And one more issue that @ajaniak has just reminded me of by the above post. None of the above details mention <supplied reason="subaudible">, which we now use for two things: editorial avagrahas and editorial punctuation. I think it would be best to display these differently from the restoration of lacunae, so I suggest that they should not be in square brackets. Instead, they could be displayed in the same way as <supplied reason="omitted">, and perhaps also merged into the same set of brackets as any adjacent restored omissions.

Finally, a note to @ajaniak : none of our files should have two <gap> elements with the same @reason after each other. If in such a case the standard EpiDoc transformation silently displays only the first one, it's OK if you override that, but it would be best if an error message could also be generated in such a case. The only exception I can imagine is if one of these contains <certainty>, in a hypothetical situation where the rest of an inscription is broken off before the end of a stanza. In that case you would know that the inscription quite certainly contained as many characters as needed to finish the stanza, but you may be unsure whether or not it contained any other text after that, so you would mark up a lacuna of known size as lost, followed by a lacuna of unknown size as possibly lost.
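
A rough sketch of the error check requested here, under stated assumptions: "after each other" is taken to mean sibling `<gap>` elements with nothing but whitespace between them, the `<certainty>` exception is applied whenever either gap has a `<certainty>` child, and the input has no TEI namespace. The function name `adjacent_same_reason_gaps` is hypothetical.

```python
import xml.etree.ElementTree as ET


def adjacent_same_reason_gaps(xml_text):
    """Return one error message per pair of directly adjacent <gap>
    elements that share the same @reason."""
    root = ET.fromstring(xml_text)
    errors = []
    for parent in root.iter():
        children = list(parent)
        for first, second in zip(children, children[1:]):
            if first.tag != "gap" or second.tag != "gap":
                continue
            if (first.tail or "").strip():
                continue  # text stands between the two gaps
            if first.get("reason") != second.get("reason"):
                continue
            if first.find("certainty") is not None or second.find("certainty") is not None:
                continue  # the exception described above
            errors.append(
                f"two adjacent <gap> elements with reason={first.get('reason')!r}"
            )
    return errors


bad = ('<ab><gap reason="lost" quantity="2" unit="character"/>'
       '<gap reason="lost" extent="unknown" unit="character"/></ab>')
print(adjacent_same_reason_gaps(bad))
```

The transformation could still display both gaps while logging these messages, so the override and the error report need not conflict.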