schema too restrictive regarding <lb>

arlogriffiths commented 4 years ago

@ajaniak EGD 9.1.6 states: "to reduce code clutter, feel free to use these elements without any attributes, since the purpose of including them in lemmas and readings is only to show the fact that such a transition is present (or was indicated as present, not necessarily always in the right place, by a previous edition)"

However, the schema currently invalidates code that has only a bare <lb/>:

<app loc="12">
    <lem>-valaja-ma<lb/>laṅga-</lem>
    <rdg source="bib:Bhattacharya1983-84_01">-vana-kuma<lb/>laṅga-</rdg>
</app>

The error message is: element "lb" missing required attribute "n"

Can you make the schema less restrictive on this point?

@danbalogh and @ryosukefurui : FYI

danbalogh commented 4 years ago

This was a surprise to me, since I've had no error messages for <lb/> before. I now see that we have a new inscription template which calls on our own schema. This looks good, and you have probably considered and discussed the issues involved here, but I was out of touch. My main concern is: are we going to use a DHARMA schema instead of the EpiDoc schema? So far, I've always understood that we would remain EpiDoc compliant and use a schema of our own in addition to the EpiDoc one. But the new template links only to DHARMA files. So are we now formally creating a fork in EpiDoc, or are we going to reintegrate? My second question: is this new schema ready for testing? I've tried it on one of my files and got several error messages in addition to that concerning <lb/>. One of these was on the related issue of <pb/>, which should be permitted in an apparatus; there were at least two more. Should I open bug tickets for those and test the schema on some more encoded texts, or shall that come later?

ajaniak commented 4 years ago

The template v03 and the schema are not yet ready for use and have not even been tested. I am still working on most of the Schematron rules.

@arlogriffiths, I expect the @n on the <lb/> in order to display that the entry is on lines 12 and 13 (per your example). In such cases, the second number is provided by the content of the @n attribute. At least, that is what I understood from our discussion on the subject. Let me know if I need to change the XSLT code.

@danbalogh the DHARMA schema will be linked to the EpiDoc schema. Chaining is expressed in the ODD with the @source attribute. But if we keep the EpiDoc schema, requested features such as a dropdown menu to help sort the attribute values won't be provided. The current EpiDoc schema is too lenient compared to the encoding model established by the EGD.

danbalogh commented 4 years ago

OK, thanks.

arlogriffiths commented 4 years ago

@ajaniak : my understanding of "without any attributes" was that @n could also be avoided. (@danbalogh : wasn't that our intention?) Are you not able to get the value of the ending line number by taking the value of @loc and doing +1 for every extra <lb> that occurs in the <lem>? So, in the example at hand, you would get 12–13 in display not because there is an explicit "13" in the code but by doing 12+1 = 13: is that possible?
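
A minimal sketch of the calculation proposed here, in Python rather than the project's XSLT. It assumes that @loc holds a single integer, that the elements carry no namespace (real TEI files would need namespace-qualified names), and that every <lb/> inside <lem> adds exactly one line. The function name `apparatus_line_range` is hypothetical.

```python
import xml.etree.ElementTree as ET

EN_DASH = "\u2013"

def apparatus_line_range(app_xml: str) -> str:
    """Return the line label for an <app>: @loc, extended by one line per <lb/> in <lem>."""
    app = ET.fromstring(app_xml)
    start = int(app.get("loc"))  # assumption: @loc is a plain integer
    lem = app.find("lem")
    # Count every <lb/> nested anywhere inside the lemma.
    extra = len(lem.findall(".//lb")) if lem is not None else 0
    if extra == 0:
        return str(start)
    return f"{start}{EN_DASH}{start + extra}"

example = '<app loc="12"><lem>-valaja-ma<lb/>la-</lem><rdg>-vana-kuma<lb/>la-</rdg></app>'
print(apparatus_line_range(example))  # 12–13
```

Only the <lb/> elements of the <lem> are counted, so a <rdg> that breaks lines differently does not change the label.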

danbalogh commented 4 years ago

Indeed, our intention in EGD 9.1.6 was precisely to permit (and encourage) the omission of @n in an <lb/> (and likewise, <pb/> and <milestone/>) in order to reduce the code clutter in a lemma or reading. If it is possible to display "12-13" by the calculation suggested by Arlo above, that would be ideal. If, however, that is complicated to achieve, I would be perfectly happy with either of the following:

- displaying just "12" for the apparatus entry (this, in fact, was our intention when Arlo and I discussed the EG, and this is the first I hear about the idea of displaying "12-13")
- rewriting the EGD to make @n mandatory in the apparatus

ajaniak commented 4 years ago

I can write code to compute the second number, but it will take longer to process. However, I still need @break to be kept on the <lb/> in each reading and lemma for display purposes; otherwise I would have to define the expected display for the empty elements according to the spacing rules for /.

Just so you know, most of the files don't have a declutter[ed] <lb/>. So if you decide to apply this rule, you need to make sure the practice is understood. It might also require some encoders to change their habit of simply copy-pasting.

danbalogh commented 4 years ago

Let me know what you decide. My tacit understanding was that all line breaks and page breaks would simply be displayed as a / sign in a lemma/reading, with no distinction on the basis of @break. I'm not sure I understand "most of the files don't have a declutter". If you mean that many encoders have retained these attributes in an apparatus, I don't see that as a problem. The processing should simply ignore those attributes and display <lb/> and <pb/> and <milestone type="pagelike"/> indiscriminately as a / sign in all circumstances. Omitting the attributes in the apparatus is not a rule but an option.

arlogriffiths commented 4 years ago

@ajaniak : as written by @danbalogh , we were not actually counting on having the final line number of a range indicated in the display, so please help me with some more information before I decide whether to ask you to write it or to do without it.

- How easy or difficult is it to write that code?
- How significant will the increase in processing time be?
- At what stage of our workflow would that increased time become noticeable? (Is it something we would suffer from more once we work in eXist-DB than while we depend on XSLT transformation from XML to HTML?)

Is Dan's response that we expect all line breaks and page breaks to be displayed as a / sign in a lemma/reading, with no distinction on the basis of @break, sufficient for you to set up the other aspects of the display?

manufrancis commented 4 years ago

Some considerations after discussing with Axelle:

As for the display of <lem> and <rdg>, I find it important to be able to display <lb> differently depending on the presence of break="no", that is: “ / ” [slash with a space before and after] when the <lb> is between words, and “/” [slash without a space before or after] when the <lb> is in-word. [Side note: I foresee a similar display in the Edition (curated display): a space before and after the slash except when the <lb> has @break="no".]

E.g.

with_attribute_break=no

<lem><unclear>-ka</unclear><lb n="5" break="no"/><unclear>ḻa</unclear>ñ<unclear>c</unclear>i<unclear>ṉ-u</unclear>ḷḷum</lem>

and NOT

without_attribute_break=no

<lem><unclear>-ka</unclear><lb/><unclear>ḻa</unclear>ñ<unclear>c</unclear>i<unclear>ṉ-u</unclear>ḷḷum</lem>

Thus retaining @break="no" in <lem> and <rdg> is desirable, as only its presence would enable the desired display.
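
The display rule described above can be sketched as follows. This is an illustration in Python, not the project's XSLT. It assumes elements without a namespace, and the function name `render_reading` is hypothetical. Every <lb/> becomes a slash, with surrounding spaces unless it carries break="no", and other markup such as <unclear> is flattened to its text.

```python
import xml.etree.ElementTree as ET

def render_reading(xml: str) -> str:
    """Flatten a <lem> or <rdg> to text, showing each <lb/> as '/' or ' / '."""
    parts = []

    def walk(el):
        if el.tag == "lb":
            # In-word break: no spaces around the slash.
            parts.append("/" if el.get("break") == "no" else " / ")
            return
        if el.text:
            parts.append(el.text)
        for child in el:
            walk(child)
            if child.tail:  # text following the child element
                parts.append(child.tail)

    walk(ET.fromstring(xml))
    return "".join(parts)

print(render_reading('<lem><unclear>-ka</unclear><lb n="5" break="no"/>la</lem>'))  # -ka/la
print(render_reading('<lem><unclear>-ka</unclear><lb/>la</lem>'))                   # -ka / la
```

The second call shows why a bare <lb/> cannot produce the in-word display: without @break the code has no way to tell the two cases apart.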

As for the display of multiple lines of a <lem> when the @loc of <app> has only one line number, what Axelle has implemented works when only 2 lines are concerned. It is quite possible that we will come across a <lem> spanning 3 lines (in the case of inscriptions with very short lines). Axelle could also fix the display for that case, but it seems simpler and more straightforward to keep the @n on <lb> in <lem>, for an easier transformation towards the desired display.

I thus suggest rewriting EGD §9.1.6, at a minimum setting a mandatory rule to keep, in <lem> and <rdg>, the <lb> with all its attributes. We are even in favour of a mandatory rule to keep in <lem> and <rdg> all the markup present in the Edition, with no optional rule allowing the elements to be used without attributes, because:

- it seems simpler, when creating an <app>, to replicate (copy/paste) the content of the <lem> in the <rdg> and then adapt the <rdg> only for the specifics of the editor's reading concerned (without bothering about which attributes could be deleted);
- we also find that, for our re-users, having full markup in <lem> and <rdg> is better than not having it (all the more so if it is imported through a copy/paste from the Edition);
- it seems better to have a mandatory rule (keep all markup) followed by everybody in the project rather than an optional rule (delete markup if you want), in order to avoid divergent encodings as much as possible.

danbalogh commented 4 years ago

I find Manu's arguments convincing and have no objection to removing the option of omitting attributes in these cases. @manufrancis : I don't understand the difference between your minimum suggestion and the next one - they seem to be the same to me. Am I missing something? @arlogriffiths : do you accept mandatorily retaining the attributes of these milestone-type elements in the apparatus?

arlogriffiths commented 4 years ago

no objection from my side

danbalogh commented 4 years ago

A question to all of you. Suppose the lemma includes a page break within a word, e.g. suva<pb n="2v" break="no"/><lb n="15" break="no"/>rṇṇaṁ. As we agreed last summer, in such a case @break="no" is to be used on both <pb/> and <lb/>. By the above display suggestion, this would by default display as suva_/__/_rṇṇaṁ (with underscores here standing for spaces). I suppose you agree that we would want to display suva_//_rṇṇaṁ instead. How shall we achieve that? The ideal solution would be to tweak the display transformation, so that when <pb/> and <lb/> are next to each other (in this order, with nothing except possibly white space between them), and both have @break="no" (or, to accommodate human error, if either has @break="no"), then it displays _//_ instead of _/__/_. @ajaniak - is this feasible or would it be too complicated to implement? If it is not feasible, we need to think of a different way. Note that revising the EGD to remove the obligation of adding @break="no" to <pb/> when it is also present on an adjacent <lb/> would not fully solve our problem, since by default that would still display as suva/_/_rṇṇaṁ.

manufrancis commented 4 years ago

Dear Dan,

As for the difference between my minimum suggestion and the next one: Minimal suggestion: keep EGD as it is, except add "retain mandatorily @n and @break in <lem> and <rdg>". Maximal suggestion (our preferred option) : retain mandatorily full markup present in edition, as it is, in <lem> and <rdg>.

As for suva<pb n="2v" break="no"/><lb n="15" break="no"/>rṇṇaṁ, my understanding is that it would be displayed as suva//rṇṇaṁ.

The first question would be: do we need/want also a display for <pb> in apparatus? Is not the display of <lb> enough?

In the case of kṛtam<pb n="2v"/><lb n="15"/>Idam, my understanding is that it would be displayed as kṛtam // Idam (if we display both <pb> and <lb>). I will let Axelle (@ajaniak) tell us whether and how this is feasible, but I think it would not be that complicated.

danbalogh commented 4 years ago

Dear Manu, on the first point, I still don't understand the difference, since as best I can recall, all other markup except the @n and @break of a pb, lb and milestone must already be retained in lemmas and readings, except of course for block-level containers. (EGD §9.1.6).

On the second point, indeed, I messed things up. The problem I had in mind would be when a pb and lb are next to each other without, not with, @break - as you illustrate with kṛtam Idam. There, indeed, the default display according to the above would be kṛtam / / Idam, and as you say, we would prefer kṛtam // Idam. So @ajaniak : that is indeed what we'd like to know: is it feasible to implement this display? If not, I am not averse to Manu's suggestion of not displaying pb-s in the apparatus at all.
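
A rough sketch of the merged display discussed here, written in Python and not taken from the project's XSLT. It assumes elements without a namespace, and the names `is_break`, `tokens` and `render` are hypothetical. As a simplification, it merges any run of adjacent <pb/>, <lb/> and <milestone type="pagelike"/> elements regardless of their order, and drops the surrounding spaces if any element in the run has break="no".

```python
import xml.etree.ElementTree as ET

def is_break(el):
    """True for the milestone-type elements that are shown as a slash."""
    return el.tag in ("pb", "lb") or (
        el.tag == "milestone" and el.get("type") == "pagelike")

def tokens(el):
    """Yield ('text', str) and ('break', in_word) tokens in document order."""
    if is_break(el):
        yield ("break", el.get("break") == "no")
        return
    if el.text:
        yield ("text", el.text)
    for child in el:
        yield from tokens(child)
        if child.tail:
            yield ("text", child.tail)

def render(xml):
    out, slashes, in_word = [], 0, False

    def flush():
        # Emit the pending run of breaks as one group of slashes.
        nonlocal slashes, in_word
        if slashes:
            mark = "/" * slashes
            out.append(mark if in_word else f" {mark} ")
        slashes, in_word = 0, False

    for kind, value in tokens(ET.fromstring(xml)):
        if kind == "break":
            slashes += 1
            in_word = in_word or value
        elif slashes and value.strip() == "":
            continue  # ignore white space between adjacent breaks
        else:
            flush()
            out.append(value)
    flush()
    return "".join(out)

print(render('<lem>krtam<pb n="2v"/><lb n="15"/>Idam</lem>'))  # krtam // Idam
```

A single <lb/> still renders as one slash, so the same function covers the ordinary case as well.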

ajaniak commented 4 years ago

Such a display is feasible. (Currently, only the <lb/> elements are taken into account in the display of the apparatus, but the base code is mostly the same.)

danbalogh commented 4 years ago

Great, thanks. I've already added a comment to the EGD that the option of discarding attributes is to be removed, and we can now be assured that if an <lb/> is next to a <pb/> (or to a <milestone type="pagelike"/>), then the extra spaces can be removed from the display.

manufrancis commented 4 years ago

Dear Dan, sorry, let me re-read EGD §9.1.6 and comment upon it.

danbalogh commented 4 years ago

Thanks, Manu, the difference between your two proposals is now clear to me. I'm definitely for the "maximal" one as the minimal one would complicate the encoder's job with an extra decision to make.

arlogriffiths commented 5 months ago

Can this issue be closed?

michaelnmmeyer commented 5 months ago

Closing this because #282 covers the same issue. #282 is still not implemented.

danbalogh commented 5 months ago

I concur. The EGD revision is long done, and the display is dealt with in https://github.com/erc-dharma/project-documentation/issues/282

erc-dharma / tfb-bengalcharters-epigraphy

schema too restrictive regarding <lb> #3