erc-dharma / project-documentation

DHARMA Project Documentation
Creative Commons Attribution 4.0 International

display of <g>.</g> #78

Closed arlogriffiths closed 5 months ago

arlogriffiths commented 4 years ago

I am not so satisfied (aesthetically) with the combined effect of

  1. asking encoders not to insert any space before punctuation signs (NB: I actually don't find a very explicit rule to this effect in EGD, but TG p. 25 does state about punctuation that "all of the following shorthand characters should be followed by a space in transliteration, but not preceded by one")
  2. displaying <g>.</g> as | in a font (different from the one you see here) where the vertical bar does not extend below baseline

Example code:

śaka-varṣātīta <num value="728">728</num> māgha-māsa navami śukla-pakṣa<g type="circle">.</g> <abbr>ha</abbr><g type="circle">.</g> <abbr>U</abbr><g type="circle">.</g> <abbr>vr̥</abbr><g type="circle">.</g> vāra<g type="circle">.</g> tatkāla rakai patapān· pu manuku<g type="circle">.</g> sumusu<supplied reason="lost">k ika</supplied>

Example display:

[Screenshot 2020-09-25 at 07 50 03]

As the vertical bar is shortish in the given font, without any descender below the base line, and as it also sticks directly to the preceding letter, it becomes hard to see that it's a non-alphabetic character.

Suggestions:

danbalogh commented 4 years ago

To my mind, what we classify as punctuation (EGD §4.2.4, "employed in the original for syntactic or metrical segmentation into relatively small units, similar in function to a modern comma, full stop, question mark, exclamation mark, colon or semicolon") should not be preceded by a space, since no space precedes the equivalent marks in English. This is not an encoding issue but one of style, but I can make it explicit in the EGD if you want me to. Or, if everyone else prefers to have spaces before punctuation marks, I can live with that, and indeed, in that case it would be best for our transformation to automatically add a space in display before such marks unless one is already present, in which case encoders would not have to worry about whether they have used their spaces correctly. So, on your first point, whatever you prefer. On your second point, vertical bars are different on my screen. In the HTML file I get from a transformation, I do not see a font specified, so I believe the font in which you get to see your transformed texts depends on the defaults set for your browser, which you should be able to change easily. My Firefox is set by default to use Times New Roman, and my vertical bars are thinner than in your screenshot and extend further downward.

[Image: Clip]

Unless you want Axelle to add styling involving a specifically designated font to our HTML transformations, which I think is not worth the bother, you may want to set a different default font for your browser.
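Purely as an illustration of the alternative mentioned above (having the transformation add a space in display before punctuation marks unless one is already present), here is a minimal Python sketch. It is not the project's actual transformation: the function name `space_before_punctuation` and the regex approach are assumptions made for this example, and it only treats `<g>` elements containing `.` or `..` as punctuation.

```python
import re

# Hypothetical pattern: a <g> element whose content is "." or "..",
# directly preceded by a non-whitespace character.
PUNCT_G = re.compile(r'(?<=\S)(<g\b[^>]*>\.{1,2}</g>)')

def space_before_punctuation(xml_fragment: str) -> str:
    """Insert one space before each punctuation <g> that lacks one."""
    return PUNCT_G.sub(r' \1', xml_fragment)

print(space_before_punctuation('śukla-pakṣa<g type="circle">.</g> <abbr>ha</abbr>'))
```

Encoders who already typed a space before the mark would see no change, since the pattern only matches when the preceding character is not whitespace.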

arlogriffiths commented 3 years ago

I was basically satisfied with @danbalogh 's answer of 25/09. (Yes, @danbalogh, I would like you to make it explicit in EGD that we don't expect an editorial space before punctuation while we do before symbols.)

Presently, however, something seems to be going wrong with the display of <g>.</g>. Rather than being displayed as a vertical bar, we get the following (from my file Kurungan.xml):

<lb n="9"/>pu Aṅgada<g type="circleMed">.</g> pu plī<g type="circleMed">.</g> pu dhanada<g type="circleMed">.</g> pu taṁtaṁ<g type="circleMed">.</g> pu gaccha<g type="circleMed">.</g> pu gadhī<unclear><g type="circleMed">.</g></unclear> pu māgha<unclear><g type="circleMed">.</g></unclear> pu gusay·<g type="circleMed">.</g> pu samvok·<g type="circleMed">.</g> nāhan· sira rāma Umeḥhakan·nikāṁ savaḥ śīma ḍaṁ ā<lb break="no" n="10"/>cāryya munīndra<g type="circleMed">.</g> huvus śuddha-pariśuddha<g type="circleMed">.</g></p> <p>tatra sākṣī<g type="circleMed">.</g> bhagavanta puccha<g type="circleMed">.</g> punta kamala<g type="circleMed">.</g> punta s<unclear>u</unclear>kha<g type="circleMed">.</g> punta cvat· saṅke kataṅgaran·<g type="circleHigh">.</g> rake praṣ· punta

[Screenshot 2020-11-19 at 18 29 14]

I am not used to seeing so many transformation problems.

@ajaniak : can you identify the source of this trouble, and help me solve it?

danbalogh commented 3 years ago

There was in fact already an indication of this in EGD §8.1.2 (editorial space and markup), but I've now made that more explicit and verbose, and added a cross-reference and a summary under §4.2.1 (symbols overview). I may one day create a "style" appendix to the EGD and move some or all issues of this type over there.

For the display of symbols, I think we have made some changes to the preferred encoding since we last discussed display in #50 , so it is no wonder if display is not exactly as accepted. Given your comment above, I think at the present stage display should be handled like this:

Is this in accordance with what you, @arlogriffiths , had in mind? Note in any case that this is not meant to be a long-term solution. What we've been working toward in the Taxonomy and the EGD revision associated with the creation of the Taxonomy was that the display of <g> should, in the long run, ignore the contents and instead display various basic symbols on the basis of @type. We have not made display plans for that strategy and the common values of type have not been finalised, so display cannot yet be implemented that way. We will also need to think about whether it would be possible to make the contents (. signifying punctuation and § signifying space filler) affect the display. It may be possible e.g. to display punctuation marks without a cartouche, misc symbols in a rounded cartouche as now, and space fillers in a different-shaped cartouche.

@arlogriffiths , please confirm you agree with the above display suggestion and if yes, we can then ask Axelle to implement that.

arlogriffiths commented 3 years ago

I confirm that I agree, and would be grateful if Axelle could implement this.

arlogriffiths commented 3 years ago

I am reopening this just for aesthetic fine-tuning, because I have now seen that the current display is confusing at least for some users (on Mac?) who are not looking at our code and are not yet attuned to our convention of displaying any punctuation sign with a vertical bar.

At my encouragement, @salomepichon showed @dominiquesoutif her HTML for DHARMA_INSCIK00831.xml.

To this bit of display:

[Screenshot 2021-02-12 at 06 14 28]

Dominique responded like this:

"Besides, I don't really understand the reading jhel on the next line (rather jhe ◎, as Cœdès noted, no?)."

As @danbalogh showed in a previous message in this thread, this seems to be partly a problem related to default fonts set in our browsers, because the vertical bar looks too much like a letter l (ell) on some computers in the current settings.

We need to avoid any users ever being able to get the wrong impression that what is jhe<g type="circleConcentric ">.</g> in the xml code is jhel rather than jhe| in display, regardless of the settings on their computers.

To further complicate things, in the same context there are also cases of <g type="numeral">I</g> in Salomé's code, with a capital letter i expressing a numeral (and which @salomepichon has so far forgotten to wrap in <num value="1">).

So we need a display solution for <g>.</g> that cannot be confused either with letter l (ell) or with capital letter i.

Thoughts?

danbalogh commented 3 years ago

Before we can implement displaying things like jhe ◎, we must make some progress with the symbol taxonomy. Since the last time I devoted serious attention to that matter, I have increasingly had the feeling that it should be kept as simple as possible, because the actual variety of symbols is far too complex, and encoding it in detail does not seem to serve a useful purpose. But no more on that here - when Arlo can find the time, the two of us should take the matter up again. Here are some thoughts on what we might do till then.

First of all, we must keep in mind that whatever symbol taxonomy and associated display we finally settle on, it will almost certainly involve the | vertical bar character for the display of daṇḍas, even if not for other symbols. At least in those cases, people who display a text in a font where the | glyph is not sufficiently different from l (ell) and/or I (capital i) will inevitably encounter some ambiguity, and will just have to live with it. I do not think it is our job to prevent that from happening, just as it is not our job to make sure people will correctly distinguish a 0 (zero) from an O (capital o) or a B from a ß (scharfes ess). This is not, of course, to say that we should not take steps - within reason - to minimize the chances of this ambiguity occurring. One way to do that directly would be to use the broken bar ¦ character instead of |, but I am averse to that, since daṇḍas are always represented by an unbroken bar. Another may be to include a formatting instruction in the HTML transformation, so that vertical bars are set to display in a particular font - but this is even less practicable, since in addition to making the transformation and the HTML that much more complex, it will still be dependent on what fonts the user has installed (or whether or not they can access online fonts), and the formatting may still disappear when copy-pasting the text into another application.

Given the above, if you are sure that we need to reduce the chances of this ambiguity occurring, then the only plausible thing we can do is to reduce the number of times the | character appears in our displays.

A. This could be achieved (temporarily, until we get to implementing the display of ◎ and so forth based on the type of g) by revoking our (likewise temporary) rule of displaying | and || if <g> contains a . (full stop) or .. (two full stops). Instead, we could display the value of @type in a cartouche in this case, just as we do for empty <g/> elements. This is not ideal, since if someone copies the text from the HTML and pastes it into email or a word processor, the cartouche disappears, so you would get "jhecircleConcentric". I can live with that, knowing it is temporary, but it may be just as bothersome for Dominique and others. We could circumvent that by making the display transformation more complex and using e.g. angle brackets instead of (or in addition to) the cartouche around symbol tokens, e.g. "jhe⟨circleConcentric⟩".
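To make option A concrete, here is a rough Python sketch of how a fragment might be rendered to plain text with the symbol type in angle brackets. This is only an illustration under assumptions: the real transformation is not written in Python, the function names `render` and `render_fragment` and the `<ab>` wrapper are invented for the example, and all elements other than punctuation `<g>` are simply flattened to their text.

```python
import xml.etree.ElementTree as ET

def render(elem: ET.Element) -> str:
    """Flatten an element to text, showing punctuation <g> as ⟨type⟩."""
    if elem.tag == "g" and (elem.text or "") in {".", ".."}:
        # strip() tolerates stray whitespace such as type="circleConcentric "
        return "⟨" + elem.get("type", "").strip() + "⟩"
    parts = [elem.text or ""]
    for child in elem:
        parts.append(render(child))      # recurse so nested <g> is handled too
        parts.append(child.tail or "")   # text following the child element
    return "".join(parts)

def render_fragment(fragment: str) -> str:
    # Wrap in a hypothetical <ab> element so the fragment has a single root.
    return render(ET.fromstring("<ab>" + fragment + "</ab>"))

print(render_fragment('jhe<g type="circleConcentric ">.</g>'))
```

Because the brackets are ordinary characters rather than styling, they would survive copy-pasting into email or a word processor, which is the point of this variant.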

B. A second possible solution would be to simply display "." and "..", i.e. the contents of <g> in such cases. This is again not ideal, because we are not used to seeing a "." in an Indic (nor, I assume, SEA) text, much less to seeing "..". But at least the full stop cannot be mistaken for an alphabetic letter and it's fairly intuitive. Also, I've just checked and ".." enclosed in <g> is not actually mentioned at all in the EG (only ".." without <g> when encoding a previous edition that uses two levels of punctuation), and I may be the only one who has ever used ".." enclosed in <g>, which I'm happy to change to ".".

Either of the above seems acceptable to me: displaying symbol type with angle brackets, or displaying the contents of the <g/>. The latter is perhaps simpler, since it can be formulated for the transformation in the same way as we've formulated the display of space fillers: if <g> contains one or more . characters, display those characters and ignore the @type of <g>; the rest of the rules for the transformation of <g> can then remain unchanged.
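The rule for the second option can be sketched in a few lines of Python. This is a hedged illustration, not the actual transformation: the function name `display_g` is invented, the bracketed type stands in for the cartouche used for empty `<g/>`, and the fallback for other contents is an assumption, since the remaining rules are not spelled out in this thread.

```python
import re
import xml.etree.ElementTree as ET

ONLY_DOTS = re.compile(r"\.+")  # one or more full stops and nothing else

def display_g(g: ET.Element) -> str:
    """Return a plain-text display string for a single <g> element."""
    content = (g.text or "").strip()
    if ONLY_DOTS.fullmatch(content):
        return content  # option B: show "." or ".." as-is, @type ignored
    if not content:
        # Stand-in for the cartouche shown for empty <g/> elements.
        return "[" + g.get("type", "").strip() + "]"
    return content  # assumption: other contents are displayed unchanged

print(display_g(ET.fromstring('<g type="circleConcentric ">.</g>')))
```

With this rule the problematic case displays as "jhe." rather than "jhe|", so it cannot be read as "jhel".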

manufrancis commented 3 years ago

I do endorse Daniel's opinion that

I do not think it is our job to prevent that from happening, just as it is not our job to make sure people will correctly distinguish a 0 (zero) from an O (capital o) or a B from a ß (scharfes ess).

ajaniak commented 3 years ago

So far, your HTML has no font associated with it, which means the output uses the default settings of your web browser. I don't wish to add one, to avoid the issues that will arise because fonts are outside any system. The example above is in Times; you could select a font that is less ambiguous for the | character, if you wish.