questions and requests regarding display of inscriptions

erc-dharma / project-documentation

DHARMA Project Documentation

Creative Commons Attribution 4.0 International


questions and requests regarding display of inscriptions #250

Closed arlogriffiths closed 4 months ago

arlogriffiths commented 9 months ago

examples taken from https://dharman.in/display/DHARMA_INSPallava00004 https://dharman.in/display/DHARMA_INSCIK00334 https://dharman.in/display/DHARMA_INSIDENKTigaRon-TluRon

Metadata

What is the use of indications like "Languages: eng, fra, pra, san.", especially if, as is the case here, we see a mix of original language and meta-languages being listed?

Editorial conventions and display modes

The editorial conventions previously set up by Axelle and represented at https://erc-dharma.github.io/editorial were addressing a situation where we did not yet have a separation between Logical and Physical display. Now that we do have such a separation, and we are still using color codes while we now also have explanations in mouse-over, I am not sure it is also necessary to maintain brackets like ⟨...⟩, ¿...?, etc., which are in most cases highly particular to DHARMA (as opposed to general scholarly conventions). @danbalogh : what do you think? Can we simplify the redundancy of representation, or do we want to keep it for copy/pasting to print context? But in that case, I'd also like to be able to keep the mixed representation which shows both the diplomatic and the curated state of the text in a single edition. @michaelnmmeyer : could we offer another display mode which retains display like svadat¿a?⟨ā⟩ṃ for ease of copying to a print context?

Display of bibliographic data

The display of bibliography at DHARMAN.IN involves some differences from the previously agreed upon conventions. Compare

<bibl rend="omitname"><ptr target="bib:Soutif2009_03"/><citedRange unit="page">507</citedRange></bibl>
This was displayed as (2009: page 507) and is now displayed as (2009, 507).
<bibl n="DS"><ptr target="bib:Soutif2009_03"/><citedRange unit="page">507</citedRange></bibl>
<bibl n="JFF"><ptr target="bib:Fleet1880_01"/><citedRange unit="page">100-102</citedRange></bibl>

My reaction on changes in bibliographic display:

The use of "p." and plain hyphens in cases like "IA 9, p. 96-103, 100-102" reflects French bibliographic conventions. Our database is English-based and should within reason attempt to follow anglophone norms. This means we need en-dash for intervals and should represent page ranges without using "p." (or "pp.") in such contexts. The previous display was better on this count.
I am also less happy with the way the page range within the volume is shown in the new display compared to the previous one. (See the Fleet item.)
Likewise, the previous display did better for the indication of page runs within articles that themselves have page runs
As for point 1, I feel that (2009: 507) would be the most conventional manner of displaying the information; at least use of colon would be more consistent with our use of the colon as separator in other bibliographic lists. For reasons I forget (@danbalogh : do you remember?) we opted for the less conventional inclusion of 'page' in the display set up by Axelle. Practically, the aim seemed to be for all values of @unit to be displayed explicitly, and the @unit is supposed to be page even when no @unit is explicitly encoded.
I am not happy with the absence of publication place in the Zotero entry leading to "no place" being inserted, because it feels stupid to have to fill in "Paris" as publication place when the university is "Université de Paris III". There are many theses recorded without publication place in Zotero for this reason.
I am happy with the new display of siglum.

Apparatus

Our database needs to come closer to the design previously established by Axelle. Compare

My observations on the way apparatus is presently displayed:

there is no reason for page runs to appear in apparatus at all
I'd like to see the same use of light green color for <rdg>
I'd like to see the same more modestly sized diamond as separator

manufrancis commented 8 months ago

My view about:

Editorial conventions and display modes

I opt to maintain brackets like ⟨...⟩, ¿...?, etc., even though they are in most cases highly particular to DHARMA (as opposed to general scholarly conventions).
I want to keep them for copy/pasting to print context.
I concur with Arlo that having a mixed representation which shows both the diplomatic and the curated state of the text in a single edition is useful. Thus Michaël, if you could offer a further display mode, besides "logical" and "physical", which retains display like svadat¿a?⟨ā⟩ṃ for ease of copying to a print context, that would be great.

danbalogh commented 8 months ago

Re display modes: I am strongly of the opinion that we do need a mixed/complete mode. Especially, but not only, for printed publications. Mouseover tips are nice, but for example if I hover over an emendation, it only tells me "corrected text" - i.e. it tells me what the markup means, but it doesn't tell me what the pre-correction text is. We certainly need some way where it is possible to see pre- and post-editorial versions simultaneously, and preferably without the need to keep two windows open or to hover above each instance that I wish to reveal. We could do this in the Physical view and rename the current Physical to Diplomatic. Or we could offer a combination of two switches for multiple view modes: Logical and Physical (which to my mind should mean that the text is displayed as paragraphs and stanzas versus laid out as in the original), both available in Diplomatic (as received), Curated (as emended) and Mixed. This may of course be too much of a complication. (By the way, I feel strange about the term "corrected text" - it makes me think primarily of a correction in the original, not an editorial emendation.)

Re editorial conventions: the general scholarly conventions are far from uniform; the Epigraphia Indica style is still quite prevalent, but no few publications in the last couple of decades have attempted to innovate. I assume that for SE Asian editions, the picture is similar, but of course I don't know. We are in a position where we may be able to innovate and make it stick, establishing a new standard convention that works better than the old. Also, many of the details in which the DHARMA convention diverges from the Sanskrit epigraphic tradition are in accordance with the Leiden markup which is pretty much the standard for classicists, but which I believe is also used more widely than the Classical Mediterranean region. Before implementing this iteration of DHARMA display, we might want to discuss the DHARMA print markup conventions once more, as I think we still have some inconsistencies. For the record, my suggestion is (and has been from the beginning) that our priorities should be on the one hand convergence with Leiden where we diverge from the EI (and other older Indic) convention, and on the other hand an internal consistency, so that brackets of a similar nature should be employed for conceptually similar things. One problem is of course the inconsiderate use of the underdot for unclear in the Leiden convention, which we obviously cannot adopt, which takes us to our straightforward solution of using rounded parentheses for unclear. (I wholeheartedly abhor the EI-style use of square brackets for unclear, since square brackets are quite universally used in fields outside epigraphy for editorial insertions.) I stop here, but hope to continue this at some point.

Re bibliographic display: I have no strong preferences in any direction and I'm happy to accept whatever Arlo and others propose. Still, here are some thoughts.

On the comma versus colon in (2009, 507), I might note that I'm nowadays using the comma in print publications, because Chicago style references, at least as rendered by Zotero, do so. I have not checked the actual Chicago manual; I think it may permit both the comma and the colon. Intuitively, the colon does make better sense to me.
I have no recollection why "page" was included in the display; in fact, no recollection at all that I was involved in a decision over this. Can it be that this is simply a decision of convenience? I.e. if @unit is present, then its value is shown? Or does "page" also appear in the display when the code is <bibl rend="omitname"><ptr target="bib:Soutif2009_03"/><citedRange>507</citedRange></bibl>?
About the mandatory inclusion of publication places, I am of two minds. On the whole, I think the entire practice of needing to list a publication place is outdated, quite useless in most cases, and problematic in many cases (when there is no publication place listed in the book, or when there are many publication places). However, if we opt to include the publication place in any reference, I think it should be explicitly present in every reference, no matter how obvious it may be. I personally would not at all be surprised if "Université de Paris III" were located in a town outside Paris.

michaelnmmeyer commented 8 months ago

Metadata

I need to know what language information you want to be displayed, and how. For now, I just enumerate all the @xml:lang I find in the document. I can produce something like "Text in Tamil, with parts in Sanskrit. Translations in English and in French", or just display "Tamil", or something else.

The catalog search interface likewise looks for all languages used in each document. I can add new search fields like "text.lang" (main language of the text), "trans.lang" (main language of the translation), or something like that.

Editorial conventions and display modes

OK for a "full" display mode.
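
For illustration only, here is a minimal sketch of how the three modes could differ for the example quoted in this thread. It assumes that svadat¿a?⟨ā⟩ṃ is encoded as a <choice> holding <sic>a</sic> and <corr>ā</corr>, ignores the TEI namespace, and uses a hypothetical render() function; it is not the code that runs on dharman.in.

```python
import xml.etree.ElementTree as ET


def render(elem, mode):
    """Render a fragment as plain text in 'physical', 'logical' or 'full' mode.

    Hypothetical helper: only <choice><sic/><corr/></choice> is handled,
    and element names are matched without the TEI namespace.
    """
    if mode not in ("physical", "logical", "full"):
        raise ValueError(f"unknown display mode: {mode}")
    out = [elem.text or ""]
    for child in elem:
        if child.tag == "choice":
            sic = child.findtext("sic", default="")
            corr = child.findtext("corr", default="")
            if mode == "physical":
                out.append(sic)            # text as received
            elif mode == "logical":
                out.append(corr)           # text as emended
            else:
                out.append(f"¿{sic}?⟨{corr}⟩")  # both states, copyable
        else:
            out.append(render(child, mode))
        out.append(child.tail or "")
    return "".join(out)


fragment = ET.fromstring(
    "<w>svadat<choice><sic>a</sic><corr>ā</corr></choice>ṃ</w>"
)
for mode in ("physical", "logical", "full"):
    print(mode, render(fragment, mode))
```

In this sketch the "full" output is ordinary text, so it survives copy/pasting into a print context without relying on colours or tooltips.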

Display of bibliographic data

OK for following English conventions for page numbers (p., pp., en dash).

For bibliographic entries that specify a <citedRange unit="page">, I planned to ignore the page range from Zotero (if any) and only use the one specified in the XML document. Thus, for

<bibl n="JFF"><ptr target="bib:Fleet1880_01"/><citedRange unit="page">100-102</citedRange></bibl>

we would just have

[...]. IA, p. 100-102.

since the page range 96-103 from the Zotero entry is less specific and redundant. Would this be OK?

OK for using colons instead of commas in article entries (e.g. Fleet. "Hello." IA: p. 100-102) and in references (e.g. See Fleet (1880: p. 4)).

I would be in favour of always keeping an explicit p. (or pp.) for indicating page numbers, because references start to look weird when several units are used. See e.g. Fleet 1880: vol. 2, 4, n. 20, where the 4 that represents a page number looks like a typo. If you want, I could also omit the p./pp. in specific circumstances, for example when the page range is the first <citedRange> given.

OK for dropping No place.

michaelnmmeyer commented 8 months ago

Follow-up for remarks on the apparatus and for Dan's comment (which I noticed only after publishing mine)

Display modes

We will have tooltips that indicate what is the corrected text (in the physical display) and what is the original text (in the logical display). This is a feature Manu wants, too, but this is not implemented yet.

For the display modes, I propose to add a few toggles, like "Show/hide emendations", "Show/hide original text", etc. We have not talked about the apparatus display, by the way: should it also have several view modes, several display toggles, etc.?

OK for <corr>, I will replace "Corrected text" with "Emended text".

Bibliographic display

I will let you decide what to do.

Apparatus @arlogriffiths

The presence of page runs in the case you mention is a bug, thank you for noticing.

For the size of diamond, this is font-related. Axelle's display looks like this on my computer:

There is a smaller diamond, WHITE MEDIUM DIAMOND, in Unicode (◇ vs. ⬦); I will use it instead.

OK for coloring readings in green. I must point out, however, that we have several cases where colors overlap. In such situations, I can keep only one color, or combine them e.g. blue + red = violet, or follow some other convention.

danbalogh commented 8 months ago

Just a few points now in haste (I'll be able to contribute more in January, but I'm afraid not before). Having toggles in both/all views and having the tooltip display the alternative (pre- or post-emendation) text are both great, I'm very happy these will be implemented. The combined view would then be useful mainly for copy-pasting for other uses.

Colour-coding marked-up text: I don't think colour combinations should be used; let's not complicate the colours because nobody will be able to keep in mind what means what. So where items already associated with different colours overlap, we'll just have to live with displaying only one of the colours.

arlogriffiths commented 8 months ago

I rather strongly prefer colon (2009: 507), over comma (2009, 507), especially in cases where the reference is not encapsulated in parentheses, as confusion then more easily arises with normal sentence punctuation.

danbalogh commented 8 months ago

Apropos of displaying the languages in a file. I agree that listing a mix of languages is not useful. The smart display Michaël proposed would be much nicer.

BUT what concerns me is that we also have a number of fields in the metadata sheets for language and script. It is not clear to me why this information has to be recorded redundantly in the metadata since it is supposed to be encoded in the files. It seems to me that the metadata sheet can hold no information beyond what is encoded in the XML, whereas the encoded files do hold more information than the sheet, since the encoding also shows which parts of a text are in which language. So, if language information can be easily extracted from the XML files and presented to the end user in the smart form Michaël has suggested - and I gather that this is not a problem - then the redundant repetition of the same information should be cut out of the metadata sheets.

danbalogh commented 8 months ago

Re Display of bibliographic data: I'm happy to go with the colon. EDITED: That might also work in the case of complex references with multiple units, e.g. Fleet 1880: vol. 2: 4, n. 20, but I could also imagine displaying p/pp in such cases (while dropping it elsewhere) as Michaël has suggested above.
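
As a rough sketch of the options discussed here (colon after the year, en dash in ranges, and p./pp. shown only when several units are combined), the following hypothetical format_citation() function illustrates one possible behaviour. The unit names "volume" and "note" and their abbreviations are assumptions taken from the Fleet example, not from the actual display code.

```python
import re

# Hypothetical abbreviations for units other than "page".
UNIT_ABBREVIATIONS = {"volume": "vol.", "note": "n."}


def format_range(value):
    """Replace a plain hyphen between two digits with an en dash."""
    return re.sub(r"(?<=\d)\s*-\s*(?=\d)", "\u2013", value)


def format_citation(year, cited_ranges, name=None):
    """Format a short reference.

    cited_ranges is a list of (unit, value) pairs; a unit of None is
    treated as "page", following the convention mentioned in the thread.
    """
    units = {unit or "page" for unit, _ in cited_ranges}
    several_units = len(units) > 1
    parts = []
    for unit, value in cited_ranges:
        unit = unit or "page"
        value = format_range(value)
        if unit == "page":
            if several_units:
                # Keep p./pp. only where a bare number would be ambiguous.
                label = "pp." if "\u2013" in value else "p."
                parts.append(f"{label} {value}")
            else:
                parts.append(value)
        else:
            parts.append(f"{UNIT_ABBREVIATIONS.get(unit, unit)} {value}")
    locator = ", ".join(parts)
    ref = f"{year}: {locator}" if locator else str(year)
    # Without a name (as with rend="omitname"), wrap in parentheses.
    return f"{name} {ref}" if name else f"({ref})"


print(format_citation(2009, [("page", "507")]))
print(format_citation(1880, [("page", "100-102")], name="Fleet"))
print(format_citation(
    1880, [("volume", "2"), ("page", "4"), ("note", "20")], name="Fleet"
))
```

Under these assumptions the Soutif reference comes out as (2009: 507), while the multi-unit Fleet reference keeps an explicit "p." so that the page number does not look like a typo.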

danbalogh commented 8 months ago

Returning to colour overlaps. I still definitely think that new colours must be avoided, so where there is a colour conflict, one must be picked over the other. As far as apparatus readings are concerned, I'm not at all sure the green colouring is really needed for readings and I would slightly prefer if @arlogriffiths changed his mind about that and accepted using the default colour instead. Reasons why: 1, the readings are already highlighted by italics and diamonds, so even though the colouring is more conspicuous than either, I don't think we really need a third way to pick out readings from the surrounding text; 2, the same (or a very similar) light green colour is used in the edition display for editorial correction, so (assuming that any end user pays attention to the colours) its use is confusing. That said, if Arlo insists on showing readings in green, then I can live with it. In that case, I suggest that the contents of <rdg> elements should normally be shown in green, but if any child of the <rdg> is associated with a colour (e.g. <sic> or whatever), then that colour should override the green. The function of highlighting the readings would still be served with multicoloured readings. Also, I believe this is simpler to implement than to force-inherit the green colour to all children of the <rdg>. But again, I can also live with the latter if that is Arlo's definite preference.

Turning now to possible colour conflicts within the edition (or in edition snippets in lemmas and readings and in other divs such as the commentary), I believe these are going to be rare. It seems to me that we presently have colours for <sic>, <corr>, <orig>, <reg> and nothing else - so the only instances of colour conflict would be when one of these is nested in another as in EGD §6.3.3. In this case too, I think the displayed colour should be determined by the lowest-level element in the hierarchy, so e.g. a <sic> within <orig> would be coloured red like any other <sic>, while the rest of the contents of the <orig> tag would be magenta like any other <orig>.
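
The "lowest-level element wins" rule can be sketched as follows. This is only an illustration: the colour table repeats the values mentioned in this thread, the function name coloured_runs() is hypothetical, and the TEI namespace is ignored.

```python
import xml.etree.ElementTree as ET

# Foreground colours as mentioned in this thread.
COLOURS = {"sic": "red", "corr": "green", "orig": "magenta", "reg": "blue"}


def coloured_runs(elem, inherited=None):
    """Yield (text, colour) pairs in document order.

    An element listed in COLOURS overrides the colour inherited from its
    ancestors; other elements pass the inherited colour through.
    A colour of None means the default text colour.
    """
    colour = COLOURS.get(elem.tag, inherited)
    if elem.text:
        yield elem.text, colour
    for child in elem:
        yield from coloured_runs(child, colour)
        if child.tail:
            # The tail belongs to the parent, so it takes the parent's colour.
            yield child.tail, colour


nested = ET.fromstring("<orig>ab<sic>c</sic>d</orig>")
print(list(coloured_runs(nested)))
```

For a <sic> nested in <orig>, only the text inside <sic> is red and the rest of the <orig> stays magenta, which is the behaviour proposed above.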

michaelnmmeyer commented 8 months ago

OK for me, duly noted.

In the metadata spreadsheets, the "Script" field might also be redundant.

We currently have the following foreground colors:

<g>               green
<rdg>             green
<abbr>            brown
<sic>             red
<corr>            green
<orig>            magenta
<reg>             blue
<pb>, <lb>, etc.  gray

danbalogh commented 8 months ago

Yes, "script" is in my opinion equally redundant in the metadata sheet.

Thanks for listing the colours. I think most of these should be retained, and the ground rule of using the colour of the lowest-level element could be applicable to all of them. <pb> and its friends should always be shown in grey and should never be contained within another tag that calls for colours, except for <rdg>.

<abbr> may in principle contain other coloured tags or, conceivably, be contained within one. It hardly ever occurs in my texts, but if it does interact with e.g. sic/corr or orig/reg in some files, then perhaps a different solution is needed. Using the colour of the lowest-level item should still be acceptable, but I wonder if it might be a good idea to use a background colour instead of a text colour for <abbr>. If it could be given, for instance, a light orange background colour, then any overlapping sic (etc.) could still be shown using font colour. Since <abbr> is a semantic tag rather than a text-critical one, to me at least it makes intuitive sense to show it with a background colour instead of text colour.

The use of green for <g> is in my opinion definitely a problem. Even if we retain green for apparatus readings, the ambiguity of using green for glyphs and for editorial corrections must be eliminated. Do all instances of <g> appear in green? Or is it only @type="numeral"? Does anyone know the reason why <g> is displayed with a colour? I see no need to do that, so my suggestion would be to remove the colour from <g> unless someone can tell why it's better to have it. If a colour is needed, then it should not be green but something else, e.g. teal, and perhaps restricted even then to <g type="numeral">.

michaelnmmeyer commented 8 months ago

Currently, we are using a background color (yellow) for <hi rend="mark">, and for this only. There are not many cases, and this is supposed to be temporary, so background colors can be used quite freely.

Concerning the use of green for <g>, apparently that's just me, so I will fix that and reproduce what Axelle did, viz. no markup at all, except for placeholder symbol names, which are formatted like this:

arlogriffiths commented 8 months ago

This discussion seems to be touching on several separate issues (mea culpa for having started it that way), some requiring Adeline's involvement. @danbalogh : can you help split the discussion up into separate issues and assign Adeline to any issue that involves the metadata spreadsheet?

I am fairly sure the matter of redundant representation of language and script metadata was discussed by Adeline, Manu and myself when we were working on the template and guide for the metadata spreadsheet, but I don't remember why we accepted/required the redundancy.

arlogriffiths commented 8 months ago

I don't insist on any particular color for display of <rdg>.

danbalogh commented 8 months ago

I've created #254 for the metadata issue. Rather mea culpa for bringing it up here. I also think that the redundancy was discussed and also have no clear recollection of the details. But I think that back then the idea was to record something slightly different in the metadata, perhaps by allowing a freetext description of the language of the inscription (e.g. "non-Standard Sanskrit" or "Sanskrit with boundary descriptions in Telugu"). That way, the redundancy is only partial. But what we have in the sheets now can be matched 100% to the data encoded in the XML, so the redundancy is a bad thing.

@michaelnmmeyer : I like the cartouche display for symbols, although it bothers me a bit that there are no characters involved, so copy-pasting the displayed text (e.g. into Word) would result in "svasti śrīsymbol" for the example in your screenshot. For the planned hybrid view, I think it would be better to involve some sort of brackets instead of or in addition to the cartouche. Angle brackets may be best, but we could also open a separate issue to discuss this when it comes to implementation.

@arlogriffiths : please clarify: you don't insist on any particular colour for rdg so long as it is coloured OR you don't insist on any particular colour and are happy to have it in the default colour (i.e. black except for parts that get coloured for some other reason)? If 1, then perhaps we could use the same light blue as that in the sigla, or use teal. Also, what do you think of using a background colour (instead of font colour) to flag abbr elements?

michaelnmmeyer commented 8 months ago

@danbalogh OK, I will try to synthesize formatting choices we made so far

arlogriffiths commented 8 months ago

This is in response to Michaël's request "I need to know what language information you want to be displayed, and how. For now, I just enumerate all the @xml:lang I find in the document. I can produce something like "Text in Tamil, with parts in Sanskrit. Translations in English and in French", or just display "Tamil", or something else."

I am not sure we need anything in the display, but this depends a lot on how metadata and editions will eventually be shown in relation to each other.
If all metadata will be shown in conjunction with a given edition, then of course language usage needs to be among the metadata displayed.
And in that case an indication "Language usage: Tamil, Sanskrit" purely based on which xml:lang tags occur in the <div type="edition"> will be sufficient.
I don't think we will ever want an indication of any other xml:lang tags found in the file (e.g. in <div type="translation">).

arlogriffiths commented 8 months ago

This is in response to Dan's request: "please clarify: you don't insist on any particular colour for rdg so long as it is coloured OR you don't insist on any particular colour and are happy to have it in the default colour (i.e. black except for parts that get coloured for some other reason)? If 1, then perhaps we could use the same light blue as that in the sigla, or use teal. Also, what do you think of using a background colour (instead of font colour) to flag abbr elements?"

I would not say I insist on the presence of any color, though I think its presence makes the apparatus easier to digest
we don't want to use the same color as that used for sigla: in fact, if <rdg> is colored, then sigla can be shown merely with bold face (as they are also in the primary bibliography) but without color.
so I vote for light blue applied to <rdg> while sigla are to be shown in bold face
I am not exactly sure what background color means, but I don't expect having any objection to its use for <abbr>

danbalogh commented 8 months ago

@arlogriffiths : by background colour I meant what looks like a text marker, of the same sort (but a different colour) as what you have proposed for <hi rend="mark">.

Summarising where we are.

Displaying text languages:

no display of languages outside the edition div
any other language display is provisional and depends on the eventual display of metadata from the metadata table, and whether we continue redundantly recording text languages in the metadata table
for the provisional display, I suggest the following, which is more complex than what Arlo suggests above, so it needs explicit approval from @arlogriffiths :
- "Primary language: [LANG]." where [LANG] = the @xml:lang of the edition div if present; if not present, then "Primary languages: [LANGS]." where [LANGS] = the @xml:lang on each textpart div within the edition div. Separate the list by commas, in order of the textparts; if the same language occurs more than once, then show only the first. It is mandatory to have @xml:lang on either the edition div or on each of its child textparts; anything else is an encoding error.
- PLUS: "Additional languages: [LANGS]." where [LANGS] = @xml:lang values in any children of the edition div other than <head> or <label>. Do not display "Additional languages" if there is no @xml:lang on any other children. List in order of occurrence, separated by commas, eliminating repetitions.
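
The provisional rule above could be sketched roughly as follows. This is a hedged illustration, not the implementation: element names are matched without the TEI namespace, "children" is read as "descendants", subtrees under <head> and <label> are skipped entirely, and languages already listed as primary are not repeated as additional. All of these readings are assumptions, as is the function name language_summary().

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def walk(elem, skip_tags):
    """Yield descendants in document order, skipping subtrees in skip_tags."""
    for child in elem:
        if child.tag in skip_tags:
            continue
        yield child
        yield from walk(child, skip_tags)


def language_summary(edition):
    """Build the language line for an edition div, per the rule above."""
    if edition.get(XML_LANG):
        label, primary = "Primary language", [edition.get(XML_LANG)]
    else:
        textparts = [d for d in edition.findall("div")
                     if d.get("type") == "textpart"]
        if not textparts or not all(d.get(XML_LANG) for d in textparts):
            raise ValueError("encoding error: no @xml:lang on the edition "
                             "div or on each of its textparts")
        # dict.fromkeys keeps the first occurrence of each language, in order.
        primary = list(dict.fromkeys(d.get(XML_LANG) for d in textparts))
        label = "Primary languages"
    additional = []
    for el in walk(edition, {"head", "label"}):
        lang = el.get(XML_LANG)
        if lang and lang not in primary and lang not in additional:
            additional.append(lang)
    summary = f"{label}: {', '.join(primary)}."
    if additional:
        summary += f" Additional languages: {', '.join(additional)}."
    return summary


single = ET.fromstring(
    '<div type="edition" xml:lang="tam">'
    '<head xml:lang="eng">Edition</head>'
    '<p>a <foreign xml:lang="san">svasti</foreign> b '
    '<foreign xml:lang="san">śrī</foreign></p>'
    '</div>'
)
print(language_summary(single))
```

With these assumptions, a Tamil edition containing Sanskrit passages yields "Primary language: tam. Additional languages: san." regardless of the language of the heading.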

Displaying apparatus readings:

@arlogriffiths , I'm not sure my point about additional colours has come across. In a case that involves correction and/or normalisation in the lemma and the reading, such as in the screenshot below, we now have the reading in uniform green. I'm OK with your suggestion of light blue instead of green, but I strongly urge that this should not be a uniform colour, but overridden by green, red, magenta, or whatever is called for by any text-critical markup present. Would you be OK with that? If yes, then going a bit further, I note that the original light blue for sigla was probably introduced because light blue is associated with bibliographic citations and sigla are a sort of citation. I agree that we don't need the sigla to be coloured, but I would still rather have a unique colour for apparatus readings, and not one that has other connotations. So since earlier you said "I don't insist on any particular color", I think teal would be a better base colour for readings than cyan. If all this is OK, then I summarise as follows:

apparatus sigla should not have a colour, but be shown in default black and in bold face
apparatus readings should be coloured teal, overridden locally by colours dictated by children, if applicable.

Other details

we need a hybrid display option like svadat¿a?⟨ā⟩ṃ, and in time we'll want informative tooltips (e.g. showing the diplomatic reading in curated view and vice versa) as well as some toggles for custom display, e.g. "Show/hide emendations"
in bibliographic citations, the year is to be followed by a colon
text colour in display will always be determined by the lowest-level child element that calls for a colour, both in the apparatus (e.g. sic or corr inside rdg) and in the edition (e.g. sic and corr inside orig) (PENDING Arlo's opinion on readings above)
<g> elements should not be green, but be shown with a cartouche. In time, at least for the hybrid view, we should also consider bracketing them in some copiable character such as angle brackets in addition to (or instead of) displaying the cartouche. To my mind, colouring symbols grey, like line and page numbers, would also work.
<abbr> elements should be shown with default text colour and a background colour such as light brown or orange.

arlogriffiths commented 8 months ago

I agree on all points and explicitly approve where asked.

Just a question about <g>: I thought Michaël had devised a way to display them all by a Unicode symbol for the relevant genus. If this is true, or can be done, why do we still need cartouches?

danbalogh commented 8 months ago

Thanks for confirming.

Indeed, if the display remains a symbol character corresponding to the genus, then the cartouche becomes irrelevant, good that you point this out. I'm not sure that there is a symbol associated with the unclassified symbols, though, so the cartouche may remain in use for those, at least for the time being.

danbalogh commented 8 months ago

@michaelnmmeyer , if any of the specifics I suggest above are problematic to implement or not optimal for some reason that I have neglected to consider, do let me know.

arlogriffiths commented 8 months ago

I strongly favor representation of all <g> by some Unicode symbol. Then mouseover to display the value of @type.

michaelnmmeyer commented 8 months ago

Thank you @danbalogh for the synthesis. I will need some time to do the necessary work.

For <g>, the cartouche with a symbol name is only displayed when the gaiji table does not specify a textual representation. I have not finished setting this up in the project-documentation repo, though.