erc-dharma / project-documentation

DHARMA Project Documentation
Creative Commons Attribution 4.0 International

questions and requests regarding display of inscriptions #250

Closed arlogriffiths closed 4 months ago

arlogriffiths commented 9 months ago

examples taken from https://dharman.in/display/DHARMA_INSPallava00004 https://dharman.in/display/DHARMA_INSCIK00334 https://dharman.in/display/DHARMA_INSIDENKTigaRon-TluRon

Metadata

What is the use of indications like "Languages: eng, fra, pra, san.", especially if, as is the case here, we see a mix of original language and meta-languages being listed?

Editorial conventions and display modes

The editorial conventions previously set up by Axelle and represented at https://erc-dharma.github.io/editorial addressed a situation where we did not yet have a separation between Logical and Physical display. Now that we do have such a separation, and we are still using color codes while we now also have explanations on mouse-over, I am not sure it is also necessary to maintain brackets like ⟨...⟩, ¿...?, etc., which are in most cases highly particular to DHARMA (as opposed to general scholarly conventions). @danbalogh : what do you think? Can we reduce the redundancy of representation, or do we want to keep it for copy/pasting to a print context? But in that case, I'd also like to be able to keep the mixed representation which shows both the diplomatic and the curated state of the text in a single edition. @michaelnmmeyer : could we offer another display mode which retains display like svadat¿a?⟨ā⟩ṃ for ease of copying to a print context?

Display of bibliographic data

The display of bibliography at DHARMAN.IN involves some differences from the previously agreed upon conventions. Compare

  1. <bibl rend="omitname"><ptr target="bib:Soutif2009_03"/><citedRange unit="page">507</citedRange></bibl> This was displayed as (2009: page 507) and is now displayed as (2009, 507)

  2. <bibl n="DS"><ptr target="bib:Soutif2009_03"/><citedRange unit="page">507</citedRange></bibl>

     [Screenshot 2023-12-22, 06:29:22] [Screenshot 2023-12-22, 06:29:04]

  3. <bibl n="JFF"><ptr target="bib:Fleet1880_01"/><citedRange unit="page">100-102</citedRange></bibl>

     [Screenshot 2023-12-22, 06:35:24] [Screenshot 2023-12-22, 06:35:43]

My reaction on changes in bibliographic display:

Apparatus

Our database needs to come closer to the design previously established by Axelle. Compare

[Screenshot 2023-12-22, 06:43:36] [Screenshot 2023-12-22, 06:43:23]

My observations on the way apparatus is presently displayed:

manufrancis commented 8 months ago

My view about:

Editorial conventions and display modes

danbalogh commented 8 months ago

Re display modes: I am strongly of the opinion that we do need a mixed/complete mode. Especially, but not only, for printed publications. Mouseover tips are nice, but for example if I hover over an emendation, it only tells me "corrected text" - i.e. it tells me what the markup means, but it doesn't tell me what the pre-correction text is. We certainly need some way to see pre- and post-editorial versions simultaneously, and preferably without the need to keep two windows open or to hover above each instance that I wish to reveal. We could do this in the Physical view and rename the current Physical to Diplomatic. Or we could offer a combination of two switches for multiple view modes: Logical and Physical (which to my mind should mean that the text is displayed as paragraphs and stanzas versus laid out as in the original), both available in Diplomatic (as received), Curated (as emended) and Mixed. This may of course be too much of a complication. (By the way, I feel strange about the term "corrected text" - it makes me think primarily of a correction in the original, not an editorial emendation.)

Re editorial conventions: the general scholarly conventions are far from uniform; the Epigraphia Indica style is still quite prevalent, but not a few publications in the last couple of decades have attempted to innovate. I assume that for SE Asian editions, the picture is similar, but of course I don't know. We are in a position where we may be able to innovate and make it stick, establishing a new standard convention that works better than the old. Also, many of the details in which the DHARMA convention diverges from the Sanskrit epigraphic tradition are in accordance with the Leiden markup, which is pretty much the standard for classicists, but which I believe is also used beyond the Classical Mediterranean region. Before implementing this iteration of DHARMA display, we might want to discuss the DHARMA print markup conventions once more, as I think we still have some inconsistencies. For the record, my suggestion is (and has been from the beginning) that our priorities should be on the one hand convergence with Leiden where we diverge from the EI (and other older Indic) convention, and on the other hand internal consistency, so that brackets of a similar nature should be employed for conceptually similar things. One problem is of course the inconsiderate use of the underdot for unclear in the Leiden convention, which we obviously cannot adopt, which takes us to our straightforward solution of using rounded parentheses for unclear. (I wholeheartedly abhor the EI-style use of square brackets for unclear, since square brackets are quite universally used in fields outside epigraphy for editorial insertions.) I'll stop here, but hope to continue this at some point.

Re bibliographic display: I have no strong preferences in any direction and I'm happy to accept whatever Arlo and others propose. Still, here are some thoughts.

michaelnmmeyer commented 8 months ago

Metadata

I need to know what language information you want to be displayed, and how. For now, I just enumerate all the @xml:lang I find in the document. I can produce something like "Text in Tamil, with parts in Sanskrit. Translations in English and in French", or just display "Tamil", or something else.

The catalog search interface likewise looks for all languages used in each document. I can add new search fields like "text.lang" (main language of the text), "trans.lang" (main language of the translation), or something like that.
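The "smart" language line proposed above could be derived from the XML roughly as follows. This is only a sketch under stated assumptions, not the DHARMA implementation: it assumes that the text lives in `<div type="edition">` and translations in `<div type="translation">`, that the language with the most characters counts as the main one, and that script subtags (e.g. `tam-Latn`) can be dropped. The `NAMES` table and all function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical code-to-name table; a real one would cover the full language list.
NAMES = {"tam": "Tamil", "san": "Sanskrit", "pra": "Prakrit",
         "eng": "English", "fra": "French"}

def base_code(lang):
    """Drop any script subtag: 'san-Latn' -> 'san'."""
    return lang.split("-")[0] if lang else None

def count_langs(elem, inherited, counts):
    """Count characters per language; xml:lang is inherited from ancestors."""
    lang = elem.get(XML_LANG, inherited)
    code = base_code(lang)
    if code and elem.text and elem.text.strip():
        counts[code] += len(elem.text.strip())
    for child in elem:
        count_langs(child, lang, counts)
        # Tail text sits after the child, so it is in the parent's language.
        if code and child.tail and child.tail.strip():
            counts[code] += len(child.tail.strip())

def join_names(codes):
    names = [NAMES.get(code, code) for code in codes]
    if len(names) < 2:
        return "".join(names)
    return ", ".join(names[:-1]) + " and " + names[-1]

def describe_languages(root):
    """Build a sentence such as 'Text in X, with parts in Y. Translations in Z.'"""
    text_counts = Counter()
    translations = []
    for div in root.iter(TEI + "div"):
        if div.get("type") == "edition":
            count_langs(div, None, text_counts)
        elif div.get("type") == "translation":
            code = base_code(div.get(XML_LANG))
            if code and code not in translations:
                translations.append(code)
    ranked = [code for code, _ in text_counts.most_common()]
    sentences = []
    if ranked:
        sentence = "Text in " + NAMES.get(ranked[0], ranked[0])
        if len(ranked) > 1:
            sentence += ", with parts in " + join_names(ranked[1:])
        sentences.append(sentence + ".")
    if translations:
        noun = "Translation" if len(translations) == 1 else "Translations"
        sentences.append(f"{noun} in {join_names(translations)}.")
    return " ".join(sentences)
```

The same per-division split would also give the two search fields mentioned above: the top-ranked edition language for "text.lang" and the translation languages for "trans.lang".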

Editorial conventions and display modes

OK for a "full" display mode.

Display of bibliographic data

OK for following English conventions for page numbers (p., pp., en dash).

For bibliographic entries that specify a <citedRange unit="page">, I planned to ignore the page range from Zotero (if any) and only use the one specified in the XML document. Thus, for

<bibl n="JFF"><ptr target="bib:Fleet1880_01"/><citedRange unit="page">100-102</citedRange></bibl>

we would just have

[...]. IA, p. 100-102.

since the page range 96-103 from the Zotero entry is less specific and redundant. Would this be OK?

OK for using colons instead of commas in article entries (e.g. Fleet. "Hello." IA: p. 100-102) and in references (e.g. See Fleet (1880: p. 4)).

I would be in favour of always keeping an explicit p. (or pp.) for indicating page numbers, because references start to look weird when several units are used. See e.g. Fleet 1880: vol. 2, 4, n. 20, where the 4 that represents a page number looks like a typo. If you want, I could also omit the p./pp. in specific circumstances, e.g. when the page range is the first <citedRange> given.

OK for dropping No place.
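To make the conventions discussed above concrete (p./pp., en dash in ranges, colon between year and locus, only the <citedRange> from the XML being used), here is a small sketch. It is illustrative only: the function names are hypothetical, the unit values other than "page" and their labels are assumptions, and it always prints an explicit p./pp., which is just one of the options under discussion.

```python
import re

# Singular and plural labels per @unit; entries other than "page" are assumed.
UNIT_LABELS = {
    "page": ("p.", "pp."),
    "volume": ("vol.", "vols."),
    "note": ("n.", "nn."),
}

def format_range(unit, value):
    """Format one <citedRange>: ('page', '100-102') -> 'pp. 100–102'."""
    # Normalise hyphens (and stray spaces around them) to an en dash.
    normalized = re.sub(r"\s*[-–]\s*", "–", value.strip())
    singular, plural = UNIT_LABELS.get(unit, (unit, unit))
    is_plural = "–" in normalized or "," in normalized
    return f"{plural if is_plural else singular} {normalized}"

def format_reference(author, year, ranges=(), omit_name=False):
    """Format an author-year reference; omit_name mirrors rend='omitname'."""
    head = str(year) if omit_name else f"{author} {year}"
    locus = ", ".join(format_range(unit, value) for unit, value in ranges)
    return f"{head}: {locus}" if locus else head

print(format_reference("Fleet", 1880, [("page", "100-102")]))
print(format_reference("Soutif", 2009, [("page", "507")], omit_name=True))
```

Dropping the p./pp. when the page is the only or first unit would be a small change to `format_range`, so the choice can be left open until the convention is settled.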

michaelnmmeyer commented 8 months ago

Follow-up for remarks on the apparatus and for Dan's comment (which I noticed only after publishing mine)

Display modes

We will have tooltips that indicate what the corrected text is (in the physical display) and what the original text is (in the logical display). This is a feature Manu wants, too, but it is not implemented yet.

For the display modes, I propose to add a few toggles, like "Show/hide emendations", "Show/hide original text", etc. We have not talked about the apparatus display, by the way: should it also have several view modes, several display toggles, etc.?

OK for <corr>, I will replace "Corrected text" with "Emended text".

Bibliographic display

I will let you decide what to do.

Apparatus @arlogriffiths

The presence of page runs in the case you mention is a bug, thank you for noticing.

The size of the diamond is font-related. Axelle's display looks like this on my computer: [image]

There is a smaller diamond in Unicode, WHITE MEDIUM DIAMOND (◇ vs. ⬦); I will use it instead.

OK for coloring readings in green. I must point out, however, that we have several cases where colors overlap. In such situations, I can keep only one color, or combine them e.g. blue + red = violet, or follow some other convention.

danbalogh commented 8 months ago

Just a few points now in haste (I'll be able to contribute more in January, but I'm afraid not before). Having toggles in both/all views and having the tooltip display the alternative (pre- or post-emendation) text are both great, I'm very happy these will be implemented. The combined view would then be useful mainly for copy-pasting for other uses.

Colour-coding marked-up text: I don't think colour combinations should be used; let's not complicate the colours because nobody will be able to keep in mind what means what. So where items already associated with different colours overlap, we'll just have to live with displaying only one of the colours.

arlogriffiths commented 8 months ago

I rather strongly prefer colon (2009: 507), over comma (2009, 507), especially in cases where the reference is not encapsulated in parentheses, as confusion then more easily arises with normal sentence punctuation.

danbalogh commented 8 months ago

Apropos of displaying the languages in a file. I agree that listing a mix of languages is not useful. The smart display Michaël proposed would be much nicer.

BUT what concerns me is that we also have a number of fields in the metadata sheets for language and script. It is not clear to me why this information has to be recorded redundantly in the metadata since it is supposed to be encoded in the files. It seems to me that the metadata sheet can hold no information beyond what is encoded in the XML, whereas the encoded files do hold more information than the sheet, since the encoding also shows which parts of a text are in which language. So, if language information can be easily extracted from the XML files and presented to the end user in the smart form Michaël has suggested - and I gather that this is not a problem - then the redundant repetition of the same information should be cut out of the metadata sheets.

danbalogh commented 8 months ago

Re Display of bibliographic data: I'm happy to go with the colon. EDITED: That might also work in the case of complex references with multiple units, e.g. Fleet 1880: vol. 2: 4, n. 20, but I could also imagine displaying p/pp in such cases (while dropping it elsewhere) as Michaël has suggested above.

danbalogh commented 8 months ago

Returning to colour overlaps. I still definitely think that new colours must be avoided, so where there is a colour conflict, one must be picked over the other. As far as apparatus readings are concerned, I'm not at all sure the green colouring is really needed for readings and I would slightly prefer if @arlogriffiths changed his mind about that and accepted using the default colour instead. Reasons why: 1, the readings are already highlighted by italics and diamonds, so even though the colouring is more conspicuous than either, I don't think we really need a third way to pick out readings from the surrounding text; 2, the same (or a very similar) light green colour is used in the edition display for editorial correction, so (assuming that any end user pays attention to the colours) its use is confusing. That said, if Arlo insists on showing readings in green, then I can live with it. In that case, I suggest that the contents of <rdg> elements should normally be shown in green, but if any child of the <rdg> is associated with a colour (e.g. <sic> or whatever), then that colour should override the green. The function of highlighting the readings would still be served with multicoloured readings. Also, I believe this is simpler to implement than to force-inherit the green colour to all children of the <rdg>. But again, I can also live with the latter if that is Arlo's definite preference.

Turning now to possible colour conflicts within the edition (or in edition snippets in lemmas and readings and in other divs such as the commentary), I believe these are going to be rare. It seems to me that we presently have colours for <sic>, <corr>, <orig>, <reg> and nothing else - so the only instances of colour conflict would be when one of these is nested in another as in EGD §6.3.3. In this case too, I think the displayed colour should be determined by the lowest-level element in the hierarchy, so e.g. a <sic> within <orig> would be coloured red like any other <sic>, while the rest of the contents of the <orig> tag would be magenta like any other <orig>.
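The rule proposed above (the lowest-level coloured element determines the colour) amounts to a simple tree walk in which each element either sets a colour or inherits its parent's. A minimal sketch, assuming namespace-free element names for brevity; red and magenta are the colours named above, green for <corr> and blue for <reg> are taken from the colour list given elsewhere in this thread, and the function name is hypothetical:

```python
import xml.etree.ElementTree as ET

# Foreground colours for the four text-critical elements discussed here.
COLOURS = {"sic": "red", "corr": "green", "orig": "magenta", "reg": "blue"}

def coloured_segments(elem, inherited=None):
    """Yield (text, colour) pairs; the innermost coloured element wins."""
    colour = COLOURS.get(elem.tag, inherited)
    if elem.text:
        yield elem.text, colour
    for child in elem:
        yield from coloured_segments(child, colour)
        # Text after a child belongs to the current element again.
        if child.tail:
            yield child.tail, colour

snippet = ET.fromstring("<orig>ab<sic>c</sic>d</orig>")
print(list(coloured_segments(snippet)))
```

For a <sic> nested in <orig>, only the <sic> content comes out red and the rest stays magenta. The same walk would cover readings if a base colour were added for <rdg>, since any coloured child would override it.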

michaelnmmeyer commented 8 months ago

OK for me, duly noted.

In the metadata spreadsheets, the "Script" field might also be redundant.

We currently have the following foreground colors:

<g>               green
<rdg>             green
<abbr>            brown
<sic>             red
<corr>            green
<orig>            magenta
<reg>             blue
<pb>, <lb>, etc.  gray

danbalogh commented 8 months ago

Yes, "script" is in my opinion equally redundant in the metadata sheet.

Thanks for listing the colours. I think most of these should be retained, and the ground rule of using the colour of the lowest-level element could be applicable to all of them. <pb> and its friends should always be shown in grey and should never be contained within another tag that calls for colours, except for <rdg>.

<abbr> may in principle contain other coloured tags or, conceivably, be contained within one. It hardly ever occurs in my texts, but if it does interact with e.g. sic/corr or orig/reg in some files, then perhaps a different solution is needed. Using the colour of the lowest-level item should still be acceptable, but I wonder if it might be a good idea to use a background colour instead of a text colour for <abbr>. If it could be given, for instance, a light orange background colour, then any overlapping sic (etc.) could still be shown using font colour. Since <abbr> is a semantic tag rather than a text-critical one, to me at least it makes intuitive sense to show it with a background colour instead of text colour.

The use of green for <g> is in my opinion definitely a problem. Even if we retain green for apparatus readings, the ambiguity of using green for glyphs and for editorial corrections must be eliminated. Do all instances of <g> appear in green? Or is it only @type="numeral"? Does anyone know the reason why <g> is displayed with a colour? I see no need to do that, so my suggestion would be to remove the colour from <g> unless someone can tell why it's better to have it. If a colour is needed, then it should not be green but something else, e.g. teal, and perhaps restricted even then to <g type="numeral">.

michaelnmmeyer commented 8 months ago

Currently, we are using a background color (yellow) for <hi rend="mark">, and for this only. There are not many cases, and this is supposed to be temporary, so background colors can be used quite freely.

Concerning the use of green for <g>, apparently that's just me, so I will fix that and reproduce what Axelle did, viz. no markup at all, except for placeholder symbol names, which are formatted like this: [image]

arlogriffiths commented 8 months ago

This discussion seems to be touching on several separate issues (mea culpa for having started it that way), some requiring Adeline's involvement. @danbalogh : can you help split the discussion up into separate issues and assign Adeline to any issue that involves the metadata spreadsheet?

I am fairly sure the matter of redundant representation of language and script metadata was discussed by Adeline, Manu and myself when we were working on the template and guide for the metadata spreadsheet, but I don't remember why we accepted/required the redundancy.

arlogriffiths commented 8 months ago

I don't insist on any particular color for display of <rdg>.

danbalogh commented 8 months ago

I've created #254 for the metadata issue. Rather mea culpa for bringing it up here. I also think that the redundancy was discussed and also have no clear recollection of the details. But I think that back then the idea was to record something slightly different in the metadata, perhaps by allowing a freetext description of the language of the inscription (e.g. "non-Standard Sanskrit" or "Sanskrit with boundary descriptions in Telugu"). That way, the redundancy is only partial. But what we have in the sheets now can be matched 100% to the data encoded in the XML, so the redundancy is a bad thing.

@michaelnmmeyer : I like the cartouche display for symbols, although it bothers me a bit that there are no characters involved, so copy-pasting the displayed text (e.g. into Word) would result in "svasti śrīsymbol" for the example in your screenshot. For the planned hybrid view, I think it would be better to involve some sort of brackets instead of or in addition to the cartouche. Angle brackets may be best, but we could also open a separate issue to discuss this when it comes to implementation.

@arlogriffiths : please clarify: you don't insist on any particular colour for rdg so long as it is coloured OR you don't insist on any particular colour and are happy to have it in the default colour (i.e. black except for parts that get coloured for some other reason)? If 1, then perhaps we could use the same light blue as that in the sigla, or use teal. Also, what do you think of using a background colour (instead of font colour) to flag abbr elements?

michaelnmmeyer commented 8 months ago

@danbalogh OK, I will try to synthesize formatting choices we made so far

arlogriffiths commented 8 months ago

This is in response to Michaël's request "I need to know what language information you want to be displayed, and how. For now, I just enumerate all the @xml:lang I find in the document. I can produce something like "Text in Tamil, with parts in Sanskrit. Translations in English and in French", or just display "Tamil", or something else."

arlogriffiths commented 8 months ago

This is in response to Dan's request: "please clarify: you don't insist on any particular colour for rdg so long as it is coloured OR you don't insist on any particular colour and are happy to have it in the default colour (i.e. black except for parts that get coloured for some other reason)? If 1, then perhaps we could use the same light blue as that in the sigla, or use teal. Also, what do you think of using a background colour (instead of font colour) to flag abbr elements?"

danbalogh commented 8 months ago

@arlogriffiths : by background colour I meant what looks like a text marker, of the same sort (but a different colour) as what you have proposed for <hi rend="mark">.

Summarising where we are.

Displaying text languages:

Displaying apparatus readings

@arlogriffiths , I'm not sure my point about additional colours has come across. In a case that involves correction and/or normalisation in the lemma and the reading, such as in the screenshot below, we now have the reading in uniform green.

[image]

I'm OK with your suggestion of light blue instead of green, but I strongly urge that this should not be a uniform colour, but overridden by green, red, magenta, or whatever is called for by any text-critical markup present. Would you be OK with that? If yes, then going a bit further, I note that the original light blue for sigla was probably introduced because light blue is associated with bibliographic citations and sigla are a sort of citation. I agree that we don't need the sigla to be coloured, but I would still rather have a unique colour for apparatus readings, and not one that has other connotations. So since earlier you said "I don't insist on any particular color", I think teal would be a better base colour for readings than cyan. If all this is OK, then I summarise as follows:

Other details

arlogriffiths commented 8 months ago

I agree on all points and explicitly approve where asked.

Just a question about <g>: I thought Michaël had devised a way to display them all by a Unicode symbol for the relevant genus. If this is true, or can be done, why do we still need cartouches?

danbalogh commented 8 months ago

Thanks for confirming.

Indeed, if the display remains a symbol character corresponding to the genus, then the cartouche becomes irrelevant, good that you point this out. I'm not sure that there is a symbol associated with the unclassified symbols, though, so the cartouche may remain in use for those, at least for the time being.

danbalogh commented 8 months ago

@michaelnmmeyer , if any of the specifics I suggest above are problematic to implement or not optimal for some reason that I have neglected to consider, do let me know.

arlogriffiths commented 8 months ago

I strongly favor representation of all <g> by some Unicode symbol. Then mouseover to display the value of @type.

michaelnmmeyer commented 8 months ago

Thank you @danbalogh for the synthesis. I will need some time to do the necessary work.

For <g>, the cartouche with a symbol name is only displayed when the gaiji table does not specify a textual representation. I have not finished setting this up in the project-documentation repo, though.
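The fallback described here can be sketched as a table lookup. Everything below is an assumption made for illustration: the table entries, the symbol name and character, the CSS class names and the `render_g` function are hypothetical and not taken from the actual gaiji table. The `plain` option stands in for the copy-friendly display discussed earlier in this thread, with angle brackets as a placeholder choice.

```python
from html import escape

# Hypothetical excerpt of a gaiji table: @type -> textual representation.
# None means that no textual representation has been specified.
GAIJI = {
    "floret": "❀",
    "unclassifiedSymbol": None,
}

def render_g(g_type, table=GAIJI, plain=False):
    """Render a <g type="..."> as a Unicode symbol, else as a named cartouche."""
    text = table.get(g_type)
    if text:
        if plain:
            return text
        # The tooltip carries the value of @type, as requested in the thread.
        return f'<span class="symbol" title="{escape(g_type)}">{escape(text)}</span>'
    if plain:
        # Brackets keep the symbol name visible when the text is copy-pasted.
        return f"<{g_type}>"
    return f'<span class="cartouche">{escape(g_type)}</span>'

print(render_g("floret"))
print(render_g("unclassifiedSymbol"))
print(render_g("unclassifiedSymbol", plain=True))
```

With a lookup of this kind, the cartouche disappears automatically for any symbol as soon as the table gains a textual representation for it.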