TST-Project / mss

Working repository for the TST project
Creative Commons Attribution Share Alike 4.0 International

Replacement of <g xml:lang/> by <seg xml:lang/> #42

Open manufrancis opened 3 years ago

manufrancis commented 3 years ago

Following Axelle's advice, let us use <seg xml:lang/> instead of <g xml:lang/>

  1. to tag Grantha syllables in a Tamil text
  2. to tag Sanskrit in Grantha sentences/phrases in a Tamil text

I guess we will thus use <seg xml:lang="ta-Gran"/> for 1. and <seg xml:lang="sa-Gran"/> for 2.

Does it make sense?

PS: in both cases, the display in transliteration should be bold.

chchch commented 3 years ago

Hmm, this is a somewhat complicated issue. The XML Recommendation suggests using @xml:lang to indicate language and optional locale, rather than script. So for example, es is Spanish, es-ES is Spain Spanish, and es-MX is Mexican Spanish. We probably won't run into issues with this since the locale is always in all-caps, while the ISO script abbreviations we're using aren't. It's worth thinking about though.
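
For what it's worth, BCP 47 (which @xml:lang values follow) tells script and region subtags apart by their shape rather than by their case: a script subtag is four letters, a region subtag is two letters or three digits. Here is a minimal Python sketch of that distinction. `split_lang_tag` is a hypothetical helper, not something in the editor, and it only handles simple language-script-region tags:

```python
def split_lang_tag(tag):
    """Split a simple language tag into language, script, and region parts.

    Assumption: only plain language[-script][-region] tags are handled;
    variants, extensions, and private-use subtags are ignored.
    """
    parts = tag.split("-")
    result = {"language": parts[0].lower(), "script": None, "region": None}
    for part in parts[1:]:
        if len(part) == 4 and part.isalpha():
            # Four letters: a script subtag, conventionally title-cased (Latn, Taml, Gran).
            result["script"] = part.title()
        elif (len(part) == 2 and part.isalpha()) or (len(part) == 3 and part.isdigit()):
            # Two letters or three digits: a region subtag, conventionally upper-cased.
            result["region"] = part.upper()
    return result
```

With this, `split_lang_tag("ta-Gran")` reports a script and no region, while `split_lang_tag("es-MX")` reports a region and no script.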

Another issue that we might have with this is that it's not quite correct to tag something as ta-Gran if we're really transliterating the text; it should rather be ta-Latn. This is problematic because we have researchers who are actually inputting their text in Tamil script, e.g., ta-Taml, rather than giving a Latin transliteration. By tagging that correctly, we can easily switch between Tamil script and Latin transliteration. But if we start using ta-Gran to mean "Tamil Grantha transliterated into Latin script", then we can't use that system for records that are not transliterated (Jean-Luc would probably be unhappy).

chchch commented 3 years ago

It might actually be better to use something like @rend="grantha", as specified in the DHARMA encoding guide, and reserve ta-Gran for cases where the tagged text really is in untransliterated Grantha script.
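
A rough sketch of how existing markup might be migrated, in Python with the standard library's ElementTree. Several things here are assumptions rather than anything decided in this thread: that @rend goes on a <seg> element, that the language subtag (ta or sa) is kept in @xml:lang while the -Gran script subtag is dropped, and that the elements are not in the TEI namespace. `migrate_grantha` is a hypothetical name:

```python
import xml.etree.ElementTree as ET

# ElementTree stores xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def migrate_grantha(root):
    """Rewrite <g xml:lang="xx-Gran"> as <seg xml:lang="xx" rend="grantha">.

    Assumptions: un-namespaced element names, and the language subtag
    is preserved while the script subtag moves into @rend.
    Returns the number of elements changed.
    """
    changed = 0
    for el in root.iter("g"):
        lang = el.get(XML_LANG, "")
        if lang.lower().endswith("-gran"):
            el.tag = "seg"
            el.set(XML_LANG, lang[: -len("-gran")])  # keep "ta" or "sa"
            el.set("rend", "grantha")
            changed += 1
    return changed
```

To try it, parse a fragment with `ET.fromstring(...)`, call `migrate_grantha` on the result, and serialize it again with `ET.tostring(root, encoding="unicode")`. A real TEI file would need the TEI namespace added to the element names.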

manufrancis commented 3 years ago

Fine! Let us go with @rend="grantha". Could you implement this in the editor and have it displayed in bold?

chchch commented 3 years ago

Bold in both the Latin transliteration and the Tamil Grantha display, right?