BetaMasaheft / Documentation

The Written Culture of Christian Ethiopia: A Multimedia Research Environment

Encoding language-script when no direct match #2465

Closed: eu-genia closed this 8 months ago

eu-genia commented 8 months ago

I have not found anything in TEI that fits this case 100% or explains it, but I guess the recommended approach is to add a separate subtag for the script as an extension to the language tag. There is an example with az-Arab for Azeri in Arabic script, as opposed to az-Latn for Azeri in Latin script or az-Cyrl for Azeri in Cyrillic (https://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-language.html). For us this would mean using ar-Ethi for Arabic in fidel, gez-Sarb for Ethiopic in Sabaic script, and har-Ethi vs har-Arab for Harari written in fidel or in Arabic script.

The syntax is always: the main language tag in lowercase, a hyphen, then the script subtag with its first letter capitalized (this can be followed by a hyphen and the region subtag, all capitals). Some subtags are listed here https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry and one can also search here https://r12a.github.io/app-subtags/ (the latter can also check your tag-subtag combination for you).
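To make the casing rule concrete, here is a minimal Python sketch that normalizes a tag of the shape language-Script-REGION. The function name `normalize_tag` and the regular expression are my own assumptions for illustration; the pattern covers only this simple shape, not the full language tag grammar, and it does not check whether a subtag is actually registered (the tools linked above do that).

```python
import re

# Simplified shape only: language[-Script][-REGION].
# This is an illustrative assumption, not a full language tag validator.
_TAG = re.compile(r"([A-Za-z]{2,3})(?:-([A-Za-z]{4}))?(?:-([A-Za-z]{2}|[0-9]{3}))?")


def normalize_tag(tag: str) -> str:
    """Return the tag with conventional casing, e.g. 'AR-ethi' -> 'ar-Ethi'."""
    match = _TAG.fullmatch(tag.strip())
    if match is None:
        raise ValueError(f"not a simple language-script-region tag: {tag!r}")
    language, script, region = match.groups()
    parts = [language.lower()]          # language: lowercase
    if script:
        parts.append(script.title())    # script: first letter capitalized
    if region:
        parts.append(region.upper())    # region: all capitals
    return "-".join(parts)


if __name__ == "__main__":
    for raw in ["AR-ethi", "gez-SARB", "har-arab", "az-latn-az"]:
        print(raw, "->", normalize_tag(raw))
```

Anything that does not match the simple shape (for example `ar_Ethi` with an underscore) raises a `ValueError` rather than being silently repaired.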

See also https://www.w3.org/International/questions/qa-choosing-language-tags

(There is also the @style attribute, which one can use after xml:lang in TEI as described at https://www.tei-c.org/release/doc/tei-p5-doc/en/html/WD.html, but it seems to be suggested for modes of writing, orientation etc. So xml:lang="ar" style="script: Ethiopic" could be technically possible, but it seems to be a creative interpretation of TEI, so I would prefer the first option.)

_Originally posted by @eu-genia in https://github.com/BetaMasaheft/Manuscripts/pull/2237#discussion_r1457543392_

FYI @thea-m @DenisNosnitsin1970 @CarstenHoffmannMarburg @abausi I can try this out and eventually add it to the Guidelines.

eu-genia commented 8 months ago

NB: the edited inscriptions should be corrected, where relevant, to xml:lang="gez-Sarb" type="normalized" (since they are provided in the XML files in Romanization, not in direct transcription).
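To find places that may still need this correction, one could list the elements whose xml:lang has no script subtag. The following is only a sketch using Python's standard library: the `<ab>` elements and the sample content are hypothetical and not taken from our files, and the check simply looks for a four-letter subtag after the language.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample; element names and content are assumptions for illustration.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <ab xml:lang="gez-Sarb" type="normalized">placeholder</ab>
    <ab xml:lang="ar-Ethi">placeholder</ab>
    <ab xml:lang="gez">placeholder</ab>
  </body></text>
</TEI>"""


def has_script_subtag(lang: str) -> bool:
    """True if any subtag after the language is four letters (a script subtag shape)."""
    return any(len(p) == 4 and p.isalpha() for p in lang.split("-")[1:])


def langs_without_script(xml_text: str) -> list:
    """Return (local element name, xml:lang) pairs that carry no script subtag."""
    root = ET.fromstring(xml_text)
    found = []
    for element in root.iter():
        lang = element.get(XML_LANG)
        if lang is not None and not has_script_subtag(lang):
            local_name = element.tag.split("}")[-1]  # drop the namespace part
            found.append((local_name, lang))
    return found


if __name__ == "__main__":
    for name, lang in langs_without_script(SAMPLE):
        print(f"<{name}> has xml:lang={lang!r} without a script subtag")
```

Not every hit needs changing, since a plain language tag is correct when the text is in its usual script; the list is only a starting point for checking by hand.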

eu-genia commented 8 months ago

done, hope all is clear )