Language subtags

annettegessner commented 7 years ago

I spotted some inconsistency in our use of language tags. Do we have our own list of abbreviations that we use? If yes, please direct me to it! If not: EpiDoc states that the IANA Language Subtag Registry should be used, and I would follow this advice. Here are some examples of language subtags we need on a regular basis according to this recommended list:
en --> English
de --> German
(el --> Modern Greek)
grc --> Ancient Greek
la --> Latin
mul --> multiple languages

Some of the deviations I found in First1kGreek:
"eng" instead of "en" for English
"lat" instead of "la" for Latin
"ger" or "deu" instead of "de" for German
"el", which is Modern Greek, when it should be Ancient Greek "grc"

There are two different places where we use those subtags:
1) language ident="" --> to define the languages used in the body of the XML (i.e. the languages the text itself is written in)
2) xml:lang="" --> to define the language used in (a section of) the header

Please let me know if you agree with these abbreviations, so we can start fixing these!

annettegessner commented 7 years ago

Matt just added in #1503 that we should use this list: https://en.wikipedia.org/wiki/ISO_639:d Thoughts?

sonofmun commented 7 years ago

We are actually using the ISO 639-3 standard three-letter language codes for our language tags. Probably the easiest place to access these is here: https://en.wikipedia.org/wiki/List_of_ISO_639-3_codes
That would mean that the language tags should be:

eng --> English
deu --> German
(ell --> Modern Greek)
grc --> Ancient Greek
lat --> Latin
mul --> multiple languages
cop --> Coptic
heb --> Hebrew
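For anyone fixing files in bulk, here is a minimal sketch of how the deviating tags mentioned in this thread could be rewritten to the ISO 639-3 codes above. It is not an existing project script: the names `ISO_639_3` and `normalize_tags` are made up for illustration, and it assumes the attributes are written with double quotes as `ident="..."` and `xml:lang="..."`. It deliberately leaves "el" untouched, because in our files that tag may stand for either Modern Greek ("ell") or a mislabeled Ancient Greek ("grc") and needs a manual check.

```python
import re

# Hypothetical mapping from the deviating tags named in this thread
# to ISO 639-3 codes. "el" is intentionally absent: it is ambiguous
# here (Modern Greek "ell" vs. mislabeled Ancient Greek "grc").
ISO_639_3 = {
    "en": "eng",
    "de": "deu",
    "ger": "deu",
    "la": "lat",
}

# Matches double-quoted ident="..." and xml:lang="..." attributes.
# Assumption: no other attribute named "ident" needs to be preserved.
ATTR = re.compile(r'\b(xml:lang|ident)="([^"]*)"')


def normalize_tags(xml_text):
    """Return xml_text with known deviating language tags replaced."""
    def fix(match):
        attr, value = match.group(1), match.group(2)
        # Unknown or already-correct values are kept unchanged.
        return f'{attr}="{ISO_639_3.get(value, value)}"'
    return ATTR.sub(fix, xml_text)


if __name__ == "__main__":
    print(normalize_tags('<language ident="ger">German</language>'))
```

A plain regex keeps the rest of the file byte-for-byte unchanged, which makes the resulting diffs easy to review; a real clean-up might prefer an XML parser if attributes appear with single quotes or in other contexts.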

annettegessner commented 7 years ago

Ha, I was a little faster this time. ;-) Great, thanks for clarifying!

OpenGreekAndLatin / First1KGreek

Language subtags #1548