annettegessner opened this issue 7 years ago
Matt just commented in #1503 that we should use this list: https://en.wikipedia.org/wiki/ISO_639:d Thoughts?
We are actually using the ISO 639-3 standard three-letter language codes for our language tags. Probably the easiest place to look them up is here: https://en.wikipedia.org/wiki/List_of_ISO_639-3_codes. That means the language tags should be:
eng --> English
deu --> German
(ell --> Modern Greek)
grc --> Ancient Greek
lat --> Latin
mul --> multiple languages
cop --> Coptic
heb --> Hebrew
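As a quick consistency check, the list above can be turned into a lookup table and used to flag any tag that is not on it. This is a minimal Python sketch, assuming these eight codes are the complete agreed set; `ISO_639_3_TAGS`, `check_tag` and `unknown_tags` are hypothetical names, not part of any existing First1kGreek tooling.

```python
# Hypothetical lookup table: the ISO 639-3 codes proposed in this thread.
ISO_639_3_TAGS = {
    "eng": "English",
    "deu": "German",
    "ell": "Modern Greek",
    "grc": "Ancient Greek",
    "lat": "Latin",
    "mul": "multiple languages",
    "cop": "Coptic",
    "heb": "Hebrew",
}


def check_tag(tag: str) -> bool:
    """Return True if `tag` is one of the agreed ISO 639-3 codes."""
    return tag in ISO_639_3_TAGS


def unknown_tags(tags):
    """Return tags not on the agreed list, without duplicates, in first-seen order."""
    found = []
    for tag in tags:
        if not check_tag(tag) and tag not in found:
            found.append(tag)
    return found


if __name__ == "__main__":
    print(unknown_tags(["grc", "eng", "en", "ger", "en"]))  # ['en', 'ger']
```

Keeping the codes in one dictionary means the same table can serve as the reference list and as the validator, so the two cannot drift apart.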
Ha, I was a little faster this time. ;-) Great, thanks for clarifying!
I spotted some inconsistencies in our use of language tags. Do we have our own list of abbreviations that we use? If so, please point me to it! If not: EpiDoc states that the IANA Language Subtag Registry should be used, and I would follow this advice. Here are some of the language subtags we need regularly, according to this recommended list:
en --> English
de --> German
(el --> Modern Greek)
grc --> Ancient Greek
la --> Latin
mul --> multiple languages
Some of the deviations I found in First1kGreek:
"eng" instead of "en" for English
"lat" instead of "la" for Latin
"ger" or "deu" instead of "de" for German
"el", which is Modern Greek, where it should be Ancient Greek "grc"
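Most of these deviations are mechanical and could be fixed with a lookup table, but "el" cannot: it is a valid code for Modern Greek, so only a person can decide whether a given passage is really Ancient Greek. A minimal Python sketch, assuming the ISO 639-3 codes proposed earlier in this thread are the target scheme (so "eng", "lat" and "deu" stay as they are); the `NORMALIZE` table and `normalize_tag` are hypothetical.

```python
# Hypothetical normalization table, assuming ISO 639-3 is the target scheme.
# Two-letter ISO 639-1 codes and the ISO 639-2/B code "ger" map to ISO 639-3.
NORMALIZE = {
    "en": "eng",
    "la": "lat",
    "de": "deu",
    "ger": "deu",
}

# Valid codes that are probably wrong in this corpus: rewriting them
# automatically could be incorrect, so they are only flagged for review.
NEEDS_REVIEW = {"el"}


def normalize_tag(tag):
    """Return (suggested_tag, needs_review) for one language tag."""
    if tag in NEEDS_REVIEW:
        return tag, True
    return NORMALIZE.get(tag, tag), False


def normalize_all(tags):
    """Apply normalize_tag to every tag, keeping the original order."""
    return [normalize_tag(t) for t in tags]
```

Tags that are already correct (for example "grc") pass through unchanged, so the function can be run over every tag in the corpus.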
There are two different places where we use these subtags:
1) language ident="" --> to define the languages used in the body of the XML (i.e. the languages the text itself is written in)
2) xml:lang="" --> to define the language used in (a section of) the header
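To see which tags a file actually uses in each of these two places, both can be collected with Python's standard `xml.etree.ElementTree`. A minimal sketch, assuming the files use the TEI namespace `http://www.tei-c.org/ns/1.0`; `collect_language_tags` and the `SAMPLE` document are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: documents are in the TEI P5 namespace.
TEI_NS = "http://www.tei-c.org/ns/1.0"
# xml:lang belongs to the reserved XML namespace; ElementTree expands it like this.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def collect_language_tags(xml_text):
    """Return (ident_values, xml_lang_values) found in a TEI document string."""
    root = ET.fromstring(xml_text)
    # 1) <language ident="..."> elements (in TEI they sit in <langUsage>)
    idents = [
        el.get("ident")
        for el in root.iter(f"{{{TEI_NS}}}language")
        if el.get("ident") is not None
    ]
    # 2) xml:lang attributes on any element, in document order
    langs = [el.get(XML_LANG) for el in root.iter() if el.get(XML_LANG) is not None]
    return idents, langs


SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader xml:lang="eng">
    <profileDesc>
      <langUsage>
        <language ident="grc">Ancient Greek</language>
        <language ident="lat">Latin</language>
      </langUsage>
    </profileDesc>
  </teiHeader>
  <text xml:lang="grc"><body><p>...</p></body></text>
</TEI>"""

if __name__ == "__main__":
    print(collect_language_tags(SAMPLE))  # (['grc', 'lat'], ['eng', 'grc'])
```

Keeping the two lists separate matches the two uses described above, so a mismatch (for example "el" in one place and "grc" in the other) is easy to spot.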
Please let me know if you agree with these abbreviations so we can start fixing them!