Brown-University-Library / OLD-ARCHIVED_iip-production


check iso language codes and make them all consistent in Data and Code #98

Open emylonas opened 3 years ago

emylonas commented 3 years ago

Title says it all.

emylonas commented 3 years ago

Todo:

  1. @emylonas decide on codes by looking at data.
  2. @emylonas or @birkin grep code to find any wrong codes.
  3. @birkin implement.
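Step 2 is essentially a search for the 3-letter forms. Below is a minimal Python sketch. It assumes the codes appear in the code base as quoted string literals ("heb", 'lat'). The file suffixes scanned by the hypothetical `scan_tree` helper are guesses and do not reflect the repository's actual layout.

```python
import re
from pathlib import Path

# Hypothetical list of 3-letter codes to flag (the 2-letter forms are "he" and "la").
SUSPECT_CODES = ("heb", "lat")

# Matches a suspect code wrapped in matching single or double quotes, e.g. "heb" or 'lat'.
# Words such as "latin" are not matched because the closing quote must follow the code.
QUOTED_CODE = re.compile(r"""(["'])(%s)\1""" % "|".join(SUSPECT_CODES))


def find_suspect_codes(text):
    """Return (line_number, code) pairs for each quoted suspect code in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in QUOTED_CODE.finditer(line):
            hits.append((lineno, match.group(2)))
    return hits


def scan_tree(root, suffixes=(".py", ".js", ".html", ".xsl")):
    """Print path:line: code for every hit under root (the suffix list is an assumption)."""
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="replace")
            for lineno, code in find_suspect_codes(text):
                print(f"{path}:{lineno}: {code}")
```

A plain `grep` would also work. The Python version ignores longer words that merely start with "lat" or "heb".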
emylonas commented 3 years ago

All codes in the <textLang> element are "he" and "la" (2-letter ISO codes). All codes in the body, on the <div> elements that contain the inscription text, are "heb" and "lat" (3-letter ISO codes).
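To check this split across the whole corpus, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It makes three assumptions: the files are TEI P5 documents in the TEI namespace, `<textLang>` holds its codes in TEI's `mainLang`/`otherLangs` attributes, and the `<div>` elements hold theirs in `xml:lang`. `survey_codes` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def survey_codes(xml_text):
    """Count language codes on <textLang> and on <div> elements separately."""
    root = ET.fromstring(xml_text)
    text_lang_codes = Counter()
    div_codes = Counter()
    for el in root.iter(TEI + "textLang"):
        # Assumption: codes sit in TEI's mainLang / otherLangs (space-separated) attributes.
        text_lang_codes.update(el.get("mainLang", "").split())
        text_lang_codes.update(el.get("otherLangs", "").split())
    for el in root.iter(TEI + "div"):
        # Only count divs that actually declare a language.
        if XML_LANG in el.attrib:
            div_codes[el.get(XML_LANG)] += 1
    return text_lang_codes, div_codes
```

You could sum the two counters over every inscription file. That would show whether any file departs from the he/la versus heb/lat pattern described above.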

Greek and Aramaic aren't an issue, as they only have 3-letter codes (grc, arc).

After a lot of research into 2- and 3-letter codes, I have concluded that we should probably be using the language subtags defined by IANA: https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry (note that this very long list also contains script subtags, dialect subtags and so on). The registry follows the BCP 47 rule that a language gets its 2-letter code when one exists and a 3-letter code only otherwise. So in the IANA list, Latin is la, and Hebrew is he unless it's Ancient Hebrew, in which case it's hbo. It might be better if we used hbo, but we haven't thus far, and it's probably better not to change now.
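The 2-letter/3-letter rule above reduces to a small lookup. Here is a sketch that assumes tags are plain BCP 47 strings. `THREE_TO_TWO` and `normalize_tag` are hypothetical names, and the table includes only the two mappings discussed in this thread.

```python
# Hypothetical lookup: 3-letter codes that also have a 2-letter ISO 639-1 code.
# The IANA registry lists only the 2-letter form for these, so "heb" -> "he", "lat" -> "la".
# Codes with no 2-letter form (grc, arc, hbo) are already registry subtags and pass through.
THREE_TO_TWO = {"heb": "he", "lat": "la"}


def normalize_tag(tag):
    """Replace the primary language subtag with its IANA form, keeping any later subtags."""
    primary, sep, rest = tag.partition("-")
    primary = primary.lower()
    return THREE_TO_TWO.get(primary, primary) + sep + rest
```

The function keeps any trailing subtags, such as a script subtag like `Latn`. A hypothetical tag like heb-Latn would therefore become he-Latn instead of losing its script information.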

It's probably best to change the heb codes to he at this point, and likewise the lat codes to la. This is to be done in the texts and doesn't require any further involvement from Birkin.
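Here is a sketch of that change to the texts. It assumes the codes live in `xml:lang` attributes on the `<div>` elements, as described earlier in the thread. The sketch uses a regular expression on the raw file rather than an XML parser. Re-serializing with ElementTree could change namespace prefixes and drop comments, which would turn a two-letter fix into a noisy diff. `fix_lang_attrs` and `fix_file` are hypothetical helpers.

```python
import re
from pathlib import Path

REPLACEMENTS = {"heb": "he", "lat": "la"}

# Matches xml:lang="heb" or xml:lang='lat', optionally followed by subtags such as -Latn.
# The lookahead requires a quote or hyphen right after the code, so longer values are untouched.
XML_LANG_ATTR = re.compile(r"""(xml:lang\s*=\s*)(["'])(heb|lat)(?=[-"'])""")


def fix_lang_attrs(xml_text):
    """Return (new_text, count) with heb/lat xml:lang values rewritten to he/la."""
    return XML_LANG_ATTR.subn(
        lambda m: m.group(1) + m.group(2) + REPLACEMENTS[m.group(3)], xml_text
    )


def fix_file(path):
    """Rewrite one file in place only if something changed; return the number of changes."""
    p = Path(path)
    new_text, count = fix_lang_attrs(p.read_text(encoding="utf-8"))
    if count:
        p.write_text(new_text, encoding="utf-8")
    return count
```

Run `fix_file` over each inscription file. Before committing, review the returned counts or the version-control diff to confirm exactly which attributes changed.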