bmamlin opened 3 years ago
There are over 1700 locale names within OCL's Locale concepts. Comparing Java locales to those names...
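```
# Collect every name of every concept (up to 1000 concepts) in OCL's Locales source, de-duplicated
$ http -b 'https://api.openconceptlab.org/orgs/OCL/sources/Locales/concepts/?verbose=true&limit=1000' \
  | jq -r '.[].names[].name' | sort -u > ocl_locale_list_prod.txt
# List Java's date-format locales in Locale.toString() form
$ docker run --rm groovy groovy -e 'java.text.SimpleDateFormat.availableLocales.each {println it}' \
  | sort > java_locale_list.txt
# Print only the lines that appear in the Java list but not in the OCL list
$ comm -23 java_locale_list.txt ocl_locale_list_prod.txt
```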
yields 117 possible Java locales that aren't represented in OCL:
ar_AE
ar_BH
ar_DZ
ar_EG
ar_IQ
ar_JO
ar_KW
ar_LB
ar_LY
ar_MA
ar_OM
ar_QA
ar_SA
ar_SD
ar_SY
ar_TN
ar_YE
be_BY
bg_BG
ca_ES
cs_CZ
da_DK
de_AT
de_CH
de_DE
de_GR
de_LU
el_CY
el_GR
en_AU
en_CA
en_GB
en_IE
en_IN
en_MT
en_NZ
en_PH
en_SG
en_US
en_ZA
es_AR
es_BO
es_CL
es_CO
es_CR
es_CU
es_DO
es_EC
es_ES
es_GT
es_HN
es_MX
es_NI
es_PA
es_PE
es_PR
es_PY
es_SV
es_US
es_UY
es_VE
et_EE
fi_FI
fr_BE
fr_CA
fr_CH
fr_FR
fr_LU
ga_IE
hi_IN
hr_HR
hu_HU
in
in_ID
is_IS
it_CH
it_IT
iw
iw_IL
ja_JP
ja_JP_JP_#u-ca-japanese
ko_KR
lt_LT
lv_LV
mk_MK
ms_MY
mt_MT
nl_BE
nl_NL
no_NO
no_NO_NY
pl_PL
pt_BR
pt_PT
ro_RO
ru_RU
sk_SK
sl_SI
sq_AL
sr_BA
sr_BA_#Latn
sr_CS
sr_ME
sr_ME_#Latn
sr_RS
sr_RS_#Latn
sr__#Latn
sv_SE
th_TH
th_TH_TH_#u-nu-thai
tr_TR
uk_UA
vi_VN
zh_CN
zh_HK
zh_SG
zh_TW
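For reference, the same comparison can be sketched in Java without Docker or jq. This is a minimal sketch, assuming the OCL names have already been saved one per line (for example, `ocl_locale_list_prod.txt` from the pipeline above) and are passed as the first argument; the class and method names are illustrative. The result depends on the JDK: JDK 9 and later use CLDR locale data by default (JEP 252), so they report a different set of locales than the Java 8 list in this thread.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.text.DateFormat;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

public class JavaLocalesMissingFromOcl {

    /**
     * Equivalent of `comm -23 java_locale_list.txt ocl_locale_list_prod.txt`:
     * returns the Locale.toString() names of Java's date-format locales that do
     * not appear in oclNames. Locale.ROOT prints as "" and is skipped.
     */
    static Set<String> missingFromOcl(Set<String> oclNames) {
        // TreeSet sorts by UTF-16 code unit, which may differ from `sort` collation.
        Set<String> missing = new TreeSet<>();
        for (Locale locale : DateFormat.getAvailableLocales()) {
            String name = locale.toString();
            if (!name.isEmpty() && !oclNames.contains(name)) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: a file with one OCL locale name per line (e.g., ocl_locale_list_prod.txt).
        // Without it, fall back to a small hypothetical sample.
        Set<String> oclNames = args.length > 0
                ? new HashSet<>(Files.readAllLines(Paths.get(args[0])))
                : new HashSet<>(Arrays.asList("en", "fr", "id"));
        Set<String> missing = missingFromOcl(oclNames);
        missing.forEach(System.out::println);
        System.out.println(missing.size() + " Java locales not found in OCL");
    }
}
```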
@paynejd, we'll need you to decide how to introduce these locale names into OCL (e.g., as synonyms, new entries, or a combination).
It looks like Java locales are derived from the locale names in Unicode's Common Locale Data Repository (CLDR). The CLDR download page points to a GitHub repo with CLDR JSON data.
A good summary of locales in Java is the article "Internationalization: Understanding Locale in the Java Platform." From that article:

> If you want to know what `Locale` objects you can create, the answer is simple: You can create any locale you'd like.
Given that, perhaps we should just focus on supporting the locales needed right now, Indonesian (`in`) and British English (`en_GB`), rather than trying to support all possible Java locales.
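Best Current Practice 47 (BCP 47), the IETF standard for language tags, currently consists of RFC 5646 (tag syntax) and RFC 4647 (tag matching). Its tags use hyphens to separate language and region subtags (e.g., en-GB). Because Java's `Locale` is ancient, its `toString()` form uses underscores instead (e.g., en_GB). Nice summary here.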
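Since Java 7, `Locale` can convert between the two spellings itself: `Locale.forLanguageTag()` parses a hyphenated BCP 47 tag, `toLanguageTag()` produces one, and `toString()` gives the underscore form seen in the list above. A minimal sketch (class and method names are illustrative) of going through `Locale` rather than doing character replacement, which breaks on script subtags because Java moves them behind `_#`:

```java
import java.util.Locale;

public class LocaleNameForms {

    /** Java Locale.toString() form (underscores, "_#" before script/extensions) for a BCP 47 tag. */
    static String javaName(String bcp47Tag) {
        return Locale.forLanguageTag(bcp47Tag).toString();
    }

    /** BCP 47 form (hyphens) for a Java Locale. */
    static String bcp47Name(Locale locale) {
        return locale.toLanguageTag();
    }

    public static void main(String[] args) {
        System.out.println(javaName("en-GB"));       // en_GB
        System.out.println(javaName("sr-Latn-RS"));  // sr_RS_#Latn
        System.out.println(bcp47Name(Locale.UK));    // en-GB
        // Naive replacement does not recover the BCP 47 tag for script forms:
        System.out.println("sr_RS_#Latn".replace('_', '-')); // sr-RS-#Latn, not sr-Latn-RS
    }
}
```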
Per Wikipedia's ISO 639-1 listing, the "in" code for the Indonesian language was replaced with "id" in 1989 (ref). But this "in" designation still exists in Java (as listed above).
So, I've proposed adding "id" as a synonym for the Indonesian language with type "ISO 639-1 Non-preferred". NOTE: this is the first non-preferred type for ISO 639-1; previously, the only non-preferred language type was "ISO 639-2 Non-preferred".
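On the Java side, "in" persists because of `Locale` itself. Through JDK 16, the `Locale` constructor maps "id" to the legacy "in" (likewise "he" to "iw" and "yi" to "ji"), so `getLanguage()` and `toString()` report "in" while `toLanguageTag()` reports "id". JDK 17 switched the default to the new codes, with the system property `java.locale.useOldISOCodes=true` restoring the old behavior. A small sketch, with an illustrative class name, that shows which spellings the running JDK produces:

```java
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

public class LegacyLanguageCodes {

    /**
     * Collects the spellings a JVM reports for one language code: getLanguage()
     * (legacy "in" through JDK 16 by default) and toLanguageTag() (current "id").
     */
    static Set<String> spellings(String languageCode) {
        Locale locale = new Locale(languageCode); // constructor applies the legacy-code mapping
        Set<String> result = new TreeSet<>();
        result.add(locale.getLanguage());
        result.add(locale.toLanguageTag());
        return result;
    }

    public static void main(String[] args) {
        // JDK 8-16 print [id, in] for both; JDK 17+ prints [id] unless
        // -Djava.locale.useOldISOCodes=true is set.
        System.out.println(spellings("id"));
        System.out.println(spellings("in"));
    }
}
```

Because clients on older JDKs can still send "in", OCL needs to recognize that spelling as well as "id".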
I added an entry for British English as "en-GB" (following IETF BCP 47), with synonyms "en-GB" and "en_GB". NOTE: this introduces two new locale name types, "IETF BCP 47" and "Java Locale".
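If more locales are added the same way, both names can be derived from a single tag. A minimal sketch: `NameEntry` and `entriesFor` are hypothetical and do not reflect OCL's actual API or data model; the two type strings are the new name types proposed above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class LocaleNameEntries {

    /** Hypothetical (name, name type) pair; not OCL's data model. */
    static final class NameEntry {
        final String name;
        final String nameType;

        NameEntry(String name, String nameType) {
            this.name = name;
            this.nameType = nameType;
        }

        @Override
        public String toString() {
            return name + " (" + nameType + ")";
        }
    }

    /**
     * Returns the BCP 47 name plus the Java Locale name for a tag, omitting the
     * Java name when it is spelled the same (e.g., plain "fr").
     */
    static List<NameEntry> entriesFor(String bcp47Tag) {
        Locale locale = Locale.forLanguageTag(bcp47Tag);
        String tag = locale.toLanguageTag();
        String javaName = locale.toString();
        List<NameEntry> entries = new ArrayList<>();
        entries.add(new NameEntry(tag, "IETF BCP 47"));
        if (!javaName.equals(tag)) {
            entries.add(new NameEntry(javaName, "Java Locale"));
        }
        return entries;
    }

    public static void main(String[] args) {
        System.out.println(entriesFor("en-GB")); // [en-GB (IETF BCP 47), en_GB (Java Locale)]
        System.out.println(entriesFor("fr"));    // [fr (IETF BCP 47)]
    }
}
```

On JDK 8-16, `entriesFor("id")` would also emit "in" typed as "Java Locale", whereas this thread records "in" as "ISO 639-1 Non-preferred"; on JDK 17+ it would not emit "in" at all. Legacy codes therefore still need explicit handling.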
I created oclapi2 PR-9 for these proposed changes.
@bmamlin @paynejd, en_GB and ind are available on QA, Demo, and Staging.
Thanks @snyaggarwal.
FYI: for Indonesian, we also need the legacy name "in" with name type "ISO 639-1 Non-preferred". I manually added it to Staging before turning on the OpenMRS Custom Validation schema for the updated PIH dictionary.
See #1312 and #1364.
We ran into a validation issue while importing the PIH dictionary (#732): some valid Java locales are unknown to OCL. Java locales should be known to OCL. Here is a list of Java 8 locales...