OpenConceptLab / ocl_issues


Implement optional country code support for locales #757

Open bmamlin opened 3 years ago

bmamlin commented 3 years ago

We ran into a validation issue importing the PIH dictionary (#732): some valid Java locales are unknown to OCL, and OCL should recognize them. Here is the list of locales available in Java 8:

$ docker run --rm groovy groovy -e 'println System.getProperty("java.version")'
1.8.0_252
$ docker run --rm groovy groovy -e \
  'java.text.SimpleDateFormat.availableLocales.each {println "$it|$it.displayName|$it.displayCountry"}' \
  | sort
Locale Display Name Country
ar_AE Arabic (United Arab Emirates) United Arab Emirates
ar_BH Arabic (Bahrain) Bahrain
ar_DZ Arabic (Algeria) Algeria
ar_EG Arabic (Egypt) Egypt
ar_IQ Arabic (Iraq) Iraq
ar_JO Arabic (Jordan) Jordan
ar_KW Arabic (Kuwait) Kuwait
ar_LB Arabic (Lebanon) Lebanon
ar_LY Arabic (Libya) Libya
ar_MA Arabic (Morocco) Morocco
ar_OM Arabic (Oman) Oman
ar_QA Arabic (Qatar) Qatar
ar_SA Arabic (Saudi Arabia) Saudi Arabia
ar_SD Arabic (Sudan) Sudan
ar_SY Arabic (Syria) Syria
ar_TN Arabic (Tunisia) Tunisia
ar_YE Arabic (Yemen) Yemen
ar Arabic
be_BY Belarusian (Belarus) Belarus
be Belarusian
bg_BG Bulgarian (Bulgaria) Bulgaria
bg Bulgarian
ca_ES Catalan (Spain) Spain
ca Catalan
cs_CZ Czech (Czech Republic) Czech Republic
cs Czech
da_DK Danish (Denmark) Denmark
da Danish
de_AT German (Austria) Austria
de_CH German (Switzerland) Switzerland
de_DE German (Germany) Germany
de_GR German (Greece) Greece
de_LU German (Luxembourg) Luxembourg
de German
el_CY Greek (Cyprus) Cyprus
el_GR Greek (Greece) Greece
el Greek
en_AU English (Australia) Australia
en_CA English (Canada) Canada
en_GB English (United Kingdom) United Kingdom
en_IE English (Ireland) Ireland
en_IN English (India) India
en_MT English (Malta) Malta
en_NZ English (New Zealand) New Zealand
en_PH English (Philippines) Philippines
en_SG English (Singapore) Singapore
en_US English (United States) United States
en_ZA English (South Africa) South Africa
en English
es_AR Spanish (Argentina) Argentina
es_BO Spanish (Bolivia) Bolivia
es_CL Spanish (Chile) Chile
es_CO Spanish (Colombia) Colombia
es_CR Spanish (Costa Rica) Costa Rica
es_CU Spanish (Cuba) Cuba
es_DO Spanish (Dominican Republic) Dominican Republic
es_EC Spanish (Ecuador) Ecuador
es_ES Spanish (Spain) Spain
es_GT Spanish (Guatemala) Guatemala
es_HN Spanish (Honduras) Honduras
es_MX Spanish (Mexico) Mexico
es_NI Spanish (Nicaragua) Nicaragua
es_PA Spanish (Panama) Panama
es_PE Spanish (Peru) Peru
es_PR Spanish (Puerto Rico) Puerto Rico
es_PY Spanish (Paraguay) Paraguay
es_SV Spanish (El Salvador) El Salvador
es_US Spanish (United States) United States
es_UY Spanish (Uruguay) Uruguay
es_VE Spanish (Venezuela) Venezuela
es Spanish
et_EE Estonian (Estonia) Estonia
et Estonian
fi_FI Finnish (Finland) Finland
fi Finnish
fr_BE French (Belgium) Belgium
fr_CA French (Canada) Canada
fr_CH French (Switzerland) Switzerland
fr_FR French (France) France
fr_LU French (Luxembourg) Luxembourg
fr French
ga_IE Irish (Ireland) Ireland
ga Irish
hi_IN Hindi (India) India
hi Hindi
hr_HR Croatian (Croatia) Croatia
hr Croatian
hu_HU Hungarian (Hungary) Hungary
hu Hungarian
in_ID Indonesian (Indonesia) Indonesia
in Indonesian
is_IS Icelandic (Iceland) Iceland
is Icelandic
it_CH Italian (Switzerland) Switzerland
it_IT Italian (Italy) Italy
it Italian
iw_IL Hebrew (Israel) Israel
iw Hebrew
ja_JP_JP_#u-ca-japanese Japanese (Japan,JP) Japan
ja_JP Japanese (Japan) Japan
ja Japanese
ko_KR Korean (South Korea) South Korea
ko Korean
lt_LT Lithuanian (Lithuania) Lithuania
lt Lithuanian
lv_LV Latvian (Latvia) Latvia
lv Latvian
mk_MK Macedonian (Macedonia) Macedonia
mk Macedonian
ms_MY Malay (Malaysia) Malaysia
ms Malay
mt_MT Maltese (Malta) Malta
mt Maltese
nl_BE Dutch (Belgium) Belgium
nl_NL Dutch (Netherlands) Netherlands
nl Dutch
no_NO_NY Norwegian (Norway,Nynorsk) Norway
no_NO Norwegian (Norway) Norway
no Norwegian
pl_PL Polish (Poland) Poland
pl Polish
pt_BR Portuguese (Brazil) Brazil
pt_PT Portuguese (Portugal) Portugal
pt Portuguese
ro_RO Romanian (Romania) Romania
ro Romanian
ru_RU Russian (Russia) Russia
ru Russian
sk_SK Slovak (Slovakia) Slovakia
sk Slovak
sl_SI Slovenian (Slovenia) Slovenia
sl Slovenian
sq_AL Albanian (Albania) Albania
sq Albanian
sr_BA_#Latn Serbian (Latin,Bosnia and Herzegovina) Bosnia and Herzegovina
sr_BA Serbian (Bosnia and Herzegovina) Bosnia and Herzegovina
sr_CS Serbian (Serbia and Montenegro) Serbia and Montenegro
sr_ME_#Latn Serbian (Latin,Montenegro) Montenegro
sr_ME Serbian (Montenegro) Montenegro
sr_RS_#Latn Serbian (Latin,Serbia) Serbia
sr_RS Serbian (Serbia) Serbia
sr__#Latn Serbian (Latin)
sr Serbian
sv_SE Swedish (Sweden) Sweden
sv Swedish
th_TH_TH_#u-nu-thai Thai (Thailand,TH) Thailand
th_TH Thai (Thailand) Thailand
th Thai
tr_TR Turkish (Turkey) Turkey
tr Turkish
uk_UA Ukrainian (Ukraine) Ukraine
uk Ukrainian
vi_VN Vietnamese (Vietnam) Vietnam
vi Vietnamese
zh_CN Chinese (China) China
zh_HK Chinese (Hong Kong) Hong Kong
zh_SG Chinese (Singapore) Singapore
zh_TW Chinese (Taiwan) Taiwan
zh Chinese
bmamlin commented 3 years ago

There are over 1700 locale names within OCL's Locale concepts. Comparing Java locales to those names...

$ http -b 'https://api.openconceptlab.org/orgs/OCL/sources/Locales/concepts/?verbose=true&limit=1000' \
  | jq -r '.[].names[].name' | sort -u > ocl_locale_list_prod.txt
$ docker run --rm groovy groovy -e 'java.text.SimpleDateFormat.availableLocales.each {println it}' \
  | sort > java_locale_list.txt
$ comm -23 java_locale_list.txt ocl_locale_list_prod.txt

yields 117 possible Java locales that aren't represented in OCL:

ar_AE
ar_BH
ar_DZ
ar_EG
ar_IQ
ar_JO
ar_KW
ar_LB
ar_LY
ar_MA
ar_OM
ar_QA
ar_SA
ar_SD
ar_SY
ar_TN
ar_YE
be_BY
bg_BG
ca_ES
cs_CZ
da_DK
de_AT
de_CH
de_DE
de_GR
de_LU
el_CY
el_GR
en_AU
en_CA
en_GB
en_IE
en_IN
en_MT
en_NZ
en_PH
en_SG
en_US
en_ZA
es_AR
es_BO
es_CL
es_CO
es_CR
es_CU
es_DO
es_EC
es_ES
es_GT
es_HN
es_MX
es_NI
es_PA
es_PE
es_PR
es_PY
es_SV
es_US
es_UY
es_VE
et_EE
fi_FI
fr_BE
fr_CA
fr_CH
fr_FR
fr_LU
ga_IE
hi_IN
hr_HR
hu_HU
in
in_ID
is_IS
it_CH
it_IT
iw
iw_IL
ja_JP
ja_JP_JP_#u-ca-japanese
ko_KR
lt_LT
lv_LV
mk_MK
ms_MY
mt_MT
nl_BE
nl_NL
no_NO
no_NO_NY
pl_PL
pt_BR
pt_PT
ro_RO
ru_RU
sk_SK
sl_SI
sq_AL
sr_BA
sr_BA_#Latn
sr_CS
sr_ME
sr_ME_#Latn
sr_RS
sr_RS_#Latn
sr__#Latn
sv_SE
th_TH
th_TH_TH_#u-nu-thai
tr_TR
uk_UA
vi_VN
zh_CN
zh_HK
zh_SG
zh_TW
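The comparison relies on `comm -23`, which hides column 2 (lines only in the second file) and column 3 (lines in both files), leaving only lines unique to the first file. `comm` assumes both inputs are sorted under the same collation, so pinning `LC_ALL=C` on both `sort` and `comm` is a cheap safeguard. A minimal sketch; `java_only` is a hypothetical helper name, not part of any OCL tooling:

```shell
# Print entries of list $1 (e.g. Java locales) that are missing from list $2
# (e.g. OCL locale names). Both lists are de-duplicated and re-sorted with
# LC_ALL=C so that sort and comm agree on the ordering.
java_only() {
  local a b
  a=$(mktemp) b=$(mktemp)
  LC_ALL=C sort -u "$1" > "$a"
  LC_ALL=C sort -u "$2" > "$b"
  # -2: hide lines only in $2; -3: hide lines common to both
  LC_ALL=C comm -23 "$a" "$b"
  rm -f "$a" "$b"
}
```

With a made-up Java-side list `en en_GB in in_ID` and OCL-side list `en fr id`, `java_only` prints `en_GB`, `in`, and `in_ID`.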

@paynejd, we need to decide how to introduce these locale names into OCL (e.g., as synonyms, new entries, or a combination).

bmamlin commented 3 years ago

It looks like Java locales are derived from the locale names in Unicode's Common Locale Data Repository (CLDR). The CLDR download page points to a GitHub repo with the CLDR data in JSON.

A good summary of locales in Java is the article "Internationalization: Understanding Locale in the Java Platform". From that article:

If you want to know what Locale objects you can create, the answer is simple: You can create any locale you'd like.

Given that, perhaps we should just focus on supporting the locales we need right now, Indonesian (in) and British English (en_GB), rather than trying to support all possible Java locales.

bmamlin commented 3 years ago

Best Current Practice 47 (BCP 47) defines language tags; its tag syntax is specified in RFC 5646, which uses hyphens to separate language and region codes (e.g., "en-GB"). Because Java's Locale is ancient, its string form uses underscores instead of hyphens (e.g., "en_GB"). Nice summary here.
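For the simple `language` and `language_COUNTRY` forms, going from Java's string form to a BCP 47 tag is just a separator swap (`en_GB` to `en-GB`). Forms that carry a variant, script, or extension (e.g. `no_NO_NY`, `sr__#Latn`, `ja_JP_JP_#u-ca-japanese`) do not convert that simply, so this hypothetical helper refuses them rather than guess. On the JVM itself, `java.util.Locale.toLanguageTag()` (Java 7 and later) produces a well-formed tag for those cases.

```shell
# Hypothetical helper: convert a simple Java Locale string ("en" or "en_GB")
# to a BCP 47 tag ("en" or "en-GB"). Strings with a script/extension part
# ("#...") or a variant (two or more underscores) are rejected.
java_to_bcp47() {
  case "$1" in
    *'#'* | *_*_*)
      echo "not a simple language[_COUNTRY] locale: $1" >&2
      return 1
      ;;
    *)
      printf '%s\n' "$1" | tr '_' '-'
      ;;
  esac
}
```

For example, `java_to_bcp47 en_GB` prints `en-GB`, while `java_to_bcp47 no_NO_NY` fails with a message on stderr.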

Per Wikipedia's ISO 639-1 listing, the code "in" for the Indonesian language was replaced with "id" in 1989 (ref). But Java still uses the old "in" code (as listed above).
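Java keeps three withdrawn ISO 639 codes: `in` (Indonesian) and `iw` (Hebrew), both visible in the list above, plus `ji` (Yiddish). If Java locale strings ever need to be matched against names keyed on the current codes, one option is to rewrite the language part first. A minimal sketch; `modernize_lang` is a hypothetical helper, and it only rewrites the segment before the first underscore:

```shell
# Hypothetical helper: map Java's legacy language codes to current
# ISO 639-1 codes (in -> id, iw -> he, ji -> yi), keeping any
# _COUNTRY/variant/#script suffix unchanged.
modernize_lang() {
  local lang rest=
  lang=${1%%_*}                          # language part before first "_"
  case "$1" in *_*) rest=_${1#*_} ;; esac  # everything from the first "_"
  case "$lang" in
    in) lang=id ;;
    iw) lang=he ;;
    ji) lang=yi ;;
  esac
  printf '%s%s\n' "$lang" "$rest"
}
```

For example, `modernize_lang in_ID` prints `id_ID` and `modernize_lang iw` prints `he`, while a three-letter code such as `ind` passes through unchanged.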

So, I've proposed adding "id" as a synonym for Indonesian language with type "ISO 639-1 Non-preferred". NOTE: this is the first case of non-preferred for ISO 639-1. Previously, the only "non-preferred" language types were ISO 639-2 Non-preferred.

I added an entry for British English as "en-GB" (following IETF BCP 47) and added synonyms "en-GB" and "en_GB". NOTE: this introduces two new locale name types: "IETF BCP 47" and "Java Locale".
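One way to picture names with types is a lookup table where every name, preferred or not, resolves to a single concept. The sketch below is hypothetical: the pipe-separated layout, the concept labels, and `lookup_locale` are made up for illustration and are not OCL's storage format or API. Only the name types come from this thread.

```shell
# Hypothetical name table: name|name type|concept (illustrative rows only).
LOCALE_NAMES='en-GB|IETF BCP 47|British English
en_GB|Java Locale|British English
in|ISO 639-1 Non-preferred|Indonesian'

# Print "<concept> (matched via <name type>)" for an exact name match;
# exit non-zero if the name is not in the table.
lookup_locale() {
  printf '%s\n' "$LOCALE_NAMES" |
    awk -F'|' -v name="$1" '
      $1 == name { print $3 " (matched via " $2 ")"; found = 1 }
      END { exit !found }'
}
```

For example, `lookup_locale en_GB` prints `British English (matched via Java Locale)`, and `lookup_locale fr_FR` prints nothing and returns a non-zero status.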

I created oclapi2 PR-9 for these proposed changes.

snyaggarwal commented 3 years ago

@bmamlin @paynejd en_GB and ind are available on QA, Demo and Staging

bmamlin commented 3 years ago

Thanks @snyaggarwal.

FYI – for Indonesian, we need the legacy name "in" as name type "ISO 639-1 Non-preferred" as well. I manually added this to staging before turning on the OpenMRS Custom Validation schema for the updated PIH dictionary.

paynejd commented 1 year ago

see #1312 #1364