benoitfragit / google2ubuntu

1.1.5 publication Need Translation
https://plus.google.com/u/0/communities/103854623082229435165
133 stars 33 forks

Locale handling is wrong #20

Closed (ladios closed this issue 10 years ago)

ladios commented 10 years ago

You can't just cut the region code off and then replace it with the upper-cased language code.

For Chinese, there are Simplified Chinese (zh_CN) and Traditional Chinese (zh_TW). There is no locale such as zh or zh_ZH. Chinese in Hong Kong (zh_HK) is the worst case: normally we set zh_TW as its fallback written language. I am not familiar with Python's locale library, but you can get a list of fallback languages from the $LANGUAGE environment variable. Then there are spoken languages. Most of Hong Kong's population speaks Cantonese, so for that we should pass zh_YUE to Google Speech Input/TTS; for Mandarin, any of zh_HK, zh_CN or zh_TW is okay.
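A minimal sketch of reading that fallback list in Python, assuming the usual GNU gettext convention that $LANGUAGE holds colon-separated locale names and that $LANG is consulted when it is empty. The function name `language_priority_list` is made up for illustration and is not part of google2ubuntu.

```python
import os


def language_priority_list(environ=None):
    """Return the user's preferred locale names, most preferred first.

    Reads the colon-separated $LANGUAGE variable and falls back to $LANG
    when $LANGUAGE is unset or empty.
    """
    env = os.environ if environ is None else environ
    names = [name for name in env.get("LANGUAGE", "").split(":") if name]
    if not names:
        lang = env.get("LANG", "")
        # Drop an encoding or modifier suffix such as ".UTF-8" or "@euro".
        lang = lang.split(".")[0].split("@")[0]
        if lang and lang not in ("C", "POSIX"):
            names = [lang]
    return names


if __name__ == "__main__":
    # Example: a Hong Kong user who prefers Traditional Chinese, then English.
    print(language_priority_list({"LANGUAGE": "zh_HK:zh_TW:en"}))
```

Passing a plain dict instead of `os.environ` keeps the function easy to try out without changing the real environment.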

I found a list of supported language codes here: http://stackoverflow.com/questions/14257598/what-are-language-codes-for-voice-recognition-languages-in-chromes-implementati

Recap:

benoitfragit commented 10 years ago

The first thing I can do is to remove names such as fr_FR. They are not needed if I rename the folders in config/ from fr_FR to fr.

benoitfragit commented 10 years ago

OK, I've changed the structure of the config files to use language codes of the form ab. For the other part of the problem, it seems that when we add such a language we will have to load two different locales with gettext: one for the written language and another for the spoken language. I don't know how to do that.
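One possible way to hold two locales at once is to avoid the global `gettext.install()` and keep two independent catalog objects instead. This is only a sketch: the domain name `google2ubuntu`, the `i18n` directory and the helper name `load_translations` are assumptions, not the project's actual layout.

```python
import gettext


def load_translations(domain, localedir, written, spoken):
    """Return (written_catalog, spoken_catalog) as two independent objects.

    fallback=True makes gettext return a NullTranslations object (which
    echoes the original string) when no .mo file is found for a language.
    """
    written_catalog = gettext.translation(
        domain, localedir=localedir, languages=[written], fallback=True
    )
    spoken_catalog = gettext.translation(
        domain, localedir=localedir, languages=[spoken], fallback=True
    )
    return written_catalog, spoken_catalog


if __name__ == "__main__":
    # Hypothetical domain and directory, for illustration only.
    ui, voice = load_translations("google2ubuntu", "i18n", "zh_TW", "zh_HK")
    print(ui.gettext("hello"))     # string for the interface
    print(voice.gettext("hello"))  # string to match against speech
```

Each catalog has its own `gettext()` method, so interface text and spoken commands can be looked up separately without either one replacing the other.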

ladios commented 10 years ago

Your LocaleHelper is still splitting the locale. Please do so only when a translation for ab is present but one for ab_CD is not. The LANGUAGE environment variable contains a colon-separated list of locales. Please use that list to determine which LANG to use for i18n/gettext.
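The rule above can be sketched as a small selection function: walk the preference list in order, prefer an exact ab_CD match, and strip the region only when just ab is available. The name `pick_locale` and the idea of passing the available translations as a set are illustrative assumptions, not the actual LocaleHelper code.

```python
def pick_locale(candidates, available):
    """Pick the first usable translation for an ordered list of locales.

    candidates: locale names in order of preference, e.g. from $LANGUAGE.
    available:  names of the translations that actually exist.
    """
    for name in candidates:
        if name in available:
            return name  # exact ab_CD (or ab) match, never split
        base = name.split("_")[0]
        if base in available:
            return base  # only ab exists, so dropping the region is safe
    return None


if __name__ == "__main__":
    # zh_HK has no translation and neither has zh, so zh_TW is chosen.
    print(pick_locale(["zh_HK", "zh_TW", "en"], {"zh_TW", "zh_CN", "en"}))
```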

Let us have a file like config/LANG/extlang that contains a short list of available ISO 639-3 codes from which the user can choose a spoken language for that LANG. Each is a three-letter code; let's call it EXT. If EXT is present, change LANG to ab_EXT_CD or ab_EXT before passing it to the Google API. (The order of a locale's language subtags should follow BCP 47, http://tools.ietf.org/rfc/bcp/bcp47.txt, but the Google API doesn't seem to care about letter case or whether - or _ is used as the separator.)
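As a rough illustration of the tag rewriting described here, the following sketch inserts EXT right after the language part of LANG. The function name `spoken_tag` is hypothetical, and whether the Google API accepts the resulting tag is taken from the comment above rather than verified.

```python
def spoken_tag(lang, ext=None):
    """Turn ab_CD (or ab) into ab_EXT_CD (or ab_EXT) when EXT is given."""
    parts = lang.split("_")
    if ext:
        # The extended language subtag goes directly after the language.
        parts.insert(1, ext)
    return "_".join(parts)


if __name__ == "__main__":
    print(spoken_tag("zh_HK", "yue"))  # Cantonese as spoken in Hong Kong
    print(spoken_tag("zh_HK"))         # no EXT chosen, tag is unchanged
```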

We might want config/LANG/EXT.xml or config/LANG/EXT/default.xml as well, but it's not really necessary.

benoitfragit commented 10 years ago

The next version will not cut the language code, i.e. for English we will have en_EN. Everything is already on GitHub. Someone asked me to find a way to choose a language other than the system locale, so I can't fix the locale by using the system locale. I need to check the user's choice like this:

So, in order to separate the written locale from the spoken locale, what I suggest (so as not to modify the program much) is, for example for Chinese, to create i18n/zh_ZH/. The structure of this folder will be

Then when I load the locale in add_windw I will check whether extlang exists. If it does, I will add a submenu that lets the user choose their spoken language: zh_ZH will be stored in the locale.conf file and the ext in another file such as extlang.conf. This file will be read for TTS and STT.

Note: For the moment I can't load two different languages

ladios commented 10 years ago

Sorry if you are already working on extlang, but please forget about extlang. I did some experiments and I think the best way would be like this:

Files:

Implementation:

  1. Don't show locales that only have a fallback. E.g. don't show zh_HK in the locale list.
  2. Automatically select the fallback for such a locale. E.g. if the system locale is zh_HK, read i18n/zh_HK/fallback and use its content, yue_Hant, as the locale.
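The two steps above could look roughly like this in Python. It is a sketch under assumptions: each locale is a folder under an `i18n` directory, a folder that only redirects contains a one-line `fallback` file, and the function names `resolve_locale` and `selectable_locales` are invented for the example.

```python
import os


def resolve_locale(system_locale, i18n_dir="i18n"):
    """Follow i18n/<locale>/fallback if it exists, else keep the locale."""
    path = os.path.join(i18n_dir, system_locale, "fallback")
    try:
        with open(path, encoding="utf-8") as handle:
            target = handle.read().strip()
    except OSError:
        # No fallback file (or no such folder): use the locale unchanged.
        return system_locale
    return target or system_locale


def selectable_locales(i18n_dir="i18n"):
    """List the locale folders that should appear in the locale list."""
    shown = []
    for name in sorted(os.listdir(i18n_dir)):
        folder = os.path.join(i18n_dir, name)
        has_fallback = os.path.isfile(os.path.join(folder, "fallback"))
        # Folders that only redirect to another locale are hidden.
        if os.path.isdir(folder) and not has_fallback:
            shown.append(name)
    return shown
```

With `i18n/zh_HK/fallback` containing `yue_Hant`, `resolve_locale("zh_HK")` yields `yue_Hant`, and zh_HK is left out of the result of `selectable_locales()`.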

All 3 of zh_Hans, zh_Hant and yue_Hant can be used for:

  1. icu.Locale.getDisplayName(icu.Locale, icu.Locale) for showing locales' native names in the locale list. (Requires the python-pyicu package. By the way, icu doesn't support the extlang subtag, hence yue_Hant instead of zh_yue_Hant.) E.g. yue_Hant will be displayed as "Cantonese (Traditional Chinese)" in Traditional Chinese.
  2. Google Speech API. E.g. when lang is yue_Hant, it recognizes Cantonese and returns Traditional Chinese text.
  3. Google TTS API. E.g. if the text matches, with yue_Hant it speaks Cantonese back to the user.
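For the first item, a small sketch of the PyICU call, assuming the python-pyicu package is installed. The wrapper name `native_display_name` is hypothetical, and it returns None when PyICU is missing so the caller can show the raw tag instead. The exact wording of the returned name depends on the ICU data that is installed.

```python
try:
    import icu  # provided by the python-pyicu (PyICU) package
except ImportError:
    icu = None


def native_display_name(tag, display_in):
    """Return the name of locale `tag`, written in the locale `display_in`.

    Returns None when PyICU is not available.
    """
    if icu is None:
        return None
    return icu.Locale(tag).getDisplayName(icu.Locale(display_in))


if __name__ == "__main__":
    for tag in ("zh_Hans", "zh_Hant", "yue_Hant"):
        # Show each locale's name in Traditional Chinese.
        print(tag, native_display_name(tag, "zh_Hant"))
```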

Would this be easier?

ladios commented 10 years ago

If you decide to use icu to display locale names, you should change all en_EN to en_US or just en, because EN is not an ISO 3166 code for any region.