CMUBigLab / webanywhere

Accessible Technology Anywhere
http://webanywhere.cs.washington.edu/beta/

multi-language framework proposal #26

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Hi Jeff,

刘学佳 (a designer on the eGuideDog team) and I have worked out a
multi-language framework proposal for WA. Could you please review it?
The changes will touch most of the files, and from now on we will need to
wrap strings with gettext(). I would like to have your comments before I
start implementing it. Thank you!

1) Add the following line to config.php
  $default_locale = 'en_US.UTF-8';

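2) Add the following lines to browser.php
  include('locale.php');
  // Fall back to the default when no locale is passed in the request.
  $locale = isset($_REQUEST['locale']) ? $_REQUEST['locale'] : '';
  if (empty($locale)) {
    $locale = $default_locale;
  }
  init_locale($locale); // defined in locale.php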

3) Add locale.php to set up locales.
  For PHP files, it will set the locale and use the gettext framework.
  The gettext module needs to be installed first: `apt-get install php-gettext`
  Here is an example of how to use gettext in PHP:
    setlocale(LC_ALL, 'zh_CN.UTF-8');
    bindtextdomain("WebAnywhere", 'locale');
    textdomain("WebAnywhere");
    echo gettext("hello world");

  For Javascript files, a gettext function written in JS will be generated
by a PHP script, which only needs to be run once. The translations are
defined in the same *.po files as the PHP strings. Each language has its
own generated gettext file, such as en.js, de.js, zh_CN.js. Here is an
example of zh_CN.js:
    var wa_jstext = {}; // plain object used as a string-to-string map
    wa_jstext["Go"]="浏览";
    function gettext(text)
    {
      var result = wa_jstext[text];
      if (result) {
        return result;
      } else {
        return text;
      }
    }

  Here is an example of how to use gettext in Javascript:
    <script type="text/javascript" src="<?php echo get_js_locale_path(); ?>"></script>
    <script type="text/javascript">alert(gettext("hello world"));</script>

4) All strings in PHP and Javascript files will be wrapped with
gettext("string")

5) Use msginit/xgettext to generate po template

6) Translate strings in po files manually

7) Use msgfmt to generate mo file from po file

8) Write a PHP script to build en.js, de.js, zh_CN.js ... from mo files

9) Everything is encoded in UTF-8
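
Step 8 can be illustrated with a minimal sketch. The proposal calls for a PHP script; the version below is written in Javascript purely for illustration, and the name `buildLocaleJs` and the input map are hypothetical. It assumes the translations have already been read into a plain object, and it emits a file shaped like the zh_CN.js example above.

```javascript
// Hypothetical generator: turns a translation map into the source text of a
// locale file such as zh_CN.js. The proposal does this with a PHP script;
// Javascript is used here only for illustration.
function buildLocaleJs(translations) {
  const lines = ['var wa_jstext = {};'];
  for (const [msgid, msgstr] of Object.entries(translations)) {
    // Skip untranslated entries so gettext() falls back to the English text.
    if (!msgstr) continue;
    // JSON.stringify yields a quoted, escaped string literal.
    lines.push(
      'wa_jstext[' + JSON.stringify(msgid) + '] = ' + JSON.stringify(msgstr) + ';'
    );
  }
  lines.push(
    'function gettext(text) {',
    '  return Object.prototype.hasOwnProperty.call(wa_jstext, text) ? wa_jstext[text] : text;',
    '}'
  );
  return lines.join('\n') + '\n';
}

// Example: "Back" has no translation yet, so it is left out of the file.
console.log(buildLocaleJs({ Go: '浏览', Back: '' }));
```

Leaving untranslated strings out of the generated file keeps the fallback to English in one place, the generated gettext function.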

Reference:
1. GNU gettext manual. http://www.gnu.org/software/gettext/manual/gettext.html
2. PHP gettext API. http://ca3.php.net/manual/en/function.gettext.php
3. Codes for the Representation of Names of Languages.
http://www.loc.gov/standards/iso639-2/php/code_list.php

Original issue reported on code.google.com by hgn...@gmail.com on 29 Mar 2009 at 1:59

GoogleCodeExporter commented 9 years ago
I think your proposal looks great, and should be a flexible way for us to add
new localizations.

Please go ahead with your proposal.

As this touches a lot of (most) files, we should think carefully about how you
should develop this so that your intermediate changes don't prevent the code in
trunk from working. One option is to create a separate branch for you to work
on. Alternatively, if you think you can integrate changes without breaking
trunk, we can try it that way.

Once you have this working, I have a number of volunteers lined up to translate
WebAnywhere to new languages - at least Hindi and Bengali.

Thanks for all your work on this. I can tell you from the emails I frequently
receive that many people across the world appreciate it.

Let me know if there's anything I can do to support your efforts.

Original comment by jeffrey....@gmail.com on 3 Apr 2009 at 7:57

GoogleCodeExporter commented 9 years ago
Thanks Jeff.

I've added a new branch, 04062009-locale, and I will implement the proposal in
the coming weeks.

-Cameron

Original comment by hgn...@gmail.com on 6 Apr 2009 at 9:43

GoogleCodeExporter commented 9 years ago
There is an issue with setlocale in PHP that we need to pay attention to.
Reference: http://cn.php.net/manual/en/function.setlocale.php

Warning

The locale information is maintained per process, not per thread. If you are
running PHP on a multithreaded server api like IIS or Apache on Windows you may
experience sudden changes of locale settings while a script is running although
the script itself never called setlocale() itself. This happens due to other
scripts running in different threads of the same process at the same time
changing the processwide locale using setlocale().

 de ronino at kde (reverse it)
25-Jun-2008 05:43
I experienced the behavior stated in the above Warning box: Running PHP5 on a
multithreaded Apache made the current locale change sometimes all of a sudden
within a script, so strftime() output wasn't in the required format.

I recompiled Apache with the prefork MPM and now it works like a charm. Took me
a long time to find out the reason as I overlooked the warning box searching
for either a bug report or a programming error of mine...

Original comment by hgn...@gmail.com on 1 May 2009 at 8:17

GoogleCodeExporter commented 9 years ago
Always check the return value of setlocale.

By default, only a few locales are supported on a system. To list the locales
available on the system: `locale -a`

To add a new locale: `sudo locale-gen zh_TW.UTF-8`

I don't know how to do it on Windows yet...

Original comment by hgn...@gmail.com on 1 May 2009 at 9:39

GoogleCodeExporter commented 9 years ago
I've finished the implementation of the locale module. Could someone help to
check it out from branches/04062009-locale and verify it?

Currently, I've implemented Simplified Chinese (zh_CN) and Traditional Chinese
(zh_TW). To test these two locales on Debian-based Linux, we need to run
`sudo locale-gen zh_CN.UTF-8` and `sudo locale-gen zh_TW.UTF-8`. I have no idea
how to add support for a new locale on Windows yet...

Then we can change the locale through `http://localhost/wa/index.php?locale=zh_CN`.

To show a selection list on the browser frame, we can add
`$show_locale_selection = 1` to config.php.

WA will look up the locale in the following sequence:
1. `$fixed_locale = 'en_US'` if it exists in config.php
2. locale argument in the URL (this is saved to a cookie automatically)
3. cookie
4. HTTP_ACCEPT_LANGUAGE
This behavior is coded in locale.php
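
The lookup sequence above can be sketched as follows. This is an illustration in Javascript, not the code in locale.php; the function names, the `sources` object, the `supported` list, and the fallback argument are all assumptions. It also ignores q-values in the Accept-Language header and does not model writing the cookie.

```javascript
// Pick the first entry from an Accept-Language header value and convert it to
// the underscore form used in this thread, e.g. "zh-CN,zh;q=0.8" -> "zh_CN".
// q-values are ignored for brevity, so this assumes the header lists languages
// in preference order.
function firstAcceptedLocale(header) {
  if (!header) return null;
  const first = header.split(',')[0].split(';')[0].trim();
  if (!first || first === '*') return null;
  const [lang, region] = first.split('-');
  return region
    ? lang.toLowerCase() + '_' + region.toUpperCase()
    : lang.toLowerCase();
}

// Walk the four sources in priority order and return the first supported one.
// `sources`, `supported` and `fallback` are hypothetical inputs.
function resolveLocale(sources, supported, fallback) {
  const candidates = [
    sources.fixedLocale,                         // 1. $fixed_locale in config.php
    sources.urlLocale,                           // 2. locale argument in the URL
    sources.cookieLocale,                        // 3. cookie
    firstAcceptedLocale(sources.acceptLanguage), // 4. HTTP_ACCEPT_LANGUAGE
  ];
  for (const candidate of candidates) {
    if (candidate && supported.includes(candidate)) return candidate;
  }
  return fallback;
}

const supported = ['en_US', 'zh_CN', 'zh_TW'];
console.log(resolveLocale({ urlLocale: 'zh_CN', cookieLocale: 'zh_TW' }, supported, 'en_US'));
```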

To add a new locale or update existing locale files:
1. Create a new directory tree for the new locale under locale
2. Run `update_locale.php` to update the po files.
3. Edit the po file to do the translation.
4. Run `update_locale.php` again to generate the mo files and update the
Javascript locale files.
5. Add a new selection entry in browser.php

Please let me know once the code has been verified and I will merge it into
the mainline. Thanks!

Cameron

Original comment by hgn...@gmail.com on 3 May 2009 at 3:50

GoogleCodeExporter commented 9 years ago
Cameron,

It is great that you have it working! What if someone does not have the
privileges to become super user on a machine? For example, I don't think we
have that level of authority for the machines running WebAnywhere.

Is there another way to incorporate this functionality into an installation?

--w

Original comment by chisholm...@gmail.com on 6 May 2009 at 8:37

GoogleCodeExporter commented 9 years ago
This is a good question. I think I may need to hack gettext so that it does not
rely on the system's locale installation.

Original comment by hgn...@gmail.com on 7 May 2009 at 2:25

GoogleCodeExporter commented 9 years ago
I may have made quite a serious mistake. gettext is designed to be used on the
client side to let users use their own locale, but I used it on the server side
to provide all locales. I will try to hack the gettext function.

Original comment by hgn...@gmail.com on 7 May 2009 at 2:30

GoogleCodeExporter commented 9 years ago
After adding 'putenv("LANG=$locale")', the Windows platform is OK now. There is
no need to install locales as on the Linux platform.

Cameron

Original comment by hgn...@gmail.com on 8 May 2009 at 4:06

GoogleCodeExporter commented 9 years ago
I have no further improvements to gettext on Linux. I think hacking gettext to
parse mo files directly without setlocale may be a waste of time and would harm
portability.

In my opinion, we had better find another machine on which we can install
locales. We can also leave the multi-locale function in place. But I would like
someone to help check whether the new code breaks any existing functionality. I
would like to merge the branch into the mainline. Then I can start a new task.
Thanks!

Cameron

Original comment by hgn...@gmail.com on 11 May 2009 at 4:07

GoogleCodeExporter commented 9 years ago
Hi Cameron,

Taking a step back, is there a compelling reason to use the existing setlocale
and gettext?  Couldn't we just write our own, which would be simpler and much
easier to install?

It seems like setlocale just sets a global variable to the current locale, and
then gettext uses that to decide which of several files to use to return the
string with the correct localization.

Could we just do this ourselves?

Original comment by jeffrey....@gmail.com on 11 May 2009 at 3:35

GoogleCodeExporter commented 9 years ago
Localization can be more complicated than string translation. We may need to
handle numbers and times. We had better use an existing library rather than
build it ourselves.

If we want to use gettext without setlocale, we may need to do some hacking. I
am not sure whether that is easy to do (probably not).

I will spend some time investigating other PHP software. I hope that will give
me some new ideas.

Original comment by hgn...@gmail.com on 13 May 2009 at 3:58

GoogleCodeExporter commented 9 years ago
Hi Jeff/Wendy,

I have checked some PHP software (mediawiki and dokuwiki). They both have their
own localization mechanism, like the one below:

In MessageEn.php:
$mesg = array (
'tog-underline'               => 'Underline links:',
...
);

In MessageDe.php:
$mesg = array (
'tog-underline'               => 'Links unterstreichen:',
...
);

This mechanism is too primitive and not easy to maintain. I prefer to use the
gettext mechanism. At the very least I would like to use xgettext to
generate/update the translation template. If we add a new string to WA, we just
need to run the script update_locale.pl; we don't need to add it to every
localization file manually. A language that has not been translated in time
will just output the English string. Nothing will be broken.

To work around "setlocale" I can implement another "gettext" function, but we
would need to redefine the existing one. Redefining a function is not allowed
in core PHP, but it is possible with the runkit module through
"runkit_function_redefine".

Do you think installing the runkit PHP module is an unreasonable request?

If you would still rather not install another PHP module, I can replace all
"gettext" calls with "wa_gettext". This should be the final solution.

Any comments? Thanks!

- Cameron

Original comment by hgn...@gmail.com on 13 May 2009 at 9:25

GoogleCodeExporter commented 9 years ago
Cameron,

My understanding of what you are trying to do is localize the WebAnywhere
interface. But, the majority of the interface is written in JavaScript.
Therefore, how does rewriting gettext help the WebAnywhere interface? Is there
another way to localize the interface? What are the issues with creating a
mapping from English constants to other languages? If a mapping won't work
because of numbers and dates, won't the JavaScript functions need to be
rewritten so that the interface works on the client?

Thank you,
--wendy

Original comment by chisholm...@gmail.com on 18 May 2009 at 7:40

GoogleCodeExporter commented 9 years ago
Wendy,

In fact, we are moving further and further away from gettext now. The only
thing I am reusing from the gettext framework is the "po" locale file format.
The "xgettext" command makes it easy to update the locale files.

You can regard what I am doing as just creating a mapping from English strings.
All strings that are output to the user should be wrapped by wa_gettext (both
in PHP scripts and in Javascript).

If needed, we can extend the interface of wa_gettext in the future. Something
like this:
wa_gettext("There are %d lists.", 5);
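
A minimal sketch of how such an extended wa_gettext might behave in Javascript. The `catalog` object and its Chinese entry are illustrative assumptions, and only the %d and %s placeholders are handled, filled in argument order.

```javascript
// Hypothetical catalog for one locale; in WA it would come from the generated
// locale file. The translation below is only an illustration.
const catalog = { 'There are %d lists.': '共有 %d 个列表。' };

function wa_gettext(text, ...args) {
  // Fall back to the English string when no translation exists.
  const known = Object.prototype.hasOwnProperty.call(catalog, text);
  const template = known ? catalog[text] : text;
  let next = 0;
  // Replace each %d or %s with the next argument; a placeholder is left
  // untouched if the arguments run out.
  return template.replace(/%[ds]/g, (placeholder) =>
    next < args.length ? String(args[next++]) : placeholder
  );
}

console.log(wa_gettext('There are %d lists.', 5));
console.log(wa_gettext('Go'));
```

Substituting after the lookup means translators see the placeholder in the po entry and can move it to wherever their language needs it.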

Cameron Wong

Original comment by hgn...@gmail.com on 19 May 2009 at 3:44

GoogleCodeExporter commented 9 years ago
Hi Cameron,

Whatever happened with this?  It sounded from your email in a different thread
that you were waiting for an agreement to proceed, whereas I thought you were
going ahead with this approach.

Please let me know if there's anything preventing progress.

Thanks,
Jeff

Original comment by jeffrey....@gmail.com on 10 Aug 2009 at 2:48

GoogleCodeExporter commented 9 years ago
Hi Jeff/Wendy,

I have implemented the gettext solution I posted at the top of this thread.
However, it relies on the required locales being available on the server
system. This is regarded as a big drawback for installation.

The solution I now propose is that we use the "po" locale file format and the
xgettext utility, which helps to generate "po" files. Then we write our own
wa_gettext in PHP and Javascript, which does not require installing any extra
system locales.
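
To make the idea concrete, here is a rough Javascript sketch of reading a "po" file into a lookup table without any system locale. It is an assumption about how this could be done, not existing WA code; the names `unquote` and `parsePo` are hypothetical, and plural forms, msgctxt and most escape sequences are ignored.

```javascript
// Strip the surrounding quotes and undo the \" \\ and \n escapes.
function unquote(s) {
  return s.slice(1, -1).replace(/\\(["\\n])/g, (_, c) => (c === 'n' ? '\n' : c));
}

// Minimal .po reader: handles msgid/msgstr pairs and quoted continuation
// lines. Plural forms, msgctxt and other escapes are deliberately ignored.
function parsePo(text) {
  const catalog = {};
  let msgid = null;
  let msgstr = null;
  let target = null; // which field a continuation line extends

  const flush = () => {
    // The entry with an empty msgid is the .po header; skip it, along with
    // entries that have no translation yet.
    if (msgid && msgstr) catalog[msgid] = msgstr;
    msgid = msgstr = target = null;
  };

  for (const raw of text.split('\n')) {
    const line = raw.trim();
    let m;
    if ((m = /^msgid\s+(".*")$/.exec(line))) {
      flush();
      msgid = unquote(m[1]);
      target = 'msgid';
    } else if ((m = /^msgstr\s+(".*")$/.exec(line))) {
      msgstr = unquote(m[1]);
      target = 'msgstr';
    } else if (target && /^".*"$/.test(line)) {
      if (target === 'msgid') msgid += unquote(line);
      else msgstr += unquote(line);
    }
    // Comment lines (#...) and blank lines are ignored.
  }
  flush();
  return catalog;
}

console.log(parsePo('msgid "Go"\nmsgstr "浏览"\n'));
```

A wa_gettext function then only needs to look its argument up in the returned object and return the argument itself when no entry is found.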

I need you to understand the solution and comment on it. I don't want to be
asked questions like "why do we need po files?" after I have finished the
implementation. If you have doubts, please raise them as early as possible.
Thank you!

Cameron

Original comment by hgn...@gmail.com on 11 Aug 2009 at 5:02

GoogleCodeExporter commented 9 years ago
Cameron,

I like this solution.

Is there existing code that does what you mention with the wa_gettext function
in either PHP or Javascript?

-Jeff

Original comment by jeffrey....@gmail.com on 11 Aug 2009 at 3:08

GoogleCodeExporter commented 9 years ago
I've submitted the locale implementation to trunk, but it is just a framework.
We still need to wrap text in PHP/JS with wa_gettext(), which should be easy to
do. I found that there have been new submissions to trunk recently and the
latest WA does not work so well now. I think we need to fix the bugs first.

I have one suggestion. We can deploy an eSpeak TTS server and add a
locale/language selection to the demo page to let users change the
locale/language. This should make WA more widely used.

By the way, is browser.php still in use? It seems that it has been merged into
index.php. If browser.php is no longer used, we had better delete it to avoid
confusion.

Updated file list:
Sending        trunk/config.php
Sending        trunk/index.php
Adding         trunk/locale
Adding         trunk/locale/README
Adding         trunk/locale/zh_CN
Adding         trunk/locale/zh_CN/LC_MESSAGES
Adding         trunk/locale/zh_CN/LC_MESSAGES/WebAnywhere.js
Adding         trunk/locale/zh_CN/LC_MESSAGES/WebAnywhere.php
Adding         trunk/locale/zh_CN/LC_MESSAGES/WebAnywhere.po
Adding         trunk/locale/zh_TW
Adding         trunk/locale/zh_TW/LC_MESSAGES
Adding         trunk/locale/zh_TW/LC_MESSAGES/WebAnywhere.js
Adding         trunk/locale/zh_TW/LC_MESSAGES/WebAnywhere.php
Adding         trunk/locale/zh_TW/LC_MESSAGES/WebAnywhere.po
Adding         trunk/locale.php
Adding         trunk/update_locale.pl

===== from locale/README =====

Procedures to add new locale:
1. Find the proper symbol for your locale. For example, "de" is for German. A
language may have more than one locale. For example, we use zh_CN for
Simplified Chinese.
2. Change the working directory to the root of the WA source code.
3. Add new directories. Take German for example: `mkdir -p locale/de/LC_MESSAGES`
4. Create an empty locale file: `touch locale/de/LC_MESSAGES/WebAnywhere.po`
5. Update the locale template with `perl update_locale.pl`. The texts that need
to be translated will be added to locale/de/LC_MESSAGES/WebAnywhere.po
6. Edit locale/de/LC_MESSAGES/WebAnywhere.po to translate the texts.
7. Run `perl update_locale.pl` to update *.php and *.js under the locale directory.
8. Set the locale you want to use in config.php. Add one line like this:
   $fixed_locale = 'de';

Procedures to update existing locale:
1. Wrap new strings in PHP/JS files with wa_gettext('new text').
2. Change the working directory to the root of the WA source code and run `perl
update_locale.pl`. All files under the locale directory will be updated. A new
entry for 'new text' will be added to the *.po files.
3. Edit the *.po files to translate 'new text'.
4. Redo step 2. The *.php and *.js files will be updated according to the
updated *.po files.
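
The update procedure depends on finding every wa_gettext('...') call and adding the missing entries to the po files. The internals of update_locale.pl are not shown in this thread, so the Javascript sketch below only illustrates the idea; `extractMsgids` and `appendMissingEntries` are hypothetical names. It recognises only plain quoted literals, and its use of JSON.stringify to quote msgids is an approximation of po quoting.

```javascript
// Collect the string literals passed to wa_gettext(...) in a source text.
// Only plain single- or double-quoted literals are recognised; computed or
// concatenated arguments are not picked up.
function extractMsgids(source) {
  const pattern = /wa_gettext\(\s*(?:'((?:[^'\\]|\\.)*)'|"((?:[^"\\]|\\.)*)")/g;
  const seen = new Set();
  let match;
  while ((match = pattern.exec(source)) !== null) {
    const raw = match[1] !== undefined ? match[1] : match[2];
    // Undo escaped quotes (\' and \") only; other escapes are left as-is.
    seen.add(raw.replace(/\\(['"])/g, '$1'));
  }
  return [...seen];
}

// Append an empty entry for every msgid the existing .po text lacks, leaving
// existing translations untouched.
function appendMissingEntries(poText, msgids) {
  const existing = new Set(poText.split('\n'));
  let result = poText;
  for (const id of msgids) {
    const line = 'msgid ' + JSON.stringify(id);
    if (!existing.has(line)) {
      result += '\n' + line + '\nmsgstr ""\n';
    }
  }
  return result;
}

const ids = extractMsgids("echo wa_gettext('Go'); alert(wa_gettext(\"Back\"));");
console.log(appendMissingEntries('msgid "Go"\nmsgstr "浏览"\n', ids));
```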

Cameron

Original comment by hgn...@gmail.com on 10 Oct 2009 at 12:41

GoogleCodeExporter commented 9 years ago

Original comment by hgn...@gmail.com on 2 Jan 2010 at 7:28