Aldriana / ShadowCraft-Engine

Calculations backend for ShadowCraft, a WoW theorycraft project.
GNU Lesser General Public License v3.0

Localization/Internationalization #7

Open raconzor opened 14 years ago

raconzor commented 14 years ago

This should be done at some point. Suggested method involves gettext, but if you have a better way - let us know.

raconzor commented 14 years ago

I'll start looking into this as I have time over the weekend.

dazer commented 13 years ago

I've been working a little on this and would like to comment on it before I start compiling mo files. There are certain things we should agree on before going forward:

1 I agree on using gettext. Not that there are many options out there, but I think it's the most reasonable way to go. Besides, if you are running unix you probably have the gnu package already.

2 What are we providing i18n for? Just the output, right? That, as of now, includes error messages, some variable names (those we get when we run test.py), and dps/ep figures (some languages change the way those are displayed). If so, I've run into some trouble: some of the errors carry variable names from the input (in our case test.py). This basically means I'd need to internationalize our input (which, I imagine, is not going to be test.py forever). This matters because po files are tied to where the translatable strings sit in the code, and it gets messy when you move those around too much.

3 With that last point in mind, I'd like to ask when we are going to build this: before or after the strings are in place? If we are to start producing the template for po files, we should know where the strings to translate are and exactly what they look like.

4 I think we should write a style readme to accompany the po files when we distribute them to the translators. WoW was first released in English and remained that way until BC, I believe (and it's translated into only a handful of languages now). WoW players therefore tend to build up a jargon of game-related expressions; for instance, there's a native word for 'proc' in Spanish, French, Italian and German (those I know, but every language should have its own), yet no WoW player will ever use those words. I believe our implementation should leave this open to the better judgment of our translators, but some enforcing should be done for terms that the game UI already provides (like 'rating').

For starters I'm going to start marking the strings already in place. The format, if no one thinks of a better one, will be the usual: naming the gettext method as underscore, _("string"). From there on, we can keep writing them like that and construct the pot, po and mo files when the product is ready to be translated (I offer myself to deploy the es_ES flavor, by the way).
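
As an illustration of the _("string") convention and of point 2 above (errors that carry values from the input), here is a minimal Python 3 sketch. The names check_race and VALID_RACES are hypothetical and not taken from the ShadowCraft code; the idea is only that the constant template is what gets marked, and the caller's value is substituted after translation.

```python
import gettext

# No catalogue is loaded here, so NullTranslations returns each string unchanged.
_ = gettext.NullTranslations().gettext

# Hypothetical list, for illustration only.
VALID_RACES = ("human", "orc", "troll")


def check_race(race):
    """Raise a translatable error if `race` is not recognised."""
    if race not in VALID_RACES:
        # Translate the constant template first, then substitute the caller's
        # value, so the catalogue needs only one entry for this message.
        raise ValueError(_("Unsupported race: {race}").format(race=race))
    return race


try:
    check_race("gnome")
except ValueError as error:
    message = str(error)
```

Keeping the input value out of the marked string means the po template does not change when the input does.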

Aldriana commented 13 years ago

2) I think we're mostly looking at i18n for error messages. I think it's fine to put the burden on the calling program to display the results in a way that makes sense to users, including i18n if necessary. That is: we don't need to translate all the ability names returned by the damage calculator - no one (outside of sample code) is going to be displaying those strings directly anyway. Rather, they'll have some UI that reports those numbers, and the strings are just keys to help them keep track of which number is which.

Input is a bit of a problem for race, but I'm inclined to ignore the problem for now. If the caller has half a brain we'll never hit that case anyway.

4) I'm not really worried about guidelines for translators. Anyone who will be translating this is reasonably wow-savvy and can thus translate by meaning rather than by text alone. For more general coding projects you have to worry about that sort of thing, but for this, all translators will be players, and fairly serious players at that unless I miss my guess.

dazer commented 13 years ago

This is not as straightforward as I expected it to be. My progress is in this branch of my fork https://github.com/dazer/ShadowCraft-Engine/tree/i18n

edit: (reverted): First, I reverted the builtin._ thingie since the python documentation suggests to call gettext from the main driver file of our application (test.py for the moment). Apparently it gets called fine from my i18n.py

I created the pot template with xgettext from the command line (if anyone runs Windows, like me, you need to set the path to wherever you have the tools). The po and mo files that I committed were created with Poedit, but msgfmt outputs files looking just the same (albeit lacking plural forms, but that can be set from Poedit too).

Now my trouble comes when setting the _() output to something translated: the function set_language catches the path to the .mo just fine but whatever implementation I've tried with either gettext.install/gettext.translation or gettext.bindtextdomain/gettext.textdomain always outputs the same "Hello world!" that I put in test.py in English.
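
One way to narrow down a problem like this, sketched in Python 3 under assumptions: gettext.find() reports which .mo file a given domain, directory and language list would resolve to, without loading it. The domain name "shadowcraft" and the helper explain_lookup are hypothetical; the directory layout is the one gettext expects.

```python
import gettext
import os
import tempfile


def explain_lookup(domain, locale_dir, languages):
    """Return the .mo path gettext would load, or None if nothing matches."""
    return gettext.find(domain, localedir=locale_dir, languages=languages)


# Build a throwaway locale tree with the layout gettext expects:
#   <locale_dir>/<language>/LC_MESSAGES/<domain>.mo
locale_dir = tempfile.mkdtemp()
mo_dir = os.path.join(locale_dir, "es_ES", "LC_MESSAGES")
os.makedirs(mo_dir)
# gettext.find() only checks that the file exists, so an empty placeholder
# is enough for this lookup test (it would not be loadable as a catalogue).
open(os.path.join(mo_dir, "shadowcraft.mo"), "wb").close()

found = explain_lookup("shadowcraft", locale_dir, ["es_ES"])
missing = explain_lookup("shadowcraft", locale_dir, ["fr_FR"])
wrong_domain = explain_lookup("engine", locale_dir, ["es_ES"])
```

If the lookup returns a path but the output is still untranslated, the remaining suspects are the catalogue contents and which _ the calling module actually sees.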

So I leave it to you guys to figure what's going on because after reading some double figures of manuals, tutorials, python and gnu documentation and whatnot I've come to the conclusion that I simply don't know what's wrong.

Aldriana commented 13 years ago

Re: imports. What makes it tricky is that we're not creating an application in the conventional sense - we're creating a library. So yeah, right now we're calling everything through test.py - but that shouldn't be a requirement - in fact, that sort of by definition won't be the case. When this gets hooked up to the frontend at some point in the future, the calling code will be making its own set of function calls and not going through test.py.

Now, we could just require anyone using this library to set up i18n in their calling code; but that's sort of a pain in the ass. We can write code that will set up i18n for them (and probably should) but if that has to be called before you can even import the rest of the library, that breaks the usual flow of import what you need -> start calling functions. And even if it happens automatically on import, it still means import order matters, which offends my sensibilities.

Hence: the builtin hack was designed to make sure all code imports correctly and can be used even if you haven't gotten around to setting up i18n - which seems like a fairly sensible thing to want to be true. It's not a replacement for setting up i18n properly - it's just insurance against the people who don't.
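
A minimal sketch of that kind of insurance, written for Python 3 (where the module is builtins; under Python 2 it was __builtin__). This is an assumption about the shape of the hack, not the code in the repository, and install_fallback_gettext is a hypothetical name.

```python
import builtins


def install_fallback_gettext():
    """Make a no-op `_` available everywhere unless a real one is installed."""
    # Only step in when nothing callable is there yet, so a translation that
    # was set up properly beforehand is left alone.
    if not callable(getattr(builtins, "_", None)):
        builtins._ = lambda message: message


install_fallback_gettext()

# Any module can now use _() at import time, whatever the import order.
greeting = _("Hello world!")
```

A later call to gettext.install() simply replaces the no-op, since it also binds _ in builtins.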

dazer commented 13 years ago

That clears it up a lot. My understanding was that this thing could work on its own (which it can, but it needs something like our test.py) and that Antiarc's work is a UI on top of it (which it is, but I assumed it would call something like test.py: the bottleneck you were talking about, that is).

I'll commit in my fork a from core import i18n as a quick hack but I still need some more spoonfeeding on the actual i18n issue

Aldriana commented 13 years ago

Well, I view test.py as sample code as much as anything. Antiarc's stuff (or any other UI draped over this) is going to need to define some functions and stuff to make the calls they want in the order they want, translate the information passed in from json (or whatever) to our objects, and so on. The fewer requirements we put on what they have to write, the better. They'll probably write something that looks like test.py... but it won't be test.py per se.

julienp commented 13 years ago

Python locale on Mac OS X seems to be all sorts of stupid. In iTerm.app it fails with

File "test.py", line 92, in <module>
print ' ' * (max_len + 1), total_dps, _("total damage per second.")
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf1' in position 2: ordinal not in range(128)

and in Terminal.app with

ValueError: unknown locale: UTF-8

(Terminal.app stupidly sets LC_CTYPE=UTF-8 as a default setting).

Setting LANG=en_GB.UTF-8 LC_ALL=en_GB.UTF-8 fixes those errors, but results in test.py printing stuff in Spanish: 22728.7370707 daño total por segundo.

Not entirely sure what the correct way to set this up is, but from what I can see there's no way around manually setting LANG and/or LC_ALL.

macbook-j:~ julien$ python
Python 2.6.1 (r261:67515, Jun 24 2010, 21:47:49) 
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.getdefaultlocale()
(None, None)
>>> ^D
macbook-j:~ julien$ export LANG=en_GB.UTF-8
macbook-j:~ julien$ export LC_ALL=en_GB.UTF-8
macbook-j:~ julien$ python
Python 2.6.1 (r261:67515, Jun 24 2010, 21:47:49) 
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.getdefaultlocale()
('en_GB', 'UTF8')
>>> 
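
For reference, a small Python 3 sketch of reading the language from the environment by hand. It checks the variables in the same order the gettext module does (LANGUAGE, LC_ALL, LC_MESSAGES, LANG); the function name languages_from_environment is hypothetical, and stripping the codeset suffix is a simplification.

```python
def languages_from_environment(environ):
    """Return candidate language codes from the first locale variable set."""
    for name in ("LANGUAGE", "LC_ALL", "LC_MESSAGES", "LANG"):
        value = environ.get(name)
        if value:
            # LANGUAGE may hold a colon-separated priority list; drop any
            # codeset suffix such as ".UTF-8" from each entry.
            return [entry.split(".")[0] for entry in value.split(":") if entry]
    # None of the variables is set (LC_CTYPE on its own is not consulted).
    return []
```

Called as languages_from_environment(os.environ), this gives an empty list on a machine where only LC_CTYPE=UTF-8 is set, which matches the behaviour described above.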

Aldriana commented 13 years ago

So, when I just ran it under Mac OS 10.6.4, it ran just fine, though it printed results in Spanish. So I changed the line in test.py to

test_language = 'en-US'

And now it works just fine.

julienp commented 13 years ago

Could you check with printenv if you got any of LC_* or LANG set? I remember I had to export a correct locale in my ~/.profile for something python related on my previous machine, but my current setup doesn't have anything non-default set as far as I can tell.

Setting test_language to something else works for me too, but I assume that's because that essentially bypasses anything in set_language.

My understanding was that setting the language to local should automatically pick up the correct language. Do we need to somehow tell gettext that the default strings in the code match english locales?

Aldriana commented 13 years ago

I have

 LANG=en_US.UTF-8

but no LC_* set.

The defaulting to Spanish is a little odd. I guess I don't quite understand what's being done in i18n.py - why are we appending the locales of the machine to the locales we support? Shouldn't we instead be comparing them and picking one that is on both lists? As it is, running with test_language set to "local" passes ['en-US', 'es_ES', 'es'] in for languages, so I guess I'm not totally surprised that it winds up in Spanish.
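
The "pick one on both lists" idea could look roughly like this in Python 3. It is a sketch, not the contents of i18n.py: pick_language is a hypothetical name, and treating 'en-US' and 'en_US' as the same code is an assumption.

```python
def pick_language(system_locales, supported):
    """Return the first system locale we have a translation for, else None."""
    def normalise(code):
        # Accept both the 'en-US' and the 'en_US' spelling.
        return code.replace("-", "_")

    supported_codes = {normalise(code) for code in supported}
    for code in system_locales:
        code = normalise(code)
        if code in supported_codes:
            return code
        # Fall back from a regional code such as 'es_MX' to plain 'es'.
        base = code.split("_")[0]
        if base in supported_codes:
            return base
    return None
```

With supported set to ['es_ES', 'es'], a machine reporting 'en-US' gets None (so the untranslated strings are used) instead of ending up in Spanish.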

dazer commented 13 years ago

Let me clarify what's going on. Passing 'local', for starters, will never happen: we can't figure out the user's locale because we don't communicate with them. The UI on top will figure out the user's locale by checking their browser settings and pass it to set_language() in the shape of en_GB, fr_FR or whatever.

Then, why is it there? well, the elitistjerks thread about what to do come cata suggested this could end up not only working online: someone may try and make this work under LUA, or just make a new iDPS out of it. If that's ever the case I wanted to provide the tools to do so (in this case catching the potential offline user's locale).

The Spanish default is in place so you guys can notice something working. When we collect some other translations and the program actually catches locale under any OS the line appending supported languages goes away (and a fallback=True should be added there too so the strings come unchanged if no .mo file is found).
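
To show what fallback=True changes, here is a short Python 3 sketch. The domain name "shadowcraft" and the empty temporary directory are assumptions made for the example.

```python
import gettext
import tempfile

# An empty directory stands in for a locale tree with no catalogue for fr_FR.
empty_locale_dir = tempfile.mkdtemp()

# Without fallback, a missing .mo file is an error.
try:
    gettext.translation("shadowcraft", localedir=empty_locale_dir,
                        languages=["fr_FR"])
    strict_lookup_failed = False
except OSError:
    strict_lookup_failed = True

# With fallback=True, gettext hands back a NullTranslations object instead,
# whose gettext() returns every string unchanged.
translation = gettext.translation("shadowcraft", localedir=empty_locale_dir,
                                  languages=["fr_FR"], fallback=True)
message = translation.gettext("Hello world!")
```

So with the flag in place, an unsupported locale degrades to the English source strings and does not raise.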

Which brings me to julienp's problem. See, I think MacOS (SunOS too) comes with LANG=None as default (and win32 comes with two, one global and one for each user, go figure). That's why I'm trying to catch those through

import os
os.environ.get('LANGUAGE', None)

That one doesn't surprise me, since LANGUAGE is a GNU extension (it basically stores various possible locales, but I put it there just in case someone did set it under, say, Unix). So it seems like neither catcher is working for you, but your OS sure is set to en_GB.UTF-8 somewhere, and I'd like to know where so I can catch it without the user having to set LANG. I'll be doing some research about that; could you check for LC_ALL? LC_* would be even better.

Edit: Actually what I'd like to know is what list is passed to gettext.translation().install(). Mine takes ['es_ES', 'es_ES', 'es'] and Aldriana's ['en-US', 'es_ES', 'es'], if you manually set LANG=en_GB you should be passing ['en-GB', 'es_ES', 'es'] and get the es_ES translation (which indeed happens) since there's no 'en' whatsoever; but what does it pass along when no LANG is set?

julienp commented 13 years ago

No additional language is set, the current version that doesn't add the Spanish translations passes an empty list if LANG isn't set.

As discussed in the pull-request, I don't think requiring someone that wants to run this from the command line to export LANG or similar is a problem. Like I said before, on a previous machine I had to do the same for something python related, and AFAIK perl has the same problem on osx. I was curious if there was a way to still get the locale in python, but I can't find one and any solution I've come across boils down to setting LANG.

dazer commented 13 years ago

On packaging and translations, I may not be getting this straight, but here I go: if I understand correctly, the setup file cannot in any way copy .mo files unless we set up an extension. Well, Windows has no homebase directory and every locale file is always inside the application/module (in fact, the whole gettext domain concept makes little sense under Windows, but applications are built like that anyway).

What I think could be done is either delivering an internationalization package (that is, one non python package that finds where ShadowCraft has been installed and copies .mo files there) or defaulting the gettext install in i18n.py to no locale_dir at all and make an extension in setup.py that sends the .mo files to sys.prefix/share/locale (that on my computer would be C:\Python26\share\locale). What's your take on this?
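
One possibility not raised in the thread, offered as an assumption and not a confirmed fix: the data_files argument of setup() takes (target directory, [files]) pairs, and as far as I understand the distutils documentation, relative target directories are resolved against the installation prefix. The Python 3 sketch below only builds such a list; collect_mo_files and the "shadowcraft" domain are hypothetical names.

```python
import os
import tempfile


def collect_mo_files(source_root, install_root="share/locale"):
    """Build (target_dir, [files]) pairs for every .mo file under source_root."""
    entries = []
    for directory, _subdirs, files in os.walk(source_root):
        mo_files = sorted(os.path.join(directory, name)
                          for name in files if name.endswith(".mo"))
        if mo_files:
            # Mirror the <language>/LC_MESSAGES layout below install_root.
            relative = os.path.relpath(directory, source_root)
            target = "/".join([install_root] + relative.split(os.sep))
            entries.append((target, mo_files))
    return sorted(entries)


# Throwaway tree standing in for the project's locale directory.
source_root = tempfile.mkdtemp()
mo_dir = os.path.join(source_root, "es_ES", "LC_MESSAGES")
os.makedirs(mo_dir)
open(os.path.join(mo_dir, "shadowcraft.mo"), "wb").close()

data_files = collect_mo_files(source_root)
```

The resulting list would be passed as setup(..., data_files=data_files); whether that lands in the directory gettext searches by default on Windows would still need checking.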

Aldriana commented 13 years ago

Is i18n in a place that we're happy with right now? Is anyone using it? Does anyone care? Should I close this issue, or is there more work we'd like to do on it at some point in the future?

dazer commented 13 years ago

Current UI does not support the feature but I think it's kind of nice to have it just in case someone builds something else from this. If anyone feels the need for a quick step-by-step guide to create and compile the pot/po/mo files I'll gladly do it; otherwise I'll simply keep updating it just because I kind of like it. So, I'm fairly happy with it as is; feel free to close.