inthedark122 closed this 3 years ago.
TL;DR: Truly supporting i18n is much, much more than renaming `tr` to `translate`. It has a high price tag for developers. What I fear is that somebody would be all hyped-up for translating Orange, we'd spend weeks and weeks preparing the code for it (see below), (s)he would translate some part and then disappear, leaving us with all that rubbish in the code. Translation requires a serious commitment from both sides.
Changing `tr` to `translate` is the smallest of the problems here.

1. There are only a few strings that are marked as `tr`. Calling a different function wouldn't solve anything. (Related to that, @markotoplak said he'd rather have (easily) editable translations, and I, too, have a preference for gettext over Qt's .ts files.) We'd need to go through all the code and mark all strings.
2. The function for translation was traditionally `_`, so as not to hurt code readability (or increase line lengths). Lately, `_` is used for unused arguments or variables, hence we'd need to use something longer. `tr` is OK-ish, while `translate(...)` or `gettext(...)` adds around 10 characters of rubbish in many, many places. This is of course the required price to pay for translatable code, but are we willing to pay it, and keep paying it?
3. Orange uses f-strings, except in very old code. f-strings are expressions and thus untranslatable. To translate them, we'd need to revert to using `format`, which adds another burden to code readability. Moreover, f-strings are elegant and clear, while `format` is clumsy and mistake-prone (in comparison). At least for me, abandoning f-strings is almost beyond the red line.
4. There are hard-coded English-specific strings, like adding 's' for plural. Supporting translation would require those to be rewritten. (This one hurts least.)
5. This one is the worst. It's not something we implement once and are done with; it requires a continuous commitment, since all new strings have to be properly marked for translation etc. Otherwise translators would invest time in one version and find it impossible to translate the next. Furthermore, we occasionally change a string just for the sake of slightly better wording. If we make Orange translatable, we'd have to stop doing this.
6. Orange has many add-ons. Will people translate them, too? Or will users suffer an awful-looking mixture of original and translated text in one widget (because some parts may be inherited)?
These are just the points from the top of my head.
I'd estimate points 2-5 would take a few weeks (so, probably: months) of somebody's time. And that's a one-time effort, not including maintenance.
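A minimal sketch of what points 2-4 above mean in practice, using only the standard `gettext` module. The domain name `"orange"` and the `locale` directory are hypothetical; with `fallback=True` no compiled catalog is needed, and messages simply come back untranslated:

```python
import gettext

# Hypothetical domain and locale dir; fallback=True returns NullTranslations
# (messages pass through unchanged) when no compiled .mo file is found.
translation = gettext.translation("orange", localedir="locale", fallback=True)
_ = translation.gettext
ngettext = translation.ngettext

n = 12

# Point 3: the message must be looked up as a template *before*
# interpolation, so str.format replaces the f-string.
msg = _("{n} instances").format(n=n)

# Point 4: instead of appending "s" by hand...
hardcoded = f"{n} instance{'s' if n != 1 else ''}"
# ...ngettext picks the form; a catalog for a language with more plural
# forms supplies the right one for each n.
plural = ngettext("{n} instance", "{n} instances", n).format(n=n)

# Point 2: using "_" as a throwaway name clobbers the gettext alias.
for _ in range(3):
    pass
gettext_alias_lost = not callable(_)  # True: "_" is now the int 2
```

Every marked string pays the extra characters of `_(...)`, `.format(...)` or `ngettext(...)`, which is exactly the readability cost described above.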
I need to mention that we had a translatable version, with translations to Slovenian and Japanese, 11 years ago. A single released version. It took me almost two months, and it was abandoned soon after.
Our core team is Slovenian. I guess the only way we can make this work is for us to have a strong interest in having a Slovenian translation and investing our time in making and maintaining it.
Closed due to inactivity.
Hi, just came here from Discord to leave a note about my attempt to touch on this topic.
Currently, I'm trying to make a (Chinese) translation on my own branch. I'm planning to merge new changes from upstream at intervals and see whether my solution holds up (then we could discuss making changes in the main project or other moves).
If you're interested, my changes are in these forks: https://github.com/bigeyex/orange3, https://github.com/bigeyex/orange-canvas-core, https://github.com/bigeyex/orange-widget-base
My approach
For questions raised by @janezd :
(Just some thoughts in case it will be helpful)
I believe gettext and "_" (as in the last comment) work fine for 1-3.
For question 4, there is indeed some pain, mostly in cases where strings are used both as text and as identifiers (I had to use some workarounds in my attempt).
For 5, as long as people accept that open-source projects may have partially translated strings, it works fine. Changing the wording may mean certain text needs to be re-translated, but that's common practice in open-source projects. See the Scratch project, which uses Transifex (there is a list of online platforms, many of them free for open-source projects) to coordinate translators around the world (this issue is harder in Scratch since its users are kids, and they indeed have a team for this).
For 6, Scratch uses a framework for add-on translation and leaves it in the add-on developer's hands. This does need some consideration, but maybe it's not the top priority for now?
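For strings that double as identifiers (question 4), one common gettext idiom, not necessarily the workaround used in the fork above, is deferred translation: mark the literal with a no-op function so `xgettext` can still extract it (e.g. with `--keyword=N_`), keep the English text as the stable identifier, and translate only at display time. `N_`, `METHODS` and `combo_items` are hypothetical names, and `NullTranslations` stands in for a real catalog:

```python
import gettext

_ = gettext.NullTranslations().gettext  # stand-in catalog: returns input unchanged


def N_(message):
    """Mark a string for extraction without translating it."""
    return message


# English names stay stable identifiers (e.g. for saved widget settings)...
METHODS = [N_("Mean"), N_("Median"), N_("Mode")]
settings = {"method": "Median"}


# ...and are translated only when shown to the user.
def combo_items():
    return [_(name) for name in METHODS]


selected = METHODS.index(settings["method"])  # lookup uses the untranslated text
```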
I wouldn't like to discourage you: I understand the need and appreciate your enthusiasm. I'm just being cautious.
`_` to code where `_` is already used for gettext.

I was involved in translating Scratch to Slovenian. It's incomparable. The number of messages in Scratch is very small, while in Orange it's huge. Scratch's messages are also not actively changed. Scratch doesn't get new blocks all the time. It doesn't have problems with f-strings and `_`...
Regarding translation of f-strings:
```python
>>> def _(x):
...     if x == 'f"{n} instances"':
...         return 'f"{n} primerov"'
...     return x
...
>>> n = 12
>>> f"{n} instances"
'12 instances'
>>> _(f"{n} instances")
'12 instances'
```
The last line should be `'12 primerov'`, because it's supposed to be translated. But of course it isn't, because `_` receives a string that is already interpolated.
@janezd You're right on 3: I double-checked my code, and gettext does not translate f-strings. There are some tricks (the best I saw is using `{_('static text')}` inside f-strings), but they may need some time to try out, or we could wait for some PEP. I surveyed Mu, another Python project, and unfortunately it doesn't use f-strings...
I understand the need to be cautious about adding new structures to the codebase. That's why experiments and discussions might be helpful. (what I did before is merely a hobby project and I'm not pushing an agenda of making Orange translatable)
For other questions:
f-strings are expressions, so no gettext-style utility will work with them. See what `f'John is {x} years old.'` compiles into:
```python
>>> from dis import dis
>>> dis(compile("f'John is {x} years old.'", "<string>", "eval"))
  1           0 LOAD_CONST               0 ('John is ')
              2 LOAD_NAME                0 (x)
              4 FORMAT_VALUE             0
              6 LOAD_CONST               1 (' years old.')
              8 BUILD_STRING             3
             10 RETURN_VALUE
```
The whole string is never "materialized", so it can't be passed to any function like `_`.
I also can't imagine any PEP that would solve it. F-strings are really fast because there are no dunderscore methods (or functions like `format`) involved. I doubt they'd slow down the interpolation by adding such overhead.
Django, for instance, doesn't use f-strings for this reason.
A possible approach could be introducing something like "ft-string", which allows gettext to extract the text, and allow the template string to be replaced before compiling.
I saw the gettext people trying to allow things like `f"{_('foo bar')}"` to be captured when baking templates. No idea whether such a PEP will be proposed (maybe not many users are asking for it?)
Django indeed doesn't use f-strings in translatable scenarios, though it allows them in other settings.
> "ft-string", which allows gettext to extract the text, and allow the template string to be replaced before compiling.

How would that work? In f-strings there is no string to pass. When you say `f"John is {x} years old."`, a string like `"John is {x} years old."` is never constructed; it never appears in memory. There is nothing to be sent to `gettext`. Carefully read the above disassembly again: there are actually three strings, `"John is "`, `str(x)` and `" years old."`, which Python pastes together. When the entire string is composed (and could be passed to `gettext`), the value of `x` is already inserted. Before inserting `x`, there is no string. No way around this.
This is also explained in the link you sent. Read it through.
> things like f"{_('foo bar')}"

It's even worse: they use nested f-strings. I suppose this was meant as a hack, not to be actually used, because it defeats the purpose of f-strings: this is more complicated than `str.format`.
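To make the comparison concrete, here is a small sketch using a no-op `gettext.NullTranslations` as a stand-in for a real catalog. The nested form splits one sentence into separately translated fragments, while the `str.format` form keeps it as one message:

```python
import gettext

_ = gettext.NullTranslations().gettext  # identity "catalog" for the demo
x = 42

# Nested trick: only the literal fragments go through _(), so translators
# get "John is" and "years old." as two unrelated messages and cannot
# move the number to where their language needs it.
nested = f"{_('John is')} {x} {_('years old.')}"

# str.format: one complete message with a named placeholder that the
# translator may place anywhere in the translated sentence.
formatted = _("John is {age} years old.").format(age=x)
```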
> "ft-string", which allows gettext to extract the text, and allow the template string to be replaced before compiling.

> How would that work? In f-strings there is no string to pass. When you say f"John is {x} years old", a string like "John is {x} years old." is never constructed, it never appears in the memory
I think they were referring to a pre-compilation step, akin to CSS minification. If a solution like that exists, it would apply to our use case, and might also allow for more complex translations in the vein of flipping word order.
I went through this mailing list thread: https://mail.python.org/pipermail/python-ideas/2018-September/053441.html
One interesting thing that got brought up was PEP 501, which was deferred: https://www.python.org/dev/peps/pep-0501/
But they eventually landed on the idea @bigeyex proposed: writing a preprocessor/precompilation step with something like parso https://parso.readthedocs.io/en/latest/index.html.
> How about this: Have a script that runs over your code, looking for "translatable f-strings":
>
>     _(f'Hi {user}')
>
> and replaces them with actually-translatable strings:
>
>     _('Hi %s') % (user,)
>     _('Hi {user}').format(user=user)
https://mail.python.org/pipermail/python-ideas/2018-September/053552.html
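On how such a script could find these strings without regular expressions: the bytecode loses the template, but the abstract syntax tree still has it as a `JoinedStr` node. Below is a sketch of the mailing-list idea using the standard `ast` module, assuming Python 3.9+ (for `ast.unparse`). `TranslatableFStringRewriter` and `rewrite` are hypothetical names. `ast.unparse` discards comments and original formatting, so a real tool would more likely use something like parso:

```python
import ast


class TranslatableFStringRewriter(ast.NodeTransformer):
    """Rewrite _(f"...{name}...") into _("...{name}...").format(name=name).

    Only placeholders that are bare names, with no !r/!s conversion and no
    format spec, are handled; any other call is left untouched.
    """

    def visit_Call(self, node: ast.Call) -> ast.AST:
        self.generic_visit(node)  # rewrite nested calls first
        if not (isinstance(node.func, ast.Name) and node.func.id == "_"
                and len(node.args) == 1 and not node.keywords
                and isinstance(node.args[0], ast.JoinedStr)):
            return node
        template, names = [], []
        for part in node.args[0].values:
            if isinstance(part, ast.Constant):
                # Literal text; braces must be doubled for str.format.
                template.append(part.value.replace("{", "{{").replace("}", "}}"))
            elif (isinstance(part, ast.FormattedValue)
                  and isinstance(part.value, ast.Name)
                  and part.conversion == -1 and part.format_spec is None):
                template.append("{" + part.value.id + "}")
                if part.value.id not in names:
                    names.append(part.value.id)
            else:
                return node  # too complex for this sketch
        lookup = ast.Call(func=ast.Name(id="_", ctx=ast.Load()),
                          args=[ast.Constant(value="".join(template))],
                          keywords=[])
        return ast.Call(
            func=ast.Attribute(value=lookup, attr="format", ctx=ast.Load()),
            args=[],
            keywords=[ast.keyword(arg=name, value=ast.Name(id=name, ctx=ast.Load()))
                      for name in names])


def rewrite(source: str) -> str:
    """Return `source` with translatable f-strings turned into str.format calls."""
    tree = TranslatableFStringRewriter().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))


print(rewrite("print(_(f'Hi {user}'))"))
# print(_('Hi {user}').format(user=user))
```

The `{user}` template that gettext would see is exactly the text an extractor could also write into a `.pot` file.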
Also, to reduce clutter, could we apply translations in the orangewidget.gui modules? Most strings/names go through there, don't they?
I haven't understood @bigeyex this way. And the example from the mailing list is vague about how to find such strings. Regular expressions?!
I was thinking about the last moment in which f-strings are still whole, which led me towards a similar idea for translation, but one that doesn't require any changes in Orange's current code.
`STRING` tokens and saves them into a `.pot` file. Save this file as `create_pot.py`, run it and see `create_pot-trans.py`.
```python
import tokenize

# Just some random stuff with strings, which need to be "translated"
print("This is a string")
x = 42
if x > 1:
    print(f"And there could be {x} more.")
print("And this one is {}.".format("formatted"))

# This part of the script is an equivalent of xgettext.
# Rough demo: every literal is written unescaped, including the
# triple-quoted template below. Also, from Python 3.12 on (PEP 701),
# f-strings are tokenized as FSTRING_START/MIDDLE/END rather than STRING,
# so as written this only picks them up on Python 3.11 and older.
fname = "create_pot.py"
with open("messages.pot", "wt") as pot, open(fname, "rb") as source:
    for token in tokenize.tokenize(source.readline):
        if token.type == tokenize.STRING:
            msg = token.string
            if msg[0] == "f":
                msg = msg[1:]
            pot.write(f"""
#: {fname}:{token.start[0]}
msgid {msg}
msgstr ""
""")

# This part demonstrates translation of the source file
tokens = []
with open(fname, "rb") as source:
    for token in tokenize.tokenize(source.readline):
        if token.type == tokenize.STRING:
            msg = token.string
            fstring = msg[0] == "f"
            msg = msg[fstring:]  # drop the "f" prefix (slicing by a bool: 0 or 1)
            # For a demo, it converts all strings with 15 or more chars into
            # upper case (placeholders like {x} get upper-cased too, so the
            # output is for inspection only).
            # In practice, it would read translations from a .mo or .po file.
            if len(msg) >= 15:
                token = token._replace(string="f" * fstring + msg.upper())
        tokens.append(token)

with open(fname[:-3] + "-trans.py", "wb") as trans:
    trans.write(tokenize.untokenize(tokens))
```
Note that this does not require any `_` or anything. No changes in code.
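The replacement step ("in practice, it would read translations from a .mo or .po file") could look roughly like this. It is a sketch: the `catalog` dict stands in for entries parsed from a `.po` file, keyed by the literal exactly as written in the source (quotes included, like the `msgid` lines above), and `translate_source` is a hypothetical name. Only plain `STRING` tokens are handled, for the Python 3.12 reason noted in the script:

```python
import io
import tokenize


def translate_source(source: str, catalog: dict) -> str:
    """Replace string literals whose source text is a key in `catalog`.

    `catalog` maps a literal as written (e.g. '"This is a string"') to the
    translated literal; it stands in for entries parsed from a .po file.
    """
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.STRING and tok.string in catalog:
            # Keep the original positions; untokenize shifts the rest of the
            # line if the translation has a different length.
            tok = tok._replace(string=catalog[tok.string])
        tokens.append(tok)
    return tokenize.untokenize(tokens)


print(translate_source('print("This is a string")\n',
                       {'"This is a string"': '"To je niz"'}), end="")
```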
However, `gettext` has mechanisms for handling plural forms. I can think of a solution here, too, but it's ... somewhat ugly.
> Also, to reduce clutter, could we apply translations in the orangewidget.gui modules? Most strings/names go through there, don't they?

Some, but very far from all. Aleš actually avoids `gui`, and has his very sensible reasons to.
Great stuff @janezd. We've had numerous people reach out to us about translations, if we can get something like this going I think this is something the extended community could really contribute to.
Two thoughts:
1. I've no experience with `gettext`, but I looked up its plural handling mechanisms; they look cool and intricate. Could this type of solution be connected with the internals of `gettext` to use its translation/plural handling system? Or could this be written as an extension of `gettext`?
2. Could we write this as an import hook, making it a completely on-the-fly thing? If so, switching Orange to a different language would definitely require a restart, which I think is perfectly fine from a UX standpoint.
This changes sources and can only help someone prepare an installation in another language. Changing sources in place and forcing Python to recompile them would be a very bad idea.
This solution has nothing to do with gettext, except for using its file format for storing the messages.
Think again about what this solution does: it changes hard-coded messages, while gettext translates them on the fly. Gettext can adapt to plural forms (the mechanism is not very intricate, it's quite trivial), while this solution obviously can't without adding some ifs (where I would hesitate to go).
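To show how simple the mechanism is: a `.po` file declares a `Plural-Forms` expression. For Slovenian, the GNU gettext manual gives `nplurals=4; plural=n%100==1 ? 0 : n%100==2 ? 1 : n%100==3 || n%100==4 ? 2 : 3;`. `ngettext` evaluates it to choose one of the `msgstr[0]`..`msgstr[3]` entries. Below is a hand transcription of that expression into Python, not gettext's own code. The Slovenian forms of "instances" are assumed for illustration:

```python
def slovenian_plural(n: int) -> int:
    """Python transcription of the Slovenian Plural-Forms expression:
    n%100==1 ? 0 : n%100==2 ? 1 : n%100==3 || n%100==4 ? 2 : 3
    """
    n100 = n % 100
    if n100 == 1:
        return 0
    if n100 == 2:
        return 1
    if n100 in (3, 4):
        return 2
    return 3


# Assumed msgstr[0]..msgstr[3] for msgid "{n} instance" / msgid_plural "{n} instances"
INSTANCE_FORMS = ["{n} primer", "{n} primera", "{n} primeri", "{n} primerov"]


def instances(n: int) -> str:
    return INSTANCE_FORMS[slovenian_plural(n)].format(n=n)


print(instances(12))  # 12 primerov
```

A hard-coded replacement of the source text, as in the tokenize script, has no `n` at hand when the message is chosen, which is why it would need extra ifs in the code.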
I posted this as an example of how somebody could translate Orange and relatively easily maintain the translation without core developers being concerned or involved. No import hooks or similar tricks that are bound to cause us headaches.
I investigated the code base of orange3 and PyQt5 and found that it is not possible to make custom localization files. We can use pylupdate5 to generate `.ts` files for making localization files. I made them, but PyQt5 doesn't support those files with the `.tr` function. This is a limitation of the scope of classes. You can read more in the documentation: https://www.riverbankcomputing.com/static/Docs/PyQt5/i18n.html

To add custom translation files we can:

1. Generate a `.ts` file by running the command: `pylupdate5 /path/to//lib/python3.9/site-packages/orangecanvas/application/canvasmain.py -ts orange.<lang>.ts`
2. Translate `orange.<lang>.ts` in the GUI application. You can run the GUI application with the command `qt5-tools linguist`.
What's your use case?

What's your proposed solution?

Rename `.tr` methods to `.translate`.