Translation loading API improved

Beep6581 commented 9 years ago

Originally reported on Google Code with ID 217

I have modified the translation framework to use a hierarchy of translation files, in which
more specific files override strings from more general ones.  Now, when you add new
translation terms, you should only add them (in English) to 'rtdata/languages/default'.
Translators can then add them (in their translated form) to their own language / locale
files.  A language file is named after the language alone, e.g. English, French, etc.
A locale file is named after the language plus the locale in brackets, e.g. English (US),
English (CA), French (FR), etc.  In the code, I assume that there is a single space
separating the two parts.

This is probably easiest to explain through an example.  Assume I have just added key
'TEST_KEY;This is a test' to the default translation.  I then add translated versions
'TEST_KEY;This WAS a test' to 'English', and 'TEST_KEY;This WAS a test, eh!' to
'English (CA)'.  No other translations are updated.

If I am running RT in English, I will see the string "This WAS a test".  

If I am running RT in English (CA), I will see the string "This WAS a test, eh!".

If I am running RT in English (UK), I will see the string "This WAS a test" (since
this string is not defined in the UK locale, it defaults to the next highest, which
happens to be English).

If I am running RT in French (FR), I will see the string "This is a test".  (Since
this string is not defined in French (FR), it defaults to the next highest, French;
it is not defined there either, so it defaults to the next highest, 'default', where
it *is* defined.)

Finally, I have changed the behaviour when a string is not found in the lookup table;
instead of returning "" it returns the key name.  This will help developers to see
which strings are not translated.
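
For anyone who prefers code to prose, here is a minimal Python sketch of the lookup described above. It is not the actual RT code; the function names and the dict-of-dicts layout are made up for illustration. The only rules taken from this ticket are the fallback order (locale, then language, then 'default'), the single space before the bracket, and returning the key when nothing matches.

```python
def fallback_chain(name):
    """Return the files to consult, most specific first.

    'English (CA)' -> ['English (CA)', 'English', 'default']
    'French'       -> ['French', 'default']
    """
    chain = [name]
    # A locale file is "<language> (<locale>)", with one space between the parts.
    if name.endswith(")") and " (" in name:
        chain.append(name.rsplit(" (", 1)[0])
    chain.append("default")
    return chain


def translate(key, name, files):
    """Look up key for a language/locale name.

    files maps a file name to a dict of key -> translated string
    (a hypothetical in-memory stand-in for the files on disk).
    """
    for file_name in fallback_chain(name):
        table = files.get(file_name, {})
        if key in table:
            return table[key]
    # Not defined at any level: return the key so the gap is visible.
    return key


files = {
    "default": {"TEST_KEY": "This is a test"},
    "English": {"TEST_KEY": "This WAS a test"},
    "English (CA)": {"TEST_KEY": "This WAS a test, eh!"},
}

for name in ("English", "English (CA)", "English (UK)", "French (FR)"):
    print(name, "->", translate("TEST_KEY", name, files))
print(translate("MISSING_KEY", "French (FR)", files))
```

Running it with the four names from the example should reproduce the four results listed above, and the last line shows the key being returned for an undefined string.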

I have removed the untranslated items in existing languages; going forward, we should
only add properly translated items to a given language file.

I am going to leave this ticket open for a little bit to allow for comments; if nobody
objects in a while, though, I will close it.

Cheers

Reported by wyatt.olson on 2010-09-12 02:59:16

Beep6581 commented 9 years ago

BTW, for those familiar with Java, this is similar to how resource bundles work.

Reported by wyatt.olson on 2010-09-12 03:02:09

Beep6581 commented 9 years ago

There is a comment on issue 49 that rawtherapee fails to install. Could you have a look
at that please?

Reported by rinni@gmx.net on 2010-09-12 10:52:00

Beep6581 commented 9 years ago

Give this another shot.  Although it was not strictly needed, I re-added an empty
English (US) file for backwards compatibility with people's existing options files.

Also please note that you will need to delete your CMakeCache.txt and re-run CMake,
as there is another language file which needs to be added to the installation (default).

Cheers

Reported by wyatt.olson on 2010-09-12 13:30:15

Beep6581 commented 9 years ago

Yes! After rerunning cmake it works!

Reported by iliaworld@yandex.ru on 2010-09-13 18:33:27

Beep6581 commented 9 years ago

One question: why does English (UK) contain strings, since its content is almost the same
as 'default'?  When I update the French catalog, I must compare it to 'default', right?

Btw, 'default' contains MAIN_BUTTON_FULLSCREEN and MAIN_BUTTON_UNFULLSCREEN twice.

One last thing: for maintainability, it would be nice to keep the strings in a fixed
order.  Currently, developers put their new strings at the end of the file, and these get
moved to a more logical place in the list sooner or later, so file comparison when
updating becomes a bit more complicated.  Developers should be asked to put them directly
in the right place in the list (i.e. alphabetical order?).

Reported by natureh.510 on 2010-09-14 20:05:01

Beep6581 commented 9 years ago

English (UK) *will* eventually have only the differences from default.  I have not had
time to go through it yet and sort out what is really changed.  Hopefully I will have
time today or tomorrow.

As for default, I just renamed English (US) to default.  I guess I will run it through
a sort / uniq to get rid of duplicates.

And I agree that we should get the translations to be in alphabetical order.  In fact,
given the architecture at this time, I think that I can probably just go ahead and
sort all the translations; that will make things ready for the compare editors for
translators.  If nobody has any objections, I will do that today.

Cheers

Reported by wyatt.olson on 2010-09-14 20:14:48

Beep6581 commented 9 years ago

I imagine there could be a little script that enforces a fixed ordering principle (alphabetical
sounds good to me).  The locale translator would simply need to run it before committing
their translation.

Reported by ejm.60657 on 2010-09-14 20:24:32

Beep6581 commented 9 years ago

Well for those on *nix, "sort -u <file> > foo && mv foo <file>" works.  Don't
know about Windows users; depending on how many translations are coming in from Windows
users, I could probably just run everything through sort on a semi-regular basis. 
I suppose another option is to just re-implement sort (and possibly uniq) in .bat commands,
but that would be pretty ugly.
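
A portable alternative to re-implementing sort and uniq in .bat would be a few lines of Python, assuming translators on Windows are willing to install an interpreter. The sketch below is only an illustration: the function names are hypothetical, UTF-8 encoding is an assumption, and unlike the shell version it also drops empty lines. Python compares strings by code point, which matches sort under LC_ALL=C but not necessarily under other locales.

```python
def sort_unique(lines):
    """Return the distinct non-empty lines in sorted (code point) order."""
    stripped = (line.rstrip("\r\n") for line in lines)
    return sorted({line for line in stripped if line})


def rewrite_sorted(path):
    """Sort and de-duplicate one language file in place."""
    # Read everything first, so the file is closed before it is rewritten.
    with open(path, encoding="utf-8") as handle:
        lines = sort_unique(handle)
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        handle.writelines(line + "\n" for line in lines)
```

Calling rewrite_sorted on each file in the languages directory would have the same effect as the shell one-liner, on any platform.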

Reported by wyatt.olson on 2010-09-14 20:30:39

Beep6581 commented 9 years ago

Hello, perhaps I am missing something.  I did a 'rm CMakeCache.txt', a 'make clean' and a
new cmake run, but my Nederlands (Dutch) language file is exactly the same as the last one
I committed a couple of months ago.  I don't see the new entries that appear in the default
file.  So at this point do I need to check all those ~700 lines for unique new entries,
or what?

As I said, I guess I am missing something...

Reported by paul.matthijsse4 on 2010-09-14 21:32:00

Beep6581 commented 9 years ago

OK I have pushed a change (which includes Hombre's updated French translation) which
does the following:

1) Sorted and uniq'd all lines on all files to avoid duplicates
2) Removed unused translation lines in English (UK) (now there are only lines with
'colour' vs 'color').
3) Added a LICENSE and README file so that we can avoid having comments in each file
(which can mess up automated sorting).
4) Added Serbian translations from a third party (issue #214).

I have tried this out myself and it looks good to me.  Please let me know if there
are any more translation-related bugs.

Please note that you will need to delete your cache and re-run the cmake command, as
there are changes to the included files.

Cheers

Reported by wyatt.olson on 2010-09-14 21:33:11

Beep6581 commented 9 years ago

Paul,

Up until a couple minutes ago there were no changes to the translation files themselves,
only to the loading API.

As of my last push, I have changed all the files (sorted, removed non-unique terms).
In theory everything should be good to go now, and ready for ongoing maintenance.
The only lines which should be deleted from existing translations now are non-translated
English terms (I think I got all of them, but may have missed a few if they were not
at the end).

The newly-created README file in the languages directory should give some information
on how this is implemented.  If you (or anyone) finds that confusing please let me
know; the concept is quite simple, but I think I may be making it more difficult than
it really is :-)

Let me know of any questions.

Cheers

Reported by wyatt.olson on 2010-09-14 21:52:03

Beep6581 commented 9 years ago

Hello,
Today I had time to look at this. Two remarks.

One. You (Wyatt) deleted the strings in the language files that contain the info/names
of the GUI translators of previous and current versions of RT. I think it is better
to put those lines/names in a separate file called Credits or something. Credits should go
to those who deserve them, shouldn't they? Even in OS projects :-)

Two. Okay, so I compared the new default file (formerly called English (US)) with Nederlands
(dutch). The new strings in default do not show up in Nederlands. This implies I need
to compare line-by-line the two files, which is not only a lot of work but also rather
error prone. All translators of all languages need to do this. Not a good idea.

So I first looked at diff, uniq, awk and some more tools to solve this (I'm not a specialist
here). But in the end I used a modified version of Hombre's bash script that he posted
in issue 49 here on Google Code ("hard coded language strings", comment 74) and that
did the first part of the trick: it showed me which strings were present in default
and not in Nederlands. 

What we need is to modify this script, so that it adds those new lines in default to
all the language files under a section ### NEW at the end. This way translators know
exactly what to do, without having to compare the two files line by line. 
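
To make the comparison step concrete, here is a rough Python sketch of "which strings are in default but not in Nederlands". It is not Hombre's script (that one is bash and I am not reproducing it here); the function names are hypothetical, and the only format assumed is the 'KEY;text' line layout used in this ticket.

```python
def parse_keys(lines):
    """Map KEY -> text for every 'KEY;text' line, skipping comments."""
    entries = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#") or ";" not in line:
            continue
        key, text = line.split(";", 1)
        entries[key] = text
    return entries


def missing_entries(default_lines, translation_lines):
    """Return 'KEY;English text' lines that are in default but not translated."""
    default = parse_keys(default_lines)
    translated = parse_keys(translation_lines)
    return [
        f"{key};{default[key]}"
        for key in sorted(default)
        if key not in translated
    ]
```

The returned lines are exactly what would go under a "new strings" section at the end of a translation file.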

Perhaps Hombre can do this without too much hassle, as his original script is already
doing 3/4++ of the work? I hope so!

Regards, Paul.

Reported by paul.matthijsse4 on 2010-09-15 21:32:43

Beep6581 commented 9 years ago

1) Hmm, you are right about the info of previous translators.  When converting, I stripped
comments as the sorting procedure made for all sorts of problems (the comments were
sorted as well, which for multiline comments obviously mangled the order).  I suppose
we can have a convention of numbering comment lines:

#001 This is a comment
#002 Another comment
#003 Comes after...
...
etc

If you feel this is an acceptable solution, I can replace the deleted entries from
Mercurial in this format.

2) As for the translation option, my first inclination was to write a script for this;
it should not be too difficult (I have not looked at Hombre's script, but as you say
it is probably almost there already).  The potential issue with a bash script, though,
is that it is *nix-specific.  I suggested Python or Java to do this, but both require
interpreters / VMs which are not common on Windows.  I suppose I could just write a
bash script, and let Windows users find a way which works for them, but that seems
a bit rude :-)

Reported by wyatt.olson on 2010-09-15 21:44:06

Beep6581 commented 9 years ago

Hello, yes the #00x etc. option would be fine. Just don't throw the names of all translators
thus far in the trash. Or put them in a Credits file, that will do as well.

2. No, users/translators must not have to do anything here - by definition! The language
files should be offered with a "### New" section at the end so that translators see
instantly the new strings to be translated. This means Hombre's script, or rather a slightly
modified version of it, should be run on the server side every time a new build is
available (or just once every two weeks to keep it simple). As far as I can see,
this is really very easy and avoids lots of work for translators.

Regards, Paul.

Reported by paul.matthijsse4 on 2010-09-15 22:22:25

Beep6581 commented 9 years ago

For #2, what would you say about having new options put in comments, i.e.:

#WAVELET_EQUALIZER;
#WAVELET_EQ_COARSE;
ALREADY_TRANSLATED_STRING1;Foo
ALREADY_TRANSLATED_STRING2;Bar
ALREADY_TRANSLATED_STRING3;Baz

Should be easy enough to see, and then sort would put it at the top.
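
A quick way to check the "sort would put it at the top" claim is to sort the sample lines by code point, which is what Python's sorted() does and what sort does under LC_ALL=C. This is only a sanity check, not part of any RT script, and I believe a locale-aware sort may treat the leading '#' differently, so the claim is worth verifying on the machine that runs the real script.

```python
def sort_language_lines(lines):
    """Sort lines by code point; '#' (0x23) sorts before 'A' (0x41)."""
    return sorted(lines)


sample = [
    "ALREADY_TRANSLATED_STRING2;Bar",
    "#WAVELET_EQ_COARSE;",
    "ALREADY_TRANSLATED_STRING1;Foo",
    "#WAVELET_EQUALIZER;",
    "ALREADY_TRANSLATED_STRING3;Baz",
]

for line in sort_language_lines(sample):
    print(line)
```

With code point ordering the two commented keys come out first, followed by the translated strings in key order.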

I would make a bash script which can easily be run by anyone on Unix to refresh the
languages.  Developers who are on Unix could run that themselves when they add their
own stuff to 'default', or if they are on Windows they could request that someone else
do it for them (it would be a 30 second operation).

I'll give this a shot tonight if I have time, should be easy to do and would help translators
a lot.  By doing it this way, I think that putting a separate CREDITS file in the directory
to track names would be better.

Reported by wyatt.olson on 2010-09-15 22:28:15

Beep6581 commented 9 years ago

Hello, you're way better informed on this multi-platform stuff than I am, to start with.
Developers can add a .diff file with their work to include their new strings in the
default language file, can't they? But this doesn't work on Windows?? If so, I would
be surprised...

#WAVELET_EQUALIZER; - yep, good idea to do it this way, preferably at the end of the
file (in section ### NEW ?).

Reported by paul.matthijsse4 on 2010-09-15 22:52:40

Beep6581 commented 9 years ago

Any ideas how to exclude a file from a glob in CMake?  I don't want to include the generateDiff.sh
script in the build, but currently it will be.

CMake code currently is:

file (GLOB LANGUAGEFILES "languages/*")

I imagine I just need to remove strings which I don't want to include, but I have no
idea how to do this, and Google doesn't seem to help (perhaps I am using the wrong
keywords...)

Reported by wyatt.olson on 2010-09-16 00:05:00

Beep6581 commented 9 years ago

Pushed changes; let me know if you think this looks good to you.

The generateDiffs.sh file is quite inefficient (O(n^2) if my algorithm theory holds
true), and takes a few minutes to run, but I think it is usable enough... if someone
wants to improve it, please feel free!

Reported by wyatt.olson on 2010-09-16 00:30:10

Beep6581 commented 9 years ago

Wyatt: Just put it in 'tools' and adjust the path. I guess that's the easiest way.

Reported by rinni@gmx.net on 2010-09-16 11:02:07

Beep6581 commented 9 years ago

Good call, I'll do that.

Reported by wyatt.olson on 2010-09-16 13:39:25

Beep6581 commented 9 years ago

OK, everything is cleaned up, and I verified the script is still working.  Translators:
please take a look at the format in the new files, and see if this looks like it will
work for your needs.

Reported by wyatt.olson on 2010-09-16 14:12:19

Beep6581 commented 9 years ago

Hello, seems to work very well! Am I correct that the language files come already with
the New section at the end, or was this done during compiling? 

Thanks for this, translating is way easier this way.

Reported by paul.matthijsse4 on 2010-09-16 15:43:14

Beep6581 commented 9 years ago

The New section at the end is generated by the ./tools/generateTranslationDiffs.sh
script, and the resulting files are checked into Mercurial.  It does not happen during the
cmake / make compilation.  The generate diffs script can be run at any time; it will sort existing translated
keys, remove the ! comments, and re-add the new section based on keys in the current
default file.  It takes a while to run (I basically do brute-force when generating;
for each translation file, I loop through each line in default, and see if the key
exists in the given translation file; if not, I add it to the end).  Algorithm speed
notwithstanding, though, it seems to work very well.
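
If anyone wants to improve the speed, the same three steps (sort the translated keys, drop the old ! lines, re-add the untranslated keys from default) can be done in one pass by loading the keys into a dictionary instead of scanning the translation file once per default line. The Python sketch below is an illustration only, not a port of the real script: the function name is hypothetical, and the '!KEY;English text' layout for untranslated entries is my assumption about the format, since this ticket only says that they are ! comments.

```python
def regenerate(default_lines, translation_lines):
    """Rebuild a translation file: sorted translations, then untranslated keys."""

    def entries(lines):
        result = {}
        for line in lines:
            line = line.rstrip("\r\n")
            # Skip blanks and the generated '!' lines from a previous run.
            if not line or line.startswith("!") or ";" not in line:
                continue
            key, text = line.split(";", 1)
            result[key] = text
        return result

    default = entries(default_lines)
    translated = entries(translation_lines)

    # Existing translations, sorted by key.
    output = [f"{key};{translated[key]}" for key in sorted(translated)]
    # Dictionary membership tests are O(1) on average, so this pass is
    # linear in the number of default keys (plus the cost of sorting).
    output += [
        f"!{key};{default[key]}"
        for key in sorted(default)
        if key not in translated
    ]
    return output
```

Because the ! lines are discarded on input and rebuilt from default, running it twice on the same files gives the same result, which matches the "can be run at any time" behaviour described above.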

If you are running on *nix, feel free to mess with the translation files, and re-run
the generate diffs script to see exactly what it does.

Cheers

Reported by wyatt.olson on 2010-09-16 16:15:04

Beep6581 commented 9 years ago

Given that a) this has been in the code for over a month, b) there have been no problems
reported (to my knowledge), and c) there have been multiple translations updated by
community members (so it appears to be easy enough to do), I think that this issue
is resolved, and can be closed.  Feel free to comment if anyone objects.

Cheers

Reported by wyatt.olson on 2010-10-20 00:59:23

Status changed: Fixed

Beep6581 / RawTherapee

Translation loading API improved #208