computerline1z / okapi

Automatically exported from code.google.com/p/okapi

Ratel/Rainbow: Language maps not taken into account #431

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Create a new project in Rainbow with, e.g., source language de-de (or DE-DE) 
and target language EN-GB (or en-gb)
2. Go to "translation kit creation" and select the segmentation step
3. Define a language map in Ratel using the regexp "DE.*"

What is the expected output? What do you see instead?
I would expect Ratel to segment my source file, but it doesn't. However, it 
works when I use the regexp "de.*" (lowercase). Whether I select "DE-DE" or 
"de-de" in Rainbow makes no difference. It seems to be a case-sensitivity 
problem.

What version of the product are you using? On what operating system?
Ratel: 0.23
Rainbow: 0.23
Windows 7

Please provide any additional information below.

Original issue reported on code.google.com by m...@sebastianebert.com on 13 Jan 2015 at 7:53

GoogleCodeExporter commented 9 years ago
The language code you specify in Rainbow can be entered in uppercase or 
lowercase in Rainbow's UI, but it is normalized to lowercase when used. So when 
you specify "DE-DE", the process actually uses "de-de".
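A minimal Python sketch of the behavior described above. It assumes that the language map's pattern must match the whole normalized code (Python's `re.fullmatch`, analogous to Java's `String.matches`); `normalize_language_code` and `pattern_matches` are hypothetical helpers for illustration, not Okapi's API.

```python
import re

def normalize_language_code(code: str) -> str:
    # Hypothetical stand-in for Rainbow's normalization: whatever case the
    # user types in the UI, the code is lowercased before it is used.
    return code.strip().lower()

def pattern_matches(pattern: str, code: str) -> bool:
    # Assumes the whole language code must match the map's regular expression.
    return re.fullmatch(pattern, code) is not None

for typed in ("DE-DE", "de-de"):
    code = normalize_language_code(typed)
    print(typed, "->", code,
          "| 'DE.*':", pattern_matches("DE.*", code),
          "| 'de.*':", pattern_matches("de.*", code))
```

Both inputs end up as "de-de", so under these assumptions "DE.*" never matches and "de.*" always does, which is consistent with the original report.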

We need to update the documentation for this.

Original comment by yves.sav...@gmail.com on 13 Jan 2015 at 11:53

GoogleCodeExporter commented 9 years ago
I had a similar issue with "UTF-8" vs. "utf-8". Could you, e.g., make the two 
input fields convert everything to lowercase so that users can see this 
behavior in the front end? If it's only mentioned in the documentation, it 
might be overlooked.

Original comment by m...@sebastianebert.com on 13 Jan 2015 at 12:57

GoogleCodeExporter commented 9 years ago
Documentation has been updated.

Original comment by yves.sav...@gmail.com on 20 Jan 2015 at 12:07

GoogleCodeExporter commented 9 years ago
OK, does this mean that one can enter either uppercase or lowercase letters in 
Rainbow in the future to avoid this problem?

Original comment by m...@sebastianebert.com on 21 Jan 2015 at 7:38

GoogleCodeExporter commented 9 years ago
No, it means the documentation now includes a note pointing out the potential 
issue.

Ratel is used to edit SRX rules that are not only used with Okapi tools. Other 
tools may or may not normalize their language codes when using SRX, so we 
cannot assume one case or the other.

The solution is to write the regular expression so that it is not 
case-sensitive; that way it always works. For example, use '[Ee][Nn].*' instead 
of 'en.*' or 'EN.*'.
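A small Python sketch comparing the three patterns above against codes in different cases. It assumes full-match semantics (`re.fullmatch`); the sample codes and the `matching_codes` helper are hypothetical. An inline flag such as `(?i)en.*` would also work in Java and Python regex, but there is no guarantee that every SRX engine supports it, so the bracketed form is presumably the more portable choice.

```python
import re

CODES = ["en", "en-gb", "EN-GB", "En-Us", "de-de"]

# Patterns from the comment above: the first two depend on the case of the
# incoming code; the bracketed one accepts either case for each letter.
PATTERNS = ["en.*", "EN.*", "[Ee][Nn].*"]

def matching_codes(pattern: str, codes: list[str]) -> list[str]:
    # Keep only the codes that the pattern matches in full.
    return [c for c in codes if re.fullmatch(pattern, c)]

for p in PATTERNS:
    print(f"{p!r:14} matches {matching_codes(p, CODES)}")
```

Only '[Ee][Nn].*' accepts all four English codes regardless of how the calling tool normalizes them.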

See, for example, the map at the end of the sample SRX document that comes with 
the specification: 
http://www.gala-global.org/oscarStandards/srx/srx20.html#AppSample
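A sketch of how an SRX 2.0 `<maprules>` section with case-insensitive patterns resolves a language code to a rule name. The XML below is illustrative, not copied from the specification, and its rule names are hypothetical. The code assumes the header's `cascade` setting is "no" (only the first matching `<languagemap>` applies) and that patterns must match the whole code.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

SRX_NS = {"srx": "http://www.lisa.org/srx20"}

# Illustrative <maprules> excerpt; every pattern is written so that it
# matches the language code in either case.
MAPRULES = """<maprules xmlns="http://www.lisa.org/srx20">
  <languagemap languagepattern="[Dd][Ee].*" languagerulename="German"/>
  <languagemap languagepattern="[Ee][Nn].*" languagerulename="English"/>
  <languagemap languagepattern=".*" languagerulename="Default"/>
</maprules>"""

def first_matching_rule(code: str, maprules_xml: str = MAPRULES) -> Optional[str]:
    # Walk the language maps in document order and return the rule name of
    # the first pattern that fully matches the code (non-cascading case).
    root = ET.fromstring(maprules_xml)
    for lang_map in root.findall("srx:languagemap", SRX_NS):
        if re.fullmatch(lang_map.get("languagepattern"), code):
            return lang_map.get("languagerulename")
    return None

for code in ("de-de", "DE-DE", "en-gb", "fr-fr"):
    print(code, "->", first_matching_rule(code))
```

Because the patterns are case-insensitive, "de-de" and "DE-DE" resolve to the same rule whether or not the calling tool lowercases its codes. The final ".*" map acts as a catch-all for any other language.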

Original comment by yves.sav...@gmail.com on 21 Jan 2015 at 12:09