computerline1z / okapi

Automatically exported from code.google.com/p/okapi

XLIFF: Set xml:space="preserve" for entries created using ITS rule with itsx:whiteSpaces="preserve" #311

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
When a custom ITS configuration includes rules with itsx:whiteSpaces="preserve", the 
extracted entries in the XLIFF keep their white space, but xml:space="preserve" 
is not set on them. Tools that do not treat XLIFF as xml:space="preserve" by 
default (e.g. OmegaT) then produce wrong translations. 
An entry created by an ITS rule with itsx:whiteSpaces="preserve" should have 
xml:space="preserve" set.

Original issue reported on code.google.com by khagar...@gmail.com on 3 Feb 2013 at 8:42

GoogleCodeExporter commented 9 years ago
It seems to be working for me.

For example, if I process:

<?xml version="1.0" ?>
<doc xmlns:its="http://www.w3.org/2005/11/its" its:version="2.0">
 <prolog>
  <date>2013-02-13</date>
  <its:rules version="2.0" xmlns:itsx="http://www.w3.org/2008/12/its-extensions">
   <its:translateRule selector="/doc/prolog" translate="no"/>
   <its:idValueRule selector="//para" idValue="@id"/>
   <its:withinTextRule selector="//b" withinText="yes"/>
   <its:translateRule selector="//literal" translate="yes" itsx:whiteSpaces="preserve"/>
  </its:rules>
 </prolog>
 <body>
  <para id="p1">Rome is the capital city of Italy.</para>
  <para id="p2">It is also the country's largest and most populated comune and fourth-most populous city in the European Union by population within city limits.</para>
  <literal xml:space='preserve'>Country:    Italy
Population: 2,777,979 (2011)
Time zone:  CET</literal>
 </body>
</doc>

I get:

<?xml version="1.0" encoding="UTF-8"?>
<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2" 
xmlns:okp="okapi-framework:xliff-extensions" 
xmlns:its="http://www.w3.org/2005/11/its">
<file original="/Example_XML.xml" source-language="en-us" 
target-language="fr-fr" datatype="xml">
<body>
<trans-unit id="1" resname="p1">
<source xml:lang="en-us">Rome is the capital city of <g 
id="1">Italy</g>.</source>
</trans-unit>
<trans-unit id="2" resname="p2">
<source xml:lang="en-us">It is also the country's largest and most populated 
comune and fourth-most populous city in the European Union by population within 
city limits.</source>
</trans-unit>
<trans-unit id="3" xml:space="preserve">
<source xml:lang="en-us">Country:    Italy
Population: 2,777,979 (2011)
Time zone:  CET</source>
</trans-unit>
</body>
</file>
</xliff>

As you can see, xml:space is set on the trans-unit. The xml:space value is inherited 
by all child elements (http://www.w3.org/TR/xml/#sec-white-space).
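
To illustrate the inheritance rule, here is a minimal sketch. The element names block, literal and para are made up for this example and do not come from the files discussed here. The value declared on a parent applies to everything inside it, unless a descendant declares its own xml:space:

```xml
<?xml version="1.0"?>
<doc>
 <!-- xml:space="preserve" on the parent applies to its whole content -->
 <block xml:space="preserve">
  <!-- no xml:space here: "preserve" is inherited from <block> -->
  <literal>Country:    Italy</literal>
  <!-- an explicit xml:space on a descendant overrides the inherited value -->
  <para xml:space="default">Ordinary   text.</para>
 </block>
</doc>
```

What an application actually does with the "default" value is up to that application. The attribute only signals intent.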

Do you have an example where it's not working?
Thanks.
-yves

Original comment by yves.sav...@gmail.com on 3 Feb 2013 at 12:07

GoogleCodeExporter commented 9 years ago
Try source XML without xml:space="preserve".

Original comment by khagar...@gmail.com on 3 Feb 2013 at 3:02

GoogleCodeExporter commented 9 years ago
Sorry: I've tried several cases and mis-copied the example in my previous post.

I did try without xml:space='preserve' in <literal>:

<?xml version="1.0" ?>
<doc xmlns:its="http://www.w3.org/2005/11/its" its:version="2.0">
 <prolog>
  <date>2013-02-13</date>
  <its:rules version="2.0" xmlns:itsx="http://www.w3.org/2008/12/its-extensions">
   <its:translateRule selector="/doc/prolog" translate="no"/>
   <its:idValueRule selector="//para" idValue="@id"/>
   <its:withinTextRule selector="//b" withinText="yes"/>
   <its:translateRule selector="//literal" translate="yes" itsx:whiteSpaces="preserve"/>
  </its:rules>
 </prolog>
 <body>
  <para id="p1">Rome is the capital city of Italy.</para>
  <para id="p2">It is also the country's largest and most populated comune and fourth-most populous city in the European Union by population within city limits.</para>
  <literal>Country:    Italy
Population: 2,777,979 (2011)
Time zone:  CET</literal>
 </body>
</doc>

and got the exact same result:
<trans-unit id="3" xml:space="preserve">

Original comment by yves.sav...@gmail.com on 3 Feb 2013 at 3:12

GoogleCodeExporter commented 9 years ago
Then I wonder why it doesn't work for me. Perhaps it's because the files I'm 
translating store the translatable text in attributes.
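
For illustration only, a file of that kind might look like the sketch below. The entry element, its text attribute, and the selector are hypothetical and are not taken from the actual files. The only difference from the earlier example is that the rule selects an attribute rather than an element:

```xml
<?xml version="1.0" ?>
<doc xmlns:its="http://www.w3.org/2005/11/its" its:version="2.0">
 <prolog>
  <its:rules version="2.0" xmlns:itsx="http://www.w3.org/2008/12/its-extensions">
   <!-- hypothetical rule: the translatable text is the value of @text -->
   <its:translateRule selector="//entry/@text" translate="yes" itsx:whiteSpaces="preserve"/>
  </its:rules>
 </prolog>
 <body>
  <!-- the run of spaces inside the attribute value should survive extraction -->
  <entry text="Country:    Italy"/>
 </body>
</doc>
```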

Original comment by khagar...@gmail.com on 3 Feb 2013 at 4:39

GoogleCodeExporter commented 9 years ago
The property should affect the translated attributes as well.
If you send me an example that reproduces the problem for you, I can try to debug 
it and fix it.
-ys

Original comment by yves.sav...@gmail.com on 3 Feb 2013 at 5:38

GoogleCodeExporter commented 9 years ago
Here is an example file and the ITS rule. I'm creating an OmegaT project using 
the translation kit creation.

Original comment by khagar...@gmail.com on 3 Feb 2013 at 10:45

GoogleCodeExporter commented 9 years ago
Thanks, I can reproduce the issue.
As you suggested, it looks like a difference in the way we process the 
extracted text when it comes from an attribute.
I'll work on it.

Original comment by yves.sav...@gmail.com on 4 Feb 2013 at 12:52

GoogleCodeExporter commented 9 years ago
The issue should be fixed now.
The fix is available in the latest manual snapshots
(http://okapi.opentag.com/snapshots/)

Thanks for pointing out the problem.
-yves

Original comment by yves.sav...@gmail.com on 4 Feb 2013 at 4:03

GoogleCodeExporter commented 9 years ago

Original comment by yves.sav...@gmail.com on 4 Feb 2013 at 4:03