computerline1z / okapi

Automatically exported from code.google.com/p/okapi

Parsing CSV files with embedded double-quotes and commas #404

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Create a simple CSV file (example attached)
2. Run tikal -x on it to create an XLIFF file
3. The string inside the quotes is split across several translation units

What is the expected output? What do you see instead?
The CSV cell "Item 2, with \"quotes\"" should be exported as a single translation unit:

<trans-unit id="4" label="item1">
<source xml:lang="en">Item 2, with "quotes"</source>
<seg-source><mrk mid="0" mtype="seg">Item 2, with "quotes"</mrk></seg-source>
<target xml:lang="de"><mrk mid="0" mtype="seg">Item 2, with "quotes"</mrk></target>
</trans-unit>

But instead it generates a whole group:

<group id="4" restype="row">
<trans-unit id="4" label="item2">
<source xml:lang="en"><x id="1"/>Item 2</source>
<seg-source><mrk mid="0" mtype="seg"><x id="1"/>Item 2</mrk></seg-source>
<target xml:lang="de"><mrk mid="0" mtype="seg"><x id="1"/>Item 2</mrk></target>
</trans-unit>
<trans-unit id="5" label="item2">
<source xml:lang="en">with \<x id="1"/>quotes\<x id="2"/><x id="3"/></source>
<seg-source><mrk mid="0" mtype="seg">with \<x id="1"/>quotes\<x id="2"/><x id="3"/></mrk></seg-source>
<target xml:lang="de"><mrk mid="0" mtype="seg">with \<x id="1"/>quotes\<x id="2"/><x id="3"/></mrk></target>
</trans-unit>
</group>

What version of the product are you using? On what operating system?
This happens with the latest version (24) on both Mac OS X 10.9 and Linux 
(Linux version 3.5.0-48-generic (buildd@batsu) (gcc version 4.6.3 
(Ubuntu/Linaro 4.6.3-1ubuntu5)))

Original issue reported on code.google.com by 1kot...@gmail.com on 14 May 2014 at 12:08

GoogleCodeExporter commented 9 years ago
Looking at the Table filter page:

http://www.opentag.com/okapi/wiki/index.php?title=Table_Filter

The default CSV Escaping Mode is "Duplicate qualifier", which expects an embedded quote to be written as two quotes:
"Item 2, with ""quotes"""

To use a backslash as the escape character instead, you need to change that parameter, for example with the attached config file (created in Rainbow).

Pass it to Tikal during extraction:

-fc okf_table@csv_backslash
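
For anyone unsure which escaping style their file uses, here is a rough sketch of the two conventions. It uses Python's standard csv module purely as an illustration and not Okapi's own parser, so it does not reproduce the splitting seen above. The helper names and sample rows are made up for this example.

```python
import csv
import io


def parse_duplicate_qualifier(line):
    # "Duplicate qualifier" style: a quote inside a quoted cell is doubled.
    # This matches the csv module's default dialect (doublequote=True).
    return next(csv.reader(io.StringIO(line)))


def parse_backslash(line):
    # Backslash style: a quote inside a quoted cell is preceded by a backslash.
    # The reader has to be told about it explicitly.
    reader = csv.reader(io.StringIO(line), doublequote=False, escapechar="\\")
    return next(reader)


# Hypothetical sample rows, one per escaping style.
doubled = 'item1,"Item 2, with ""quotes"""'
backslashed = 'item1,"Item 2, with \\"quotes\\""'

print(parse_duplicate_qualifier(doubled))
print(parse_backslash(backslashed))
```

Both calls should give the same two cells, with the second cell holding the comma and the quotes intact. The point is that the reader has to be configured for the escaping style the file actually uses.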

Original comment by fli...@enlaso.com on 16 May 2014 at 5:30
