GoogleCodeExporter closed this issue 9 years ago.
Original comment by yves.sav...@gmail.com
on 15 May 2013 at 3:34
Original comment by tingley
on 15 May 2013 at 8:47
This is almost certainly something I've done wrong.
Original comment by tingley
on 15 May 2013 at 9:15
Is the fix available in the dev build now? Can I test it?
Original comment by 143.ravi...@gmail.com
on 28 May 2013 at 4:28
Sorry, no, it's not fixed yet.
Original comment by tingley
on 28 May 2013 at 6:55
Hi ravikant,
Thanks for your patience. I finally had a chance to look at this and... I'm
afraid I may need more information from you. I'm not able to reproduce this
problem using a basic roundtrip test. Here's what I did:
* Copied your source file to a file called cdataWithGroup.xml (attached)
* Created a filter config with your rules, okf_xmlstream@cdata.fprm (attached)
Then I ran two tikal commands to convert the source XML to XLIFF, and then back
to XML:
tikal.sh -fc okf_xmlstream\@cdata.fprm -x cdataWithGroup.xml
tikal.sh -fc okf_xmlstream\@cdata.fprm -m cdataWithGroup.xml.xlf
This produces an output file (cdataWithGroup.out.xml) which I would expect to
demonstrate the problem, if it were just a matter of the filter misbehaving.
However, the output file looks fine to me.
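One way to make the "output looks fine" check less manual is to compare the text of every element in the original file with the merged file. The following is a minimal Python sketch, not part of Okapi or tikal. The function names are hypothetical, and it assumes an untranslated roundtrip (target copied from source), because a real translation would change the text anyway.

```python
import xml.etree.ElementTree as ET


def text_map(xml_text):
    """Return (element path, stripped text, stripped tail) for every element, in document order.

    ElementTree reports CDATA content as ordinary text, so a CDATA section that
    has moved out of its parent element shows up as a changed text/tail value.
    """
    root = ET.fromstring(xml_text)  # accepts str or bytes
    result = []

    def walk(elem, path):
        here = f"{path}/{elem.tag}"
        result.append((here, (elem.text or "").strip(), (elem.tail or "").strip()))
        for child in elem:
            walk(child, here)

    walk(root, "")
    return result


def roundtrip_differences(original, merged):
    """Compare an original XML document with its merged output, element by element."""
    a, b = text_map(original), text_map(merged)
    diffs = [(x, y) for x, y in zip(a, b) if x != y]
    if len(a) != len(b):
        # Elements were added or lost; zip() above only compared the common prefix.
        diffs.append(("element count", (len(a), len(b))))
    return diffs
```

For example, `roundtrip_differences(open("cdataWithGroup.xml", "rb").read(), open("cdataWithGroup.out.xml", "rb").read())` should return an empty list when the merge is clean, and a pair of differing entries for an element whose CDATA ended up after its closing tag.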
So it seems that there's another factor involved which I will need to take into
account in order to reproduce this. Can you provide any more details about
what you were doing to the source file after it had been segmented? (i.e., how
was it translated?)
Thanks
Original comment by tingley
on 5 Jun 2013 at 5:20
Attachments:
Hi Tingley,
On my end, with the M22 snapshot version of the "okapi-filter-abstractmarkup"
jar, the original issue of a spurious segment being generated is fixed. It
used to generate one for each CDATA section.
Source file b.xml attached.
I no longer see it being generated, and the XLIFF output (de-DE.xlf) is also as
expected.
To generate the XML file again, I use a pipeline that takes the original .xml
file as the RawDocument and adds the following steps:
1. RawDocumentToFilterEventsStep()
2. driver.setFilterConfigurationMapper()
3. TranslateStep()
4. FilterEventsStreamWriterStep()
The translate step just updates each text unit's target with the appropriate
localized string.
The output of this pipeline is the XML file, and that is where I see the tags
getting misplaced.
I also see one more difference between the rules I used and the ones you set in
the attached okf_xmlstream@cdata.fprm.
I used "elements":
global_cdata_subfilter: okf_html
preserve_whitespace: false
elements:
  solutions:
    ruleTypes: [INCLUDE]
  resolution:
    ruleTypes: [GROUP]
  description:
    ruleTypes: [GROUP]
but you seem to have used "attributes":
global_cdata_subfilter: okf_html
preserve_whitespace: false
attributes:
  resolution:
    ruleTypes: [GROUP]
  description:
    ruleTypes: [GROUP]
I'm not sure whether this could also explain the difference between the outputs
we are each seeing.
Original comment by 143.ravi...@gmail.com
on 6 Jun 2013 at 3:16
Attachments:
Hi Tingley,
Did my comments help? Were you able to reproduce it on your side?
Thanks
Original comment by 143.ravi...@gmail.com
on 10 Jun 2013 at 2:14
Hi both,
Just to confirm: I'm getting the same output Ravi gets with his
configuration.
I have to admit I'm not sure when to use GROUP and when to use TEXTUNIT,
though.
With TEXTUNIT it merges back OK, but it creates extraneous empty XLIFF
TextUnits.
Fredrik
Original comment by KFLi...@gmail.com
on 10 Jun 2013 at 5:38
Hi ravikant,
Yes, you're right, I had a mistake in my YML configuration. Thanks for
pointing that out. I'm able to reproduce the problem now.
Fredrik: I agree, the semantics of several of the tag rules (including
TEXTUNIT) are not very clear.
I assume that GROUP is intended to produce START_GROUP/END_GROUP events, which
are used for example to produce <group> elements in XLIFF. Looking at the
XLIFF output from tikal, it looks like this issue may be related to the fact
that subfiltering also always produces a group. For example:
<group id="sg1">
  <group id="sg1_ssf1" resname="sub-filter:sd1">
    <trans-unit id="sg1_tu1" resname="sd1_1" restype="x-paragraph">
      <source xml:lang="en"></source>
      <target xml:lang="fr"></target>
    </trans-unit>
    <trans-unit id="sg1_tu2" resname="sd1_2" restype="x-li">
      <source xml:lang="en">Test</source>
      <target xml:lang="fr">Test</target>
    </trans-unit>
  </group>
</group>
Note the nested <group> elements. XLIFF allows nested <group>, although it's
not commonly used in my experience. I wonder if this is confusing our merger.
I'll step through this.
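A quick way to see both effects discussed in this thread, the nested <group> elements and the empty text units Fredrik mentioned, is to scan the XLIFF file that tikal writes. This is a minimal Python sketch using only the standard library; `inspect_xliff` is a hypothetical helper, and it assumes XLIFF 1.2 element names (`group`, `trans-unit`, `source`), with or without the XLIFF namespace.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix so the check works with or without the XLIFF namespace."""
    return tag.rsplit("}", 1)[-1]


def inspect_xliff(xliff_text):
    """Return (deepest <group> nesting level, ids of <trans-unit>s with an empty <source>)."""
    root = ET.fromstring(xliff_text)
    max_depth = 0
    empty_units = []

    def walk(elem, depth):
        nonlocal max_depth
        name = local_name(elem.tag)
        if name == "group":
            depth += 1
            max_depth = max(max_depth, depth)
        elif name == "trans-unit":
            source = next((c for c in elem if local_name(c.tag) == "source"), None)
            # itertext() includes text nested in inline elements such as <g>
            if source is None or not "".join(source.itertext()).strip():
                empty_units.append(elem.get("id"))
        for child in elem:
            walk(child, depth)

    walk(root, 0)
    return max_depth, empty_units
```

Run against the snippet above, it reports a nesting depth of 2 and flags sg1_tu1, the trans-unit whose source is empty.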
Sorry for the slow progress, I've had almost no free time in the past few weeks.
Original comment by tingley
on 14 Jun 2013 at 7:09
This is just state confusion during event generation. The reference to the
subfilter content isn't being included in the skeleton of either group event.
Instead, it gets left for the DOCUMENT_PART event that follows, which moves the
CDATA section outside of its parent element on reassembly.
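To make the failure mode concrete, here is a deliberately simplified Python model of skeleton-based reassembly. It is not Okapi's code: the `Event` class and `reassemble` function are hypothetical, and a real skeleton holds a reference to the subfiltered content rather than the literal CDATA text. It only shows why leaving that content for the DOCUMENT_PART after END_GROUP, instead of attaching it to a group event, puts the CDATA section after the closing tag.

```python
from dataclasses import dataclass


@dataclass
class Event:
    kind: str      # e.g. "START_GROUP", "END_GROUP", "DOCUMENT_PART"
    skeleton: str  # markup this event writes back verbatim on reassembly


def reassemble(events):
    """Toy writer: the output is simply every event's skeleton, in event order."""
    return "".join(event.skeleton for event in events)


# Stands in for the subfiltered CDATA content of a <resolution> element.
CDATA = "<![CDATA[<p>Test</p>]]>"

# Expected: the content is attached to an event inside the group.
correct = [
    Event("START_GROUP", "<resolution>" + CDATA),
    Event("END_GROUP", "</resolution>"),
    Event("DOCUMENT_PART", "\n"),
]

# Bug as described: neither group event takes the content, so it is
# left for the DOCUMENT_PART that follows END_GROUP.
buggy = [
    Event("START_GROUP", "<resolution>"),
    Event("END_GROUP", "</resolution>"),
    Event("DOCUMENT_PART", CDATA + "\n"),
]

print(reassemble(correct))  # <resolution><![CDATA[<p>Test</p>]]></resolution>
print(reassemble(buggy))    # <resolution></resolution><![CDATA[<p>Test</p>]]>
```

The writer itself is the same in both cases; only the event that owns the content differs, which is why the fix belongs in event generation rather than in the merger.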
Original comment by tingley
on 14 Jun 2013 at 7:32
I have checked in a fix and a unit test to dev. The commit is here:
https://code.google.com/p/okapi/source/detail?r=efa2b0935952a278304c4f9461ced664e4d10b36&name=dev
ravikant, the next snapshot build should include the fix.
Original comment by tingley
on 14 Jun 2013 at 9:16
Thanks a lot, Tingley, for looking into this.
Original comment by 143.ravi...@gmail.com
on 19 Jun 2013 at 7:18
Original issue reported on code.google.com by
143.ravi...@gmail.com
on 15 May 2013 at 3:22