computerline1z / okapi

Automatically exported from code.google.com/p/okapi
0 stars 0 forks source link

XLIFF filter: Okapi custom extensions mask LQI issue type data during parsing #424

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
This was discovered as part of [https://ocelot.atlassian.net/browse/OC-48 
OC-48] in Ocelot.  

I've attached an XLIFF file that initially contained no ITS metadata.  It was 
opened in Ocelot and a single LQI was added to the lone segment, of type 
"mistranslation".  However, when this document is parsed by the XLIFF filter, 
the "mistranslation" issue is reported as being of LQI type "uncategorized".

The problem has to do with the "okp:lqiType" custom extensions that Okapi 
writes out along with the ITS metadata.  The code in ITSLQI.java that builds 
annotation data from ITS markup encounters the ITS attribute first, and calls 
IssueAnnotation.setITSType().  It then encounters the okp:lqiType annotation, 
and calls IssueAnnotation.setIssueType(), which includes a nested call to 
setITSType() that overwrites the value that is already set.

I think the better behavior is for setIssueType() to leave existing type values 
intact, but I'm not sure, as I don't really understand the purpose of the 
extensions at the moment.

This is unfortunately looks like a pretty bad bug for Ocelot, as it breaks the 
flag display of issues for files that have previously been touched by the tool.

Original issue reported on code.google.com by tingley on 29 Nov 2014 at 6:16

Attachments:

GoogleCodeExporter commented 9 years ago
The extension is to keep track of the issues generated by CheckMate, which 
predate ITS. We can change ITSLQI() to keep track of the ITS type when it 
founds one and make sure it sets it again if an okp:lqiType is found.
I think that would solve at least the Ocelot issue. I've tested on a round 
trip, and the ITS type is preserved.
This should not have side-effects on CheckMate.

I can make the change.

But the ultimate solution is probably to align better the CheckMate types with 
the ITS types (http://www.w3.org/TR/its20/#lqissue-typevalues).
I suppose we could add CM type for all ITS types without a correspondance in 
this table: 
http://www.w3.org/International/its/wiki/Localization_quality_types_mappings

I've noticed two other issues when trying to open the example file on CheckMate:

a) CheckMate expects the annotation to be on <source> or/and <target>, not on 
<trans-unit>. I'm not sure of the implications of changing that. The effect now 
is that the Ocelot's LQI are not listed in CheckMate.

b) The stand-off notation saved after </file> is invalid and by default 
CheckMate checks XLIFF against the schema, so it can't open the file. We know 
of the issue (it's https://code.google.com/p/okapi/issues/detail?id=363) but 
with CheckMate doing the schema validation by default, it makes the problem a 
bit glaring. But  I'm not sure how to solve this.

Original comment by yves.sav...@gmail.com on 29 Nov 2014 at 1:52

GoogleCodeExporter commented 9 years ago
Thanks Yves.

For problem a), would it be possible to fix this problem (at least as a 
transitional effort) entirely at the code level?  LQI data lives on 
source/target/etc because that's what the mapping says; CheckMate does it 
differently, but changing that will probably cause problems with legacy files.

But with enough refactoring, we could make all the quality issues anywhere in 
the TU accessible "at the TU level" via a method call.  The LQI issues would 
still be attached to the same objects, but could also be gathered as part of 
"TU metadata".

This might not be a small change, though, since a lot of this metadata is 
hidden in the annotation system in semi-freeform.

Original comment by tingley on 29 Nov 2014 at 5:58

GoogleCodeExporter commented 9 years ago
I pushed a unittest to verify your fix (commit c8a06f9f1).

Original comment by tingley on 30 Nov 2014 at 8:39

GoogleCodeExporter commented 9 years ago
Yves, I'm happy to resolve this here and split off the other changes to new 
issues, but if you'd like to keep it open to track the other CheckMate work, 
that's fine. 

Original comment by tingley on 30 Nov 2014 at 8:40

GoogleCodeExporter commented 9 years ago

Original comment by yves.sav...@gmail.com on 30 Nov 2014 at 11:37