computerline1z / okapi

Automatically exported from code.google.com/p/okapi

Various SDLXLIFF issues #367

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
From:
http://groups.yahoo.com/neo/groups/OmegaT/conversations/messages/29838

> However, after making a single change to a file,
> and then adding the BOM to the output file again,
> Trados still does not accept the file.
> Trados' error messages are even less helpful, 
> so I can't tell what the problem is:
> http://i44.tinypic.com/10croer.png

That looks similar to the issue Roman reported.
We'll try to reproduce and debug it.

> version="1.2" sdl:version="1.0" 
> xmlns="urn:oasis:names:tc:xliff:document:1.2">
> into this:
> xmlns="urn:oasis:names:tc:xliff:document:1.2" version="1.2" 
> sdl:version="1.0">
> ...which shouldn't make a difference, but we all know 
> that "shouldn't" doesn't always work that way.
> I've seen otherwise perfectly good programs choke on 
> the fact that a tag's attributes were not in the 
> sequence that the program expected them to be.

You are correct: the order of the attributes does not matter in XML.
Once a document is parsed, there is often no way to know what the original 
order was.
Tools that expect a fixed order are not using real XML parsers, and there is 
not much we can do about that.
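A quick way to check this with any real XML parser is to parse both versions of the header and compare what comes out. The sketch below uses Python's standard `xml.etree.ElementTree` purely as an illustration (it is not Okapi's code). The `sdl` namespace URI is assumed to be the one SDLXLIFF files normally declare, and the header is trimmed to a bare `<xliff>` start tag.

```python
import xml.etree.ElementTree as ET

SDL_NS = "http://sdl.com/FileTypes/SdlXliff/1.0"  # assumed sdl namespace URI
XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"

# The same <xliff> start tag, with the attributes in two different orders.
trados_order = (
    f'<xliff xmlns:sdl="{SDL_NS}" version="1.2" sdl:version="1.0" '
    f'xmlns="{XLIFF_NS}"/>'
)
okapi_order = (
    f'<xliff xmlns:sdl="{SDL_NS}" xmlns="{XLIFF_NS}" version="1.2" '
    f'sdl:version="1.0"/>'
)

def parsed_view(xml_text: str) -> tuple:
    """Return (tag, attributes) exactly as the parser reports them."""
    root = ET.fromstring(xml_text)
    # attrib is a plain mapping: it carries no record of source order.
    return root.tag, dict(root.attrib)

print(parsed_view(trados_order) == parsed_view(okapi_order))
```

Namespace declarations are consumed by the parser, and the remaining attributes come back as an unordered mapping, so both headers yield the same result.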

> 2. Also in the header, Okapi changes utf-8 to UTF-8.

Same here: the official IANA name is uppercase, and XML processors should 
match encoding names case-insensitively (see 
http://www.w3.org/TR/REC-xml/#charencoding).
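As an illustration (using Python's standard library, not Okapi), a parser gives the same result whichever case the declaration uses. The `root_text` helper and the one-element document are hypothetical.

```python
import codecs
import xml.etree.ElementTree as ET

def root_text(declared_encoding: str) -> str:
    """Parse a tiny UTF-8 document that declares the given encoding name."""
    doc = (
        f'<?xml version="1.0" encoding="{declared_encoding}"?>'
        '<target>caf\u00e9</target>'
    ).encode("utf-8")
    return ET.fromstring(doc).text

# Both spellings of the declaration decode the same bytes the same way.
print(root_text("utf-8") == root_text("UTF-8"))
# Python's codec registry also normalizes the name.
print(codecs.lookup("utf-8").name == codecs.lookup("UTF-8").name)
```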

> 3. Trados closes standalone tags, whereas Okapi pairs them.
> For example, Trados would write <foo/> but Okapi would write <foo></foo>.

Both notations are equivalent, but we could try to use the shorthand when 
writing the file back.
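A small sketch of why this is purely a writer-side choice, again with Python's `xml.etree.ElementTree` as a stand-in (not Okapi's writer): both forms parse to the same element, and the serializer decides which form to emit. The `same_element` helper is hypothetical.

```python
import xml.etree.ElementTree as ET

long_form = ET.fromstring("<foo></foo>")
short_form = ET.fromstring("<foo/>")

def same_element(a: ET.Element, b: ET.Element) -> bool:
    """Compare tag, attributes, text and number of children."""
    return (a.tag, a.attrib, a.text, len(a)) == (b.tag, b.attrib, b.text, len(b))

print(same_element(long_form, short_form))
# The writer picks the notation; the parsed content is identical either way.
print(ET.tostring(long_form, short_empty_elements=True))   # shorthand form
print(ET.tostring(long_form, short_empty_elements=False))  # paired tags
```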

> 4. Trados entitises both < and >, whereas Okapi uses 
> entities only for <.  I understand that from a puristic point 
> of view, > does not need to be written as an entity, 
> but I also know that some parsers don't hold that view,
> and dislike it when you do that.

I would say tools that use XML parsers will have no problem with this.
Here again: once the character is parsed, there is no way to know what its 
original form was.
We could force all > to be escaped, but then users who want them unescaped 
(for example, to compare the output with an original file where they are 
unescaped) would complain... there is no way we can win.
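Escaping > is therefore only a serialization option. The sketch below shows such an option with a hypothetical `escape_text(text, escape_gt)` function (not an actual Okapi setting) and checks with Python's parser that both outputs read back as the same text. One caveat from the XML spec still applies when > is left unescaped: it must be escaped when it appears as part of "]]>" in content.

```python
import xml.etree.ElementTree as ET

def escape_text(text: str, escape_gt: bool) -> str:
    """Hypothetical writer option: escape '>' always, or only where XML requires it."""
    out = text.replace("&", "&amp;").replace("<", "&lt;")
    if escape_gt:
        out = out.replace(">", "&gt;")
    else:
        # '>' must still be escaped when it would end a "]]>" sequence.
        out = out.replace("]]>", "]]&gt;")
    return out

def read_back(escaped: str) -> str:
    """Wrap escaped text in an element and return what a parser sees."""
    return ET.fromstring(f"<t>{escaped}</t>").text

sample = "if a > b && c < d"
# Both serializations parse to exactly the same character data.
print(read_back(escape_text(sample, True)) == read_back(escape_text(sample, False)))
```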

> 5. In the BODY, Okapi removes the "trans-unit" tags 
> around any group of tags that have no meaning 
> in human language.
> http://i43.tinypic.com/6t20so.png

That is definitely an issue,
and possibly the cause of your merging error.
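While this is being investigated, one rough way to spot the problem is to compare the trans-unit IDs in the original file and in the output. This is a hypothetical diagnostic sketch in Python (not part of Okapi); the helper names and the tiny sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
TRANS_UNIT = f"{{{XLIFF_NS}}}trans-unit"

def trans_unit_ids(xml_text: str) -> list:
    """Return the id of every trans-unit, in document order."""
    root = ET.fromstring(xml_text)
    return [tu.get("id") for tu in root.iter(TRANS_UNIT)]

def missing_trans_units(original: str, output: str) -> list:
    """List trans-unit ids present in the original but absent from the output."""
    kept = set(trans_unit_ids(output))
    return [tu_id for tu_id in trans_unit_ids(original) if tu_id not in kept]

# Hypothetical sample: unit "2" holds only an inline tag, no text.
original = f"""<xliff xmlns="{XLIFF_NS}" version="1.2"><file><body>
<trans-unit id="1"><source>Hello</source></trans-unit>
<trans-unit id="2"><source><x id="5"/></source></trans-unit>
</body></file></xliff>"""
output = f"""<xliff xmlns="{XLIFF_NS}" version="1.2"><file><body>
<trans-unit id="1"><source>Hello</source></trans-unit>
</body></file></xliff>"""

print(missing_trans_units(original, output))
```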

Original issue reported on code.google.com by yves.sav...@gmail.com on 1 Oct 2013 at 10:25