computerline1z / okapi

Automatically exported from code.google.com/p/okapi

DOCX/OpenXML - tag corruption in document.xml #379

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
I'm a user of Okapi and I ran into trouble opening a document (file.docx) in Microsoft Office 2007. The error message is: /word/document.xml line 6294 column 6293. The problem doesn't occur in OpenOffice, which opens file.docx without any error message.

tikal.sh -lm file.docx -totrg -from aftertest 

Original issue reported on code.google.com by bailo...@gmail.com on 22 Nov 2013 at 10:12

GoogleCodeExporter commented 9 years ago

Original comment by tingley on 29 Jan 2014 at 6:29

GoogleCodeExporter commented 9 years ago
The document.xml file in the attached file.out.docx has been corrupted.

At the offset (line 2, column 6294) there is some very weird tag structure in 
which a run (<w:r>) has been embedded directly within the <w:t> of another run. 
It looks like this:

<w:p>
  <w:pPr>
    <!-- snipped for space -->
  </w:pPr>
  <w:r>
    <w:rPr>
      <w:rStyle w:val="CharAttribute0"/>
      <w:rFonts w:eastAsia="Batang"/>
      <w:sz w:val="24"/>
      <w:szCs w:val="24"/>
    </w:rPr>
    <w:t xml:space="preserve">
      <w:r>     <------------- What
        <w:rPr>
          <w:rStyle w:val="CharAttribute0"/>
          <w:rFonts w:eastAsia="Batang"/>
          <w:sz w:val="24"/>
          <w:szCs w:val="24"/>
          <w:u w:val="single"/>
        </w:rPr>
        <w:t> légende</w:t>
      </w:r>
    </w:t>  <----  what
  </w:r>
</w:p>

This fragment is well-formed XML, but I'm pretty sure it's illegal in OpenXML, 
since <w:t> is only supposed to hold text.  (I'd need to check.)
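
For anyone who wants to look for this pattern without an XML editor, here is a rough Python sketch that reports every element opened directly inside a <w:t>. The function name `find_elements_inside_text` is made up for illustration and is not part of Okapi. It uses the standard library's expat parser in streaming mode, so it still returns whatever it found before the first well-formedness error. The reported column comes from expat and is assumed to be zero-based, so it may be off by one from what Word shows.

```python
import xml.parsers.expat

# Namespace URI of the WordprocessingML "w:" prefix.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def find_elements_inside_text(xml_bytes):
    """Return (line, column, local_name) for each element whose parent is <w:t>."""
    # With a namespace separator, expat reports names as "<uri> <localname>".
    parser = xml.parsers.expat.ParserCreate(namespace_separator=" ")
    stack = []  # names of the currently open elements
    hits = []

    def start(name, attrs):
        # A start tag seen while <w:t> is the innermost open element
        # means markup has been embedded in a text node.
        if stack and stack[-1] == W_NS + " t":
            hits.append((parser.CurrentLineNumber,
                         parser.CurrentColumnNumber,
                         name.split(" ")[-1]))
        stack.append(name)

    def end(name):
        stack.pop()

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    try:
        parser.Parse(xml_bytes, True)
    except xml.parsers.expat.ExpatError:
        # Keep the hits collected before the document stopped being well-formed.
        pass
    return hits
```

To run it against the attached output, the bytes can be read with `zipfile.ZipFile("file.out.docx").read("word/document.xml")` and passed to the function.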

Also, it looks like document.xml has some tag mismatches further on (line 2, 
column 42916), based on trying to open it in an XML editor.
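
The mismatch can also be located without an XML editor. The sketch below (again Python with the standard library's expat parser; `first_wellformedness_error` is a hypothetical helper name) returns the position and message of the first well-formedness error, or None if the document parses cleanly. The offset is whatever expat reports and may differ by one from an editor's column number.

```python
import xml.parsers.expat


def first_wellformedness_error(xml_bytes):
    """Return (line, offset, message) for the first parse error, or None."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(xml_bytes, True)
    except xml.parsers.expat.ExpatError as exc:
        # ExpatError carries the error code and the position where parsing stopped.
        return exc.lineno, exc.offset, xml.parsers.expat.ErrorString(exc.code)
    return None
```

This only reports the first problem, so it would have to be re-run after each fix to find any later mismatches.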

Original comment by tingley on 29 Jan 2014 at 6:44