StimVinsh / xdocreport

Automatically exported from code.google.com/p/xdocreport

Error parsing docx files from SharePoint #84

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Take the attached document.
2. Try to parse it.
3. An exception is thrown (see the stack trace below).

What is the expected output? What do you see instead?
The document should be parsed, but a fatal error occurs instead:

[Fatal Error] :1:1: Content is not allowed in prolog.
Exception in thread "main" fr.opensagres.xdocreport.core.XDocReportException: org.xml.sax.SAXParseException: Content is not allowed in prolog.
        at fr.opensagres.xdocreport.document.preprocessor.sax.SAXXDocPreprocessor.preprocess(SAXXDocPreprocessor.java:77)
        at fr.opensagres.xdocreport.document.preprocessor.sax.SAXXDocPreprocessor.preprocess(SAXXDocPreprocessor.java:47)
        at fr.opensagres.xdocreport.document.preprocessor.AbstractXDocPreprocessor.preprocess(AbstractXDocPreprocessor.java:88)
        at fr.opensagres.xdocreport.document.AbstractXDocReport.doPreprocessorIfNeeded(AbstractXDocReport.java:353)
        at fr.opensagres.xdocreport.document.AbstractXDocReport.convert(AbstractXDocReport.java:656)

What version of the product are you using? On what operating system?
xdocreport-0.9.6 on Linux

Please provide any additional information below.
The document is stored in Microsoft SharePoint, which also comes with a 
web-based editor.
My guess is that the online editor has added some unexpected content that 
the parser cannot handle.

Original issue reported on code.google.com by mingkem...@yahoo.com on 2 Mar 2012 at 4:51

GoogleCodeExporter commented 8 years ago
Hi,

I have tried your docx and I get the same error. The problem comes from the 
encoding of the XML entries in your docx. I have fixed it by reading each XML 
entry through an InputStream instead of a Reader. It now works with your docx.
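
To illustrate why switching from a Reader to an InputStream can matter, here is a minimal sketch. It is not the actual xdocreport fix. It assumes the entry starts with a UTF-8 byte order mark (BOM), which is one common cause of "Content is not allowed in prolog"; the thread does not say what the exact encoding issue in the SharePoint file was. The class and method names (`PrologDemo`, `sampleEntry`, `parsesFromReader`, `parsesFromStream`) are hypothetical and use only the JDK's built-in SAX parser.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class PrologDemo {

    // Hypothetical stand-in for a docx XML entry: a UTF-8 BOM followed by
    // a tiny document. The BOM is an assumption, not taken from the report.
    static byte[] sampleEntry() {
        byte[] bom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};
        byte[] xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><root/>"
                .getBytes(StandardCharsets.UTF_8);
        byte[] entry = new byte[bom.length + xml.length];
        System.arraycopy(bom, 0, entry, 0, bom.length);
        System.arraycopy(xml, 0, entry, bom.length, xml.length);
        return entry;
    }

    static SAXParser newParser() {
        try {
            return SAXParserFactory.newInstance().newSAXParser();
        } catch (Exception e) {
            throw new IllegalStateException("Could not create a SAX parser", e);
        }
    }

    // Reader path: the bytes are decoded before the parser sees them, so the
    // parser cannot sniff the encoding and the BOM arrives as a character.
    static boolean parsesFromReader(byte[] entry) {
        Reader reader = new InputStreamReader(
                new ByteArrayInputStream(entry), StandardCharsets.UTF_8);
        try {
            newParser().parse(new InputSource(reader), new DefaultHandler());
            return true;
        } catch (SAXParseException e) {
            return false; // the parser rejected the document
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // InputStream path: the parser receives raw bytes and handles the BOM
    // and the encoding declaration itself.
    static boolean parsesFromStream(byte[] entry) {
        try {
            newParser().parse(new ByteArrayInputStream(entry), new DefaultHandler());
            return true;
        } catch (SAXParseException e) {
            return false; // the parser rejected the document
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] entry = sampleEntry();
        System.out.println("Parsed via Reader:      " + parsesFromReader(entry));
        System.out.println("Parsed via InputStream: " + parsesFromStream(entry));
    }
}
```

The general point is that handing the parser the raw byte stream lets it do its own encoding detection, whereas a Reader fixes the decoding up front and passes any leading non-XML character straight into the prolog.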

I have created a JUnit test at 
http://code.google.com/p/xdocreport/source/browse/document/fr.opensagres.xdocreport.document/src/test/java/fr/opensagres/xdocreport/document/preprocessor/sax/SAXXDocPreprocessorTestCase.java 
(see the test testSpecialEncoding()) that reproduces your encoding problem, 
and the test passes.

Regards Angelo

Original comment by angelo.z...@gmail.com on 2 Mar 2012 at 9:59

GoogleCodeExporter commented 8 years ago

Original comment by angelo.z...@gmail.com on 2 Mar 2012 at 9:59

GoogleCodeExporter commented 8 years ago
Hi,

Could you tell me if the fix is OK?

Thanks.

Regards Angelo

Original comment by angelo.z...@gmail.com on 13 Mar 2012 at 2:42

GoogleCodeExporter commented 8 years ago

Original comment by angelo.z...@gmail.com on 27 Mar 2012 at 9:40