Invalid XML character -> can not open generated word doc

GoogleCodeExporter commented 8 years ago

What steps will reproduce the problem?
1. Data comes from the DB, where it was entered through forms with textareas (and other fields)
2. The file is generated and sent to the user via the browser
3. The user cannot open the file with Microsoft Word (2010+) due to "invalid xml character" 
(it can be opened with OpenOffice, though)

What is the expected output? What do you see instead?
I want to open the file, but I can't due to "invalid xml character"

What version of the product are you using? On what operating system?
1.0.0 (with Velocity), Grails application (2.2.2, which handles all data), 
Windows 7 and Microsoft Word 2010

Please provide any additional information below.
I'm trying to create dynamic Word documents using this nice tool. 
But when I generate files from data that was entered through a textarea, it 
can happen that the file cannot be opened due to "illegal xml character". Is 
there a tool I can use to avoid or handle this?

Original issue reported on code.google.com by pieterwi...@gmail.com on 15 May 2013 at 8:46

GoogleCodeExporter commented 8 years ago

It's really difficult to help you without more information. Perhaps your problem 
comes from the data that you are trying to merge into the docx? I suggest you 
create a simple Java main with your template docx and data to see whether the 
problem still occurs.

You can attach your template docx and the generated docx that causes the problem to 
this issue, and once you have finished your Java main, attach it too (with the 
Pojo).

Regards Angelo

Original comment by angelo.z...@gmail.com on 15 May 2013 at 8:53

Changed state: Accepted

GoogleCodeExporter commented 8 years ago

I made a small demo (it's Grails/Groovy, but it shows the problem):
static byte[] testFailure(def allObjects) {
    // 1) Load the docx template with the Velocity template engine
    File file = new File("failure.docx")
    InputStream inputStream = new FileInputStream(file)
    IXDocReport report = XDocReportRegistry.getRegistry().loadReport(inputStream, TemplateEngineKind.Velocity)

    // 2) Create Java model context
    IContext context = report.createContext();
    def scopes = []

    // only the scope is needed -> it holds the textarea input
    for (DBObject db : allObjects) {
        HoldData newHolData = new HoldData()
        newHolData.scope = db.scope
        scopes.add(newHolData)
    }

    context.put("list", scopes)

    // 3) Generate report by merging Java model with the docx
    String fileName = "testthefailure"
    File toDeleteFile = new File(fileName + ".docx") // File.createTempFile(fileName, ".docx")
    OutputStream out = new FileOutputStream(toDeleteFile);
    report.process(context, out);
    out.close() // flush and close the stream before reading the file back
    byte[] theFileAsBytes = toDeleteFile.readBytes()
    // delete file again
    toDeleteFile.delete()
    return theFileAsBytes
}

This is called from a controller:
def failureTest() {
    def objects = DBObjects.list()
    byte[] fileAsBytes = GenerateCVUtil.testFailure(objects)
    response.setHeader("Content-disposition", "attachment; filename=demo.docx")
    response.setContentType("application/vnd.openxmlformats-officedocument.wordprocessingml.document")
    response.outputStream << fileAsBytes
}

I'll attach the template.

Original comment by pieterwi...@gmail.com on 15 May 2013 at 9:30

GoogleCodeExporter commented 8 years ago

Original comment by pieterwi...@gmail.com on 15 May 2013 at 9:31

Attachments:

failure.docx

GoogleCodeExporter commented 8 years ago

Example of the generated file.
Word cannot open it. OpenOffice can, BUT there is text missing. The end should be:

The Key Performance Indicators for the Delphi program are:
 Increase customer and contract data quality by 25%
 Increase efficiency of master data input by 300 to 400%
 ...

Original comment by pieterwi...@gmail.com on 15 May 2013 at 9:45

Attachments:

[demo (2).docx](https://storage.googleapis.com/google-code-attachments/xdocreport/issue-258/comment-4/demo %282%29.docx)

GoogleCodeExporter commented 8 years ago

As I said, I think your problem is caused by your data. If you unzip 
demo(2).docx and open word/document.xml in an editor, you will see that it contains 
BEL characters (e.g. before "Increase customer and contract data quality by 25%").
BEL is a control character that XML 1.0 does not allow, which is why Word rejects the file.

So you must remove those characters before you put the data in the context.
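
For anyone hitting the same error, here is a minimal sketch of such a cleanup in plain Java. It keeps only code points that the XML 1.0 specification allows (tab, line feed, carriage return, and everything from U+0020 up, minus surrogates, U+FFFE and U+FFFF) and drops the rest, including BEL (U+0007). The class and method names (XmlCharFilter, stripInvalidXmlChars) are hypothetical and not part of XDocReport.

```java
final class XmlCharFilter {

    private XmlCharFilter() {
    }

    /** Returns true if the code point matches the XML 1.0 "Char" production. */
    static boolean isValidXmlChar(int codePoint) {
        return codePoint == 0x9
                || codePoint == 0xA
                || codePoint == 0xD
                || (codePoint >= 0x20 && codePoint <= 0xD7FF)
                || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
                || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }

    /** Returns a copy of the text without characters that XML 1.0 forbids. */
    static String stripInvalidXmlChars(String text) {
        if (text == null) {
            return null;
        }
        StringBuilder cleaned = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            if (isValidXmlChar(codePoint)) {
                cleaned.appendCodePoint(codePoint);
            }
            // Advance by one or two chars, depending on the code point.
            i += Character.charCount(codePoint);
        }
        return cleaned.toString();
    }
}
```

In the demo above, this would be applied when the model is filled, for example newHolData.scope = XmlCharFilter.stripInvalidXmlChars(db.scope). Line breaks and tabs from the textarea are kept; only the forbidden control characters are removed.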

Regards Angelo

Original comment by angelo.z...@gmail.com on 15 May 2013 at 10:05

GoogleCodeExporter commented 8 years ago

Do you know if there is a method somewhere that I can call to handle this in 
Velocity?

Original comment by pieterwi...@gmail.com on 15 May 2013 at 12:50

GoogleCodeExporter commented 8 years ago

I think you can do that with an EventCartridge.

1) Create your own escape class, MyEscape, whose escape method is implemented to 
remove the special characters. You can model it on:
http://www.docjar.com/html/api/org/apache/velocity/app/event/implement/EscapeHtmlReference.java.html

To use MyEscape, do this:

----------------------------------------------------------
IContext context = report.createContext(); 

EventCartridge eventCartridge = new EventCartridge(); 
eventCartridge.addEventHandler(new MyEscape()); 
eventCartridge.attachToContext((VelocityContext) context); 
----------------------------------------------------------

It's just an idea, but if you find a better way, don't hesitate to share it.

Original comment by angelo.z...@gmail.com on 15 May 2013 at 1:06

GoogleCodeExporter commented 8 years ago

This seems to work, thanks!
I'm now removing every ASCII character 7 (BEL).

Original comment by pieterwi...@gmail.com on 15 May 2013 at 2:14

GoogleCodeExporter commented 8 years ago

That's cool. Next time, I suggest you create a simple Java main with your report, and 
please unzip the generated docx to check which XML entries cause the problem.
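
That check can also be done programmatically. Below is a minimal sketch, assuming Java 9+ (for InputStream.readAllBytes) and that the main part of the docx is stored as word/document.xml encoded in UTF-8. It opens the docx as a zip archive and lists the offsets of characters that XML 1.0 does not allow. The class name DocxCharScanner and its methods are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

final class DocxCharScanner {

    private DocxCharScanner() {
    }

    /** Returns the char offsets of all code points that XML 1.0 forbids. */
    static List<Integer> findInvalidChars(String xml) {
        List<Integer> offsets = new ArrayList<>();
        for (int i = 0; i < xml.length(); ) {
            int cp = xml.codePointAt(i);
            boolean valid = cp == 0x9 || cp == 0xA || cp == 0xD
                    || (cp >= 0x20 && cp <= 0xD7FF)
                    || (cp >= 0xE000 && cp <= 0xFFFD)
                    || (cp >= 0x10000 && cp <= 0x10FFFF);
            if (!valid) {
                offsets.add(i);
            }
            i += Character.charCount(cp);
        }
        return offsets;
    }

    /** Reads word/document.xml from the docx (a zip archive) and scans it. */
    static List<Integer> scanDocx(String docxPath) throws IOException {
        try (ZipFile zip = new ZipFile(docxPath)) {
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                throw new IOException("word/document.xml not found in " + docxPath);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                String xml = new String(in.readAllBytes(), StandardCharsets.UTF_8);
                return findInvalidChars(xml);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java DocxCharScanner.java generated.docx
        for (int offset : scanDocx(args[0])) {
            System.out.println("Invalid XML character at offset " + offset);
        }
    }
}
```

An empty result means no forbidden raw characters were found in the main document part; headers, footers and other parts of the archive are not checked by this sketch.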

I'm closing the bug.

Regards Angelo

Original comment by angelo.z...@gmail.com on 15 May 2013 at 2:20

Changed state: Invalid
