ODT->PDF Conversion: Inconsistent Page Numbers

GoogleCodeExporter commented 9 years ago

>>> What steps will reproduce the problem?
1. Convert the attached "page_numbers.odt" file to PDF using the PdfConverter 
XDocReport class.

>>> What is the expected output? What do you see instead?
The expected output should be a PDF with 2 pages, with the correct numbering in 
the footer. The PDF that is created has 3 pages, with the last page displaying 
"Page 3 of 2" in the footer.

>>> What version of the product are you using? On what operating system?
1.0.0-SNAPSHOT, latest code from the git repo (as of this morning.)
Windows 7 64-bit

>>> Please provide any additional information below.
I think the root of the issue is that the total page count is not updated when 
the PDF is created. The current page number is OK, but not the total.

Also, if my previous issue (193) is fixed, this likely won't happen. But in 
general, if the conversion is not 1:1, then the page numbering could get messed 
up.

Original issue reported on code.google.com by PhilipDN...@gmail.com on 9 Nov 2012 at 7:31

GoogleCodeExporter commented 9 years ago

Hi Philip,

I think managing the total page count with iText is very hard. If I have 
understood 
http://stackoverflow.com/questions/759909/how-to-add-total-page-number-on-every-page-with-itext 
correctly, it seems you must re-read the PDF that you have generated to update 
the total page count. I don't know if that is possible with our (docx and odt) 
PDF converters, and I'm worried about performance.

@Leszek : what do you think about the total page management?
Regards Angelo

Original comment by angelo.z...@gmail.com on 9 Nov 2012 at 8:35

Changed state: Accepted

GoogleCodeExporter commented 9 years ago

@Leszek : I have added a JUnit test 
http://code.google.com/p/xdocreport/source/browse/thirdparties-extension/org.odftoolkit.odfdom.converter.pdf/src/test/resources/org/odftoolkit/odfdom/converter/core/Issue194.odt 
using the odt attached to this issue.

Original comment by angelo.z...@gmail.com on 10 Nov 2012 at 2:11

GoogleCodeExporter commented 9 years ago

Actually it is not so hard, but it may require generating the pdf file twice: 
the first pass to determine the actual page count and the second to generate 
the final pdf.

I've done some optimization. First I process the odt document as before.
If the total page count does not occur in the document, or the estimated page 
count equals the actual page count, then the generated pdf is written to the 
output stream.
If actual page count != estimated page count then the document must be 
regenerated.

Most often the second processing does not occur so performance shouldn't be a 
problem. In the whole test set only OdtBig.odt must be generated twice because 
of inconsistent page count.
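
The flow described above can be sketched roughly as follows. This is only an illustration under assumptions, not the code that was committed: `PageRenderer`, `TwoPassConversion` and the `usesTotalPageField` flag are hypothetical names standing in for the real ODT->PDF rendering step, and the sketch regenerates at most once.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical stand-in for the real rendering step (not an XDocReport type).
interface PageRenderer {
    /**
     * Renders the document into out, printing totalPages wherever the document
     * shows a total page count, and returns the number of pages produced.
     */
    int render(int totalPages, OutputStream out) throws IOException;
}

final class TwoPassConversion {
    private TwoPassConversion() {
    }

    /**
     * Renders once into memory with the estimated total. A second rendering
     * happens only if the document shows a total and the estimate was wrong.
     * Returns the number of rendering passes that were needed.
     */
    static int convert(PageRenderer renderer, int estimatedPages,
                       boolean usesTotalPageField, OutputStream out) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        int actualPages = renderer.render(estimatedPages, buffer);
        int passes = 1;
        if (usesTotalPageField && actualPages != estimatedPages) {
            buffer.reset(); // throw away the first result, its total is wrong
            renderer.render(actualPages, buffer);
            passes = 2;
        }
        buffer.writeTo(out); // only the final result reaches the caller
        return passes;
    }
}
```

For the file attached to this issue the estimate would be 2 and the actual count 3, so a renderer like this would run twice and the footer would end up as "Page 3 of 3".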

Original comment by abe...@gmail.com on 13 Nov 2012 at 5:33

GoogleCodeExporter commented 9 years ago

Hi Leszek, 

Great work! I will study your code to try to do the same thing with the 
docx->pdf converter.

The only thing I find a shame is that you are using ByteArrayInputStream to do 
that. I'm afraid of running out of memory if the converter is used all the time 
(e.g. our JAX-RS converter hosted on cloudbee).

Some odt documents have page numbers and some do not. So I think it would 
perhaps be good to manage page numbers with PdfOptions too: keep your code, but 
test whether PdfOptions#updatePageNumber() returns true, and otherwise don't 
use ByteArrayInputStream (the old code).
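
A rough sketch of that idea, to make it concrete. Everything here is an assumption for illustration: `SketchPdfOptions` is a hypothetical class and not the real PdfOptions, and `OdtRenderer` is a made-up stand-in for the rendering step.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical options holder; NOT the real XDocReport PdfOptions class.
final class SketchPdfOptions {
    private boolean updatePageNumber;

    static SketchPdfOptions create() {
        return new SketchPdfOptions();
    }

    SketchPdfOptions updatePageNumber(boolean enabled) {
        this.updatePageNumber = enabled;
        return this;
    }

    boolean isUpdatePageNumber() {
        return updatePageNumber;
    }
}

// Hypothetical rendering step: writes the PDF and returns the actual page count.
interface OdtRenderer {
    int render(int totalPages, OutputStream out) throws IOException;
}

final class OptionGatedConversion {
    private OptionGatedConversion() {
    }

    /** Returns true if the in-memory buffer was used, false for direct streaming. */
    static boolean convert(OdtRenderer renderer, int estimatedPages,
                           SketchPdfOptions options, OutputStream out) throws IOException {
        if (!options.isUpdatePageNumber()) {
            // Old behaviour: stream straight to the caller, no intermediate copy.
            renderer.render(estimatedPages, out);
            return false;
        }
        // New behaviour: buffer in memory so a wrong total can be corrected.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        int actualPages = renderer.render(estimatedPages, buffer);
        if (actualPages != estimatedPages) {
            buffer.reset();
            renderer.render(actualPages, buffer);
        }
        buffer.writeTo(out);
        return true;
    }
}
```

With the flag off, the memory cost is the same as before the change; the price is that a wrong total stays in the footer.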

What do you think about this idea?

Regards Angelo

Original comment by angelo.z...@gmail.com on 15 Nov 2012 at 10:17

GoogleCodeExporter commented 9 years ago

Hi Angelo

I thought for some time about how to keep the intermediate document, and 
finally I decided to use ByteArrayInputStream. If you use the converter in some 
web application then the converted pdf is written to the http response, which 
is also in memory. In my opinion using ByteArrayInputStream to hold the 
intermediate document doesn't change much.

Of course you can change it:
1. use a temporary file instead of ByteArrayInputStream
2. add some option in PdfOptions (like you suggest) to use/not use it
3. always make two passes. The first pass only determines the page count; its 
output may go to a null output stream (so it does not consume memory). The 
second pass would go to the original output stream. Here we have a tradeoff 
between memory and speed

Basically we have four options: do not update the page count, use memory, use a 
temp file, or make two passes.
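
To illustrate the "make two passes" variant, here is a minimal sketch. The names `DocumentRenderer`, `DiscardingOutputStream` and `CountingPassConversion` are hypothetical and only stand in for the real conversion step; the first pass passes 0 as a placeholder total because its output is thrown away anyway.

```java
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical rendering step: writes the PDF and returns the actual page count.
interface DocumentRenderer {
    int render(int totalPages, OutputStream out) throws IOException;
}

// Output stream that drops everything written to it, so pass one costs no memory.
final class DiscardingOutputStream extends OutputStream {
    @Override
    public void write(int b) {
        // intentionally empty
    }

    @Override
    public void write(byte[] bytes, int offset, int length) {
        // intentionally empty
    }
}

final class CountingPassConversion {
    private CountingPassConversion() {
    }

    /** Always renders twice: once to count pages, once for the real output. */
    static int convert(DocumentRenderer renderer, OutputStream out) throws IOException {
        // Pass 1: the bytes are discarded, only the page count is kept.
        int actualPages = renderer.render(0, new DiscardingOutputStream());
        // Pass 2: render again with the correct total, straight to the caller.
        return renderer.render(actualPages, out);
    }
}
```

Nothing is held in memory or on disk between the passes, but every conversion pays for two renderings, including documents whose page count was already right.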

If you want to modify the mechanism - do as you wish.

Regards
Leszek

Original comment by abe...@gmail.com on 16 Nov 2012 at 7:17

GoogleCodeExporter commented 9 years ago

Original comment by abe...@gmail.com on 3 Jun 2013 at 10:47

Changed state: Fixed
