boazsegev / combine_pdf

A Pure ruby library to merge PDF files, number pages and maybe more...
MIT License
734 stars 156 forks source link

Merge result might not open on Adobe Reader with error 14 #82

Closed nathansamson closed 8 years ago

nathansamson commented 8 years ago

I am opening a new issue but I think I am basically jumping into #19 again.

Because this has been closed for quite a while I decided to open a new issue.

In a particular combined PDF evince will open the file perfectly fine, but Adobe reader won't. (error 14) The first page which is not displayed in adobe reader is indeed an exact copy from an earlier page displayed.

(Kudos to you guys btw that you detect this as a duplicate because both pages are coming from a seperate original PDF file).

I am using CombinePDF 0.2.30

If needed I can send the broken file to you but since this contains personal info I would like to this via private transfer.

nathansamson commented 8 years ago

Apparantly someone unrelated probably fixed this in #81

When I was looking at this yesterday that merge request was not opened yet, what a coincidence...

boazsegev commented 8 years ago

It seems the previous fix got broken during one of the latest updates.

I'm away, but I'll get on it soon.

For now, many thanks to @gyuchang for working on a fix and a PR.

gyuchang commented 8 years ago
boazsegev commented 8 years ago

@gyuchang ,

I think I found a better fix for this, that pull the if statement to an earlier state and out of the loop.

I also switched the lookup to a binary search using a Hash map.

Please have a look at the latest commit and let me know what you think and if it solves the issue for you (I'm not experiencing the issue).

gyuchang commented 8 years ago

I used output.pdf created with following code to test this.

require 'combine_pdf'
require 'prawn'

page = CombinePDF.parse((Prawn::Document.new { text 'Hello World!'}).render)

pdf = CombinePDF.new
pdf << page << page

pdf.save('output.pdf')

With 966f97654594a04b703a899faa16a10d9982b7b7, :Pages object does have all unique /Kids

9 0 obj
<<
/Type /Pages
/Count 2
/Kids [3 0 R 4 0 R]    <--
>>
endobj

But 2nd :Page object has incorrect /Parent

4 0 obj
<<
/Type /Page
/Parent 0 0 R    <-- should be "9 0 R"
/MediaBox [0  0  612.0  792.0 ]
/Contents 6 0 R
/Resources <<
/ProcSet [/PDF /Text /ImageB /ImageC /ImageI]
/Font <<
/F1.0 5 0 R
>>
>>
>>
endobj

Acrobat Reader I have (version 15.17.20053.194476) won't render 2nd page correctly.

This is due to a bug in 966f97654594a04b703a899faa16a10d9982b7b7 - duplicated :Page objects don't get :Parent set properly.

Possible fix is as below, but I am not sure whether it's a complete fix or not

diff --git a/lib/combine_pdf/pdf_protected.rb b/lib/combine_pdf/pdf_protected.rb
index b0898e7..3e14ac5 100644
--- a/lib/combine_pdf/pdf_protected.rb
+++ b/lib/combine_pdf/pdf_protected.rb
@@ -146,7 +146,7 @@ module CombinePDF

       # point old Pages pointers to new Pages object
       ## first point known pages objects - enough?
-      pages.each { |p| p[:Parent] = { referenced_object: pages_object, is_reference_only: true } }
+      page_list.each { |p| p[:Parent] = { referenced_object: pages_object, is_reference_only: true } }
       ## or should we, go over structure? (fails)
       # each_object {|obj| obj[:Parent][:referenced_object] = pages_object if obj.is_a?(Hash) && obj[:Parent].is_a?(Hash) && obj[:Parent][:referenced_object] && obj[:Parent][:referenced_object][:Type] == :Pages}
gyuchang commented 8 years ago

Another issue is that this (latest commint, 966f97654594a04b703a899faa16a10d9982b7b7) doesn't seem to solve the duplicated objects problem.

The file size of PDF output file from my app went back up to 770kb.

I.e.

Commit used in the app File size Note
0fa0c50d8c56cc675b4249662fc92f17deb0ee90 739,366 current release: 0.2.30
00657b1d53695632bf227555ca50c4d1453d7842 770,260 first commit of PR #81, has duplication bug
8b26f8dbd13ca125a36a7fb1519e9947cdf830df 739,366 Head of PR #81
966f97654594a04b703a899faa16a10d9982b7b7 770,260 current master
boazsegev commented 8 years ago

@gyuchang ,

I had more time to look this over, so I'm pretty sure the issue is now fixed... but I broke issue #65 and Radio button data is lost... Hence, the fix isn't final.

Could you review this on your end and let me know what you discover?

boazsegev commented 8 years ago

P.S.

I'm thankful for your very complete review.

I didn't incorporate your fix since I preferred to iterate over the collection only once (rather then twice).

I hope this fix is good enough. I credited you in the CHANGELOG rather then accept the PR. I hope that's fine.

gyuchang commented 8 years ago

I tested 0.2.31 with my app and it works!

The error(14) is gone AND the file size is smaller than 0.2.30! (738,332 bytes vs 739,366 bytes)

And nice optimization here;

resolved[obj.object_id] = referenced

instead of

resolved[obj.object_id] = obj

Thus removing an indirection!

boazsegev commented 8 years ago

I'm very happy we solved this one. Compatibility is super important for me and on the way we put is some wonderful optimizations 👍