Closed nathansamson closed 8 years ago
Apparantly someone unrelated probably fixed this in #81
When I was looking at this yesterday that merge request was not opened yet, what a coincidence...
It seems the previous fix got broken during one of the latest updates.
I'm away, but I'll get on it soon.
For now, many thanks to @gyuchang for working on a fix and a PR.
add_referenced
a bit. I didn't realized it until @boazsegev expressed a concern about the performance and I wrote a crude benchmark. See my comment on #81.@gyuchang ,
I think I found a better fix for this, that pull the if
statement to an earlier state and out of the loop.
I also switched the lookup to a binary search using a Hash map.
Please have a look at the latest commit and let me know what you think and if it solves the issue for you (I'm not experiencing the issue).
I used output.pdf
created with following code to test this.
require 'combine_pdf'
require 'prawn'
page = CombinePDF.parse((Prawn::Document.new { text 'Hello World!'}).render)
pdf = CombinePDF.new
pdf << page << page
pdf.save('output.pdf')
With 966f97654594a04b703a899faa16a10d9982b7b7, :Pages
object does have all unique /Kids
9 0 obj
<<
/Type /Pages
/Count 2
/Kids [3 0 R 4 0 R] <--
>>
endobj
But 2nd :Page
object has incorrect /Parent
4 0 obj
<<
/Type /Page
/Parent 0 0 R <-- should be "9 0 R"
/MediaBox [0 0 612.0 792.0 ]
/Contents 6 0 R
/Resources <<
/ProcSet [/PDF /Text /ImageB /ImageC /ImageI]
/Font <<
/F1.0 5 0 R
>>
>>
>>
endobj
Acrobat Reader I have (version 15.17.20053.194476) won't render 2nd page correctly.
This is due to a bug in 966f97654594a04b703a899faa16a10d9982b7b7 - duplicated :Page
objects don't get :Parent
set properly.
Possible fix is as below, but I am not sure whether it's a complete fix or not
diff --git a/lib/combine_pdf/pdf_protected.rb b/lib/combine_pdf/pdf_protected.rb
index b0898e7..3e14ac5 100644
--- a/lib/combine_pdf/pdf_protected.rb
+++ b/lib/combine_pdf/pdf_protected.rb
@@ -146,7 +146,7 @@ module CombinePDF
# point old Pages pointers to new Pages object
## first point known pages objects - enough?
- pages.each { |p| p[:Parent] = { referenced_object: pages_object, is_reference_only: true } }
+ page_list.each { |p| p[:Parent] = { referenced_object: pages_object, is_reference_only: true } }
## or should we, go over structure? (fails)
# each_object {|obj| obj[:Parent][:referenced_object] = pages_object if obj.is_a?(Hash) && obj[:Parent].is_a?(Hash) && obj[:Parent][:referenced_object] && obj[:Parent][:referenced_object][:Type] == :Pages}
Another issue is that this (latest commint, 966f97654594a04b703a899faa16a10d9982b7b7) doesn't seem to solve the duplicated objects problem.
The file size of PDF output file from my app went back up to 770kb.
I.e.
Commit used in the app | File size | Note |
---|---|---|
0fa0c50d8c56cc675b4249662fc92f17deb0ee90 | 739,366 | current release: 0.2.30 |
00657b1d53695632bf227555ca50c4d1453d7842 | 770,260 | first commit of PR #81, has duplication bug |
8b26f8dbd13ca125a36a7fb1519e9947cdf830df | 739,366 | Head of PR #81 |
966f97654594a04b703a899faa16a10d9982b7b7 | 770,260 | current master |
@gyuchang ,
I had more time to look this over, so I'm pretty sure the issue is now fixed... but I broke issue #65 and Radio button data is lost... Hence, the fix isn't final.
Could you review this on your end and let me know what you discover?
P.S.
I'm thankful for your very complete review.
I didn't incorporate your fix since I preferred to iterate over the collection only once (rather then twice).
I hope this fix is good enough. I credited you in the CHANGELOG rather then accept the PR. I hope that's fine.
I tested 0.2.31 with my app and it works!
The error(14) is gone AND the file size is smaller than 0.2.30! (738,332 bytes vs 739,366 bytes)
And nice optimization here;
resolved[obj.object_id] = referenced
instead of
resolved[obj.object_id] = obj
Thus removing an indirection!
I'm very happy we solved this one. Compatibility is super important for me and on the way we put is some wonderful optimizations 👍
I am opening a new issue but I think I am basically jumping into #19 again.
Because this has been closed for quite a while I decided to open a new issue.
In a particular combined PDF evince will open the file perfectly fine, but Adobe reader won't. (error 14) The first page which is not displayed in adobe reader is indeed an exact copy from an earlier page displayed.
(Kudos to you guys btw that you detect this as a duplicate because both pages are coming from a seperate original PDF file).
I am using CombinePDF 0.2.30
If needed I can send the broken file to you but since this contains personal info I would like to this via private transfer.