boazsegev / combine_pdf

A Pure ruby library to merge PDF files, number pages and maybe more...
MIT License
735 stars 157 forks source link

Images duplicating in v1.0.0 #109

Closed Subtletree closed 7 years ago

Subtletree commented 7 years ago

I'm using prawn to generate a pdf with 2 separate but similar images.

Since combine_pdf v1.0.0, when this pdf is parsed (with combine_pdf) the second image overwrites the first image so the resulting pdf has the same image twice.

This behaviour does not occur in v0.2.37.

Let me know if you want a copy of the pdf.

boazsegev commented 7 years ago

Hi @Subtletree ,

Thanks for opening this issue. It sounds like you're onto a real issue.

I would love a copy of the PDF to review and isolate the issue.

I doubt if upgrading to CombinePDF 1.0.1 will solve the issue (it was a very small fix).

Thanks! B.

P.S.

I should probably point out that CombinePDF offers optimizations for "equal" object streams (font data etc') but it shouldn't effect "almost equal" objects... I'll have to take a look.

Subtletree commented 7 years ago

@boazsegev Thanks for taking a look!

pdf generation using prawn:

pdf_data = ApplicationController.new.render_to_string("/signature.pdf", locals: { model: self })
temp_pdf = StringIO.open(pdf_data)

Creates this file: original.pdf

If i parse with combine_pdf

pdf_data = ApplicationController.new.render_to_string("/signature.pdf", locals: { model: self })
signature_pdf = CombinePDF.parse(pdf_data)
temp_pdf = StringIO.open(signature_pdf.to_pdf)

I get this where the second signature overwrites the first: duplicated.pdf

This is using CombinePDF 1.0.1. The problem first appears in 1.0.0 though.

boazsegev commented 7 years ago

@Subtletree ,

Thanks for exposing this hard to catch error! β˜ΊοΈπŸ™πŸ»πŸ‘πŸ»

I released a patch with a fix.

This issue was caused because of a fix I implemented to a different issue, where complex PDF files would raise a stack overflow exception or simply crash Ruby.

This was caused by the hash and eql? core Ruby implementation and I submitted an issue to the Ruby core about it, but I ended up writing a poor workaround.

The PDF file you sent me had nested differences between the images. These images were only seen on the second "layer" of the PDF objects (in objects referenced by the images)... since I was using a single layer equality test, CombinePDF didn't catch the differences and (mistakenly) merged the images.

I fixed it by using a limited recursive design that checks up to 3 layers (which is 1 more than actually needed).

Now it should work as expected.

Thanks again! B.

Subtletree commented 7 years ago

Tested and working, thanks @boazsegev !! πŸ’―