boazsegev / combine_pdf

A pure Ruby library to merge PDF files, number pages and maybe more...
MIT License

PDF blank when parse #166

Closed: btrd closed this issue 4 years ago

btrd commented 4 years ago

First of all thanks for your gem, super useful.

I have a strange bug: when I parse a document and then save it, the output contains only blank pages. The page count is right, but there is no content.

content = File.open('original.pdf').read
parse = CombinePDF.parse(content).to_pdf.force_encoding('UTF-8')
File.new('test.pdf', "w").write(parse)

Can I send you the document via email?

Thanks!

boazsegev commented 4 years ago

@btrd You're welcome :)

Please send me the PDF by email and I'll see what I can do.

P.S.

Make sure to use binary read / write operations with PDF files.
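
As a small illustration of the binary read / write advice, the sketch below round-trips PDF-like bytes through a temporary file using only Ruby's standard library. The sample bytes and the `roundtrip_binary` helper are made up for this example and are not part of CombinePDF.

```ruby
require "tmpdir"

# Made-up sample: a PDF-style header followed by CR/LF and high-bit bytes,
# the kind of data that text-mode I/O or re-encoding can alter.
SAMPLE_BYTES = "%PDF-1.4\r\n%\xE2\xE3\xCF\xD3\r\n".b

# Hypothetical helper: write the bytes in binary mode, then read them back.
def roundtrip_binary(bytes)
  Dir.mktmpdir do |dir|
    path = File.join(dir, "sample.pdf")
    File.binwrite(path, bytes)   # no newline or encoding conversion
    File.binread(path)           # returns a binary (ASCII-8BIT) string
  end
end

copy = roundtrip_binary(SAMPLE_BYTES)
puts copy == SAMPLE_BYTES
puts copy.encoding == Encoding::BINARY
```

The same pair of calls, `File.binread` and `File.binwrite`, can replace the `File.open(...).read` and `File.new(..., "w").write` calls in the original report, which also makes `force_encoding('UTF-8')` unnecessary.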

You could use the #save method which does the same thing using IO#binwrite:

content = IO.binread('original.pdf')
CombinePDF.parse(content).save('test.pdf')

cschilbe commented 4 years ago

I am also experiencing this issue. I have noticed that the result of the CombinePDF.parse step is blank, even though I can see that the incoming PDF contains correct data.

The resulting combined PDF then contains blank pages.

I know that some information is parsed, because the combined pages are the same size as the parsed pages.

Any workaround would be greatly appreciated.

boazsegev commented 4 years ago

@cschilbe ,

Thank you for joining the conversation.

If you could post or email me a PDF that allows me to recreate the issue, it might help.

However, please note: Just because a PDF viewer will display a PDF file does not mean the PDF file isn't corrupted.

PDF viewers auto-correct many PDF issues, sometimes using a "best guess" approach. CombinePDF does not attempt to correct anything that isn't a straightforward issue (mostly for performance reasons, but also under the assumption that the PDF files fed to CombinePDF can be fixed before being used with the gem).

Try this: open the PDF in a PDF viewer and export / print the PDF as a new PDF file. Try using the resulting (new) PDF file with CombinePDF (in the same code where you had issues).

If the issue doesn't persist, it's likely that the original PDF was malformed in some way.
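
A rough first check before handing a file to CombinePDF can be sketched as follows. It only looks for the `%PDF-` header and the `startxref` / `%%EOF` trailer markers that a well-formed PDF carries. The `pdf_sanity_report` helper is hypothetical and not part of the gem, and passing it does not prove the file is valid.

```ruby
# Crude structural checks on raw PDF bytes (hypothetical helper).
# Failing them suggests the file needs repair; passing them proves little.
def pdf_sanity_report(bytes)
  data = bytes.b
  # The trailer sits at the end of the file, so only inspect the last 1 KB.
  tail = data[-1024..-1] || data
  {
    header: data.start_with?("%PDF-"),
    startxref: tail.include?("startxref"),
    eof_marker: tail.include?("%%EOF")
  }
end

# Made-up inputs for illustration; a real call would pass File.binread(path).
complete  = "%PDF-1.4\n1 0 obj\n<<>>\nendobj\nstartxref\n9\n%%EOF\n"
truncated = "%PDF-1.4\n1 0 obj"

p pdf_sanity_report(complete)
p pdf_sanity_report(truncated)
```

A file that fails these checks is a good candidate for the export / print round trip described above.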

Thanks again for joining the conversation.

Kindly, Bo.

boazsegev commented 4 years ago

Hi @cschilbe ,

Thank you for the email 🙏🏻🙏🏻🙏🏻

The extra information helped me track down the issue and squash a bug 👍🏻👍🏻👍🏻

It seems that the example PDF you provided contained nested data that was falsely marked as a duplicate. This was a CombinePDF bug that was hidden away by the fact that most PDF authoring tools avoid deeper nesting structures.
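
To make this class of bug concrete, here is a purely hypothetical illustration. It is not CombinePDF's code, and the object layout, `shallow_key`, and `deep_duplicate?` are invented for the example. It shows how a duplicate check that only looks one level down can conflate two objects that differ deeper in their nesting.

```ruby
# Hypothetical illustration only; this is not how CombinePDF is implemented.
# Two page-like objects that differ only deep inside a nested structure.
page_a = { Type: :Page, Resources: { Font: { F1: { BaseFont: :Helvetica } } } }
page_b = { Type: :Page, Resources: { Font: { F1: { BaseFont: :Courier } } } }

# A shallow fingerprint: nested hashes are reduced to their key names,
# so anything below the first level is ignored.
def shallow_key(obj)
  obj.map { |k, v| [k, v.is_a?(Hash) ? v.keys : v] }
end

# A deep comparison walks the whole structure; Ruby's Hash#== already does this.
def deep_duplicate?(a, b)
  a == b
end

shallow_dup = shallow_key(page_a) == shallow_key(page_b)
deep_dup    = deep_duplicate?(page_a, page_b)

puts "shallow check says duplicate: #{shallow_dup}"
puts "deep check says duplicate:    #{deep_dup}"
```

If the second object were dropped as a duplicate of the first, its distinct content would be lost, which is one way pages could come out blank.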

I tried to patch the issue with CombinePDF 1.0.17 - can you test that this works for you?

Kindly, Bo.
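
For anyone verifying that their environment has picked up the 1.0.17 patch mentioned above, a version check along these lines may help. It relies only on RubyGems' `Gem::Version`; the `includes_fix?` helper and `FIXED_IN` constant are names made up for this sketch.

```ruby
require "rubygems"

# Version that carries the fix, per the comment above.
FIXED_IN = Gem::Version.new("1.0.17")

# Hypothetical helper: true when the given version string is 1.0.17 or newer.
def includes_fix?(version_string)
  Gem::Version.new(version_string) >= FIXED_IN
end

# In an application, the loaded gem version can be looked up like this.
# The lookup returns nil when combine_pdf has not been loaded.
spec = Gem.loaded_specs["combine_pdf"]
puts "loaded combine_pdf includes fix: #{includes_fix?(spec.version.to_s)}" if spec

puts includes_fix?("1.0.16")
puts includes_fix?("1.0.17")
```

A Gemfile entry such as `gem "combine_pdf", ">= 1.0.17"` expresses the same constraint at install time.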

cschilbe commented 4 years ago

That's awesome! I will test early tomorrow!

cschilbe commented 4 years ago

Bo - the latest version works! Thanks for the quick resolution!

boazsegev commented 4 years ago

@cschilbe ,

Thank you again for joining the conversation and providing your input. Without your extra information I might not have been able to solve this one. 🙏🏻🙏🏻🙏🏻

Kindly, Bo.

btrd commented 4 years ago

Thanks, both of you! Sorry, I completely forgot about this issue.