boazsegev / combine_pdf

A Pure ruby library to merge PDF files, number pages and maybe more...
MIT License

Combined PDF is completely blank #131

Closed mjohanning99 closed 6 years ago

mjohanning99 commented 7 years ago

I'm not sure whether I am doing something wrong, but when I try to combine the PDFs from https://www.perl.org/books/beginning-perl/ using combine_pdf, all I get is a blank PDF file that is nearly 600 pages long (the combined length of all the PDFs) and 14 MB in size. The code I am using to combine the PDFs is as follows:

#!/usr/bin/ruby
require 'combine_pdf'

@pdf = CombinePDF.new
Dir.entries(Dir.pwd).each do |file|
  @pdf << CombinePDF.load(file) unless File.extname(file) != ".pdf"
end
@pdf.save "BeginningPerl.pdf"
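A side note on the script itself, as a minimal sketch rather than a fix for the blank pages: `Dir.entries` does not guarantee any particular order, and on a second run the previous `BeginningPerl.pdf` sits in the same directory and would be merged into itself. The helper names `pdf_inputs` and `merge_pdfs` below are hypothetical (they are not part of combine_pdf); only `CombinePDF.new`, `CombinePDF.load`, `<<` and `save` come from the library.

```ruby
# Hypothetical helper (not part of combine_pdf): list the PDFs to merge
# in a stable, sorted order, skipping the output file of an earlier run.
def pdf_inputs(dir, output_name)
  Dir.children(dir)
     .select { |name| File.extname(name).casecmp?(".pdf") } # also accepts ".PDF"
     .reject { |name| name == output_name }
     .sort
     .map { |name| File.join(dir, name) }
end

# Hypothetical wrapper around the merge step. The gem is required lazily,
# so the file-selection helper above has no gem dependency.
def merge_pdfs(dir, output_name)
  require 'combine_pdf'
  merged = CombinePDF.new
  pdf_inputs(dir, output_name).each { |path| merged << CombinePDF.load(path) }
  merged.save File.join(dir, output_name)
end
```

Usage would look like `merge_pdfs(Dir.pwd, "BeginningPerl.pdf")`. Plain sorting is lexicographic, so chapter files need zero-padded numbers to come out in reading order.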

boazsegev commented 7 years ago

Hi @Jeytas ,

Thank you for opening this issue.

I'm quite surprised at the situation. There don't seem to be any warnings or anything, yet the resulting pages look blank.

Sadly I'm swamped and don't know when I'll get to this.

I'll keep you posted with any updates.

Kindly, Bo.

boazsegev commented 7 years ago

Quick update:

I'm still looking for the cause of the issue, but I think I might have recognized String corruption as a possible source (it seems the data streams might be different than the source and it seems the PDF reader might be unable to decompress them).

It's taking me longer to find the cause than I hoped. However, once the cause is known, it should be a matter of tracking down the offending code and patching it up... which is often about the same difficulty or easier, depending on the cause of the issue.

EDIT

Nope, this isn't the issue... just a testing tool glitch... back to square zero.

boazsegev commented 6 years ago

Sidenote/Update:

It's true that there are authoring errors in the PDFs that result in this issue, but: 1. some other readers recover nicely, so why can't CombinePDF?; and 2. the parser isn't complaining (if we must fail, let it be loudly).

bharat303 commented 6 years ago

I am also getting the same issue. When combining PDFs of different resolutions, I get the full content of the first PDF, but blank pages for the second PDF.

boazsegev commented 6 years ago

@bharat303 ,

Thank you for commenting on this issue. The error you're describing seems very interesting; maybe it could help narrow the hunt for the cause.

Could you send me sample PDFs so I can recreate the same error (where the first PDF shows and the other one does not)?

bharat303 commented 6 years ago

@boazsegev Thanks for your reply. I am attaching demo pdf which is showing empty pages.

Thanks Dummy.pdf

boazsegev commented 6 years ago

@bharat303 ,

Thanks for the demo PDF!

I managed to reproduce the issue with the supplied PDF.

I'll see if I can find the root cause.

I see it's a PDF 1.1 format, but I can't find the "producer" metadata - do you know which program created the PDF?

B.

bharat303 commented 6 years ago

@boazsegev No, I don't know, because in our system billing files are uploaded in this format and there are a large number of such files. From what I can see, this PDF is somewhat different from the others: the other files have binary content, but this one has its content as plain text and I can edit it in Notepad.

bharat303 commented 6 years ago

@boazsegev Have you found the issue in this? :+1:

boazsegev commented 6 years ago

@bharat303 ,

Your PDF file helped me track down the issue, but it seems these might be two separate issues.

The issue you experienced was related to the PDF xref table (which was in the middle of the file rather than at its end), on a PDF version 1.1 (which is an older version, and this issue appears to be edge-case related).
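To illustrate what "xref table in the middle of the file" means, here is a rough sketch for inspecting a file by hand. It is not the fix that went into CombinePDF; `xref_layout` is a hypothetical helper, and it only understands the classic `xref` / `startxref` layout (PDF 1.5+ cross-reference streams and incrementally updated files will legitimately report different results).

```ruby
# Hypothetical diagnostic (not CombinePDF code): report where the last
# `startxref` entry points and whether any objects come after that position.
def xref_layout(data)
  data = data.b                               # work on raw bytes
  marker = data.rindex("startxref")
  return nil unless marker                    # no startxref keyword at all

  offset = data[marker..-1][/\d+/].to_i       # the number following `startxref`
  {
    offset: offset,
    points_at_xref: data[offset, 4] == "xref",
    # "N G obj" found after the table means the table is not at the end
    objects_after_xref: !data.index(/\d+\s+\d+\s+obj\b/, offset).nil?
  }
end
```

For example, `xref_layout(File.binread("Dummy.pdf"))` returns a small hash; `objects_after_xref: true` would suggest a table sitting ahead of some objects, the kind of layout described above.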

I posted a fix on the Git repository, but I haven't released it yet.

B.

bharat303 commented 6 years ago

Ok @boazsegev, many thanks for considering my messages and solving the issue. Please let me know when this will be released.

boazsegev commented 6 years ago

@bharat303

I just released the updated version with the fix.

I feel that the changes to the parser were significant and I hope this release doesn't break anything (it passes the tests, but still)...

Thanks again for sending the demo PDF. I would never have found the issue without it.

Kindly, Bo.

edsontrick commented 6 years ago

My PDF is completely Blank too

test1.pdf

boazsegev commented 6 years ago

@edsontrick , thank you for posting a test PDF.

However, I can't replicate the issue; the test PDF works on my machine.

Are you running CombinePDF's latest version (should be 1.0.15)?

Can you send me your system details and example code to test?

Kindly, Bo.

edsontrick commented 6 years ago

pdf = CombinePDF.load "test1.pdf"

respond_to do |format|
  format.json { send_data(pdf.to_pdf, :type => "application/pdf") }
end

My version is 1.0.15. And this is all of my code, because the PDF comes out blank on its own.

This other PDF generates a blank PDF too: test2.pdf

The version of the first PDF is 1.5 and the second is 1.7

boazsegev commented 6 years ago

Can this be related to the requested JSON formatting getting in the way...?

Can you test the following code in your console:

require 'combine_pdf'
pdf = CombinePDF.load "test1.pdf"; nil # prevent long output to terminal
pdf.save "tmp_pdf_test.pdf"

See if "tmp_pdf_test.pdf" is blank.
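If the saved file looks fine, a further check is whether the bytes survive the trip to the client. The following is only a sketch using the Ruby standard library; `same_bytes?` and `looks_like_pdf?` are hypothetical helper names, not part of CombinePDF or Rails.

```ruby
require 'digest'

# Hypothetical helper: true when two byte strings are identical.
def same_bytes?(expected, received)
  Digest::SHA256.hexdigest(expected.b) == Digest::SHA256.hexdigest(received.b)
end

# Hypothetical helper: cheap sanity check that a body still has the
# PDF header and the end-of-file marker.
def looks_like_pdf?(body)
  bytes = body.b
  bytes.start_with?("%PDF-") && bytes.rstrip.end_with?("%%EOF")
end
```

The idea is to compare the string handed to `send_data` with the body the client actually received. If `same_bytes?` returns false, something between the library and the client (for example a response format that treats the data as text) altered the data.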

edsontrick commented 6 years ago

Thank you :) The problem is in the JSON formatting.

boazsegev commented 6 years ago

@edsontrick , You're welcome :-)

Thank you for posting your concerns. I'm happy we could solve this quickly.

Please feel free to open a new issue if something comes up.

Side Note: when you re-open a closed issue, all the previous people get notified about each message. It is more common to open a new issue instead, perhaps referencing the old one.

Kindly, Bo.

bharat303 commented 6 years ago

Hello @boazsegev ,

I again have 2 PDFs that are malfunctioning. In one PDF, 13 pages in the middle come out blank (not the whole PDF). In the second PDF, the content comes out garbled. I cannot share those files here, as they are confidential files of my client. Can we have a direct chat over Skype or Hangouts? My Skype: bharat_godhani and Gmail: bharatg.30.3@gmail.com. Please add me on either one, and in your free time we can discuss this. I will share my screen so you can get the whole idea.

Thanks in Advance.

boazsegev commented 6 years ago

Hi @bharat303 ,

Thank you for commenting on this issue and pointing out that the current solution might not be working for you.

Can you open a new issue for this? This thread is becoming busy and, although the results are related, the underlying cause might be different.

If there's a chance to email me a file that I can test, that would be great, and I promise not to share it (my email is listed in the gemspec file).

As for opening a chat or a hangout, it might be a while. These upcoming two weeks are stormy.

Thank you! B.