boazsegev / combine_pdf

A Pure ruby library to merge PDF files, number pages and maybe more...
MIT License
734 stars 156 forks

CombinePDF::ParsingError (Unknown PDF parsing error - malformed PDF file?) #156

Closed IrineuFilho closed 3 years ago

IrineuFilho commented 5 years ago

Hi,

I have a problem loading some kinds of PDF with version 1.0.16 (Ruby 2.6.2, on Mac OS Mojave in development and Ubuntu 14.04 in production). I think it happens when the PDF is created by a multifunction printer.

When I try to load such a file, I get this error: "CombinePDF::ParsingError (Unknown PDF parsing error - malformed PDF file?)".

Checking the code, more specifically parser.rb line 81, the result of the block (@parsed.select { |i| !i.is_a?(Hash) }) is not empty, and when I print the @parsed variable, the first element of the array is just 0 (zero).

I created a fork, and before the block mentioned above I check whether the first element of @parsed is 0. If it is, I remove the first element and everything works normally. err.pdf

What is happening to cause this behavior? I'm sending a sample PDF with this problem.

Thanks for your attention.

boazsegev commented 5 years ago

Hi @IrineuFilho ,

Thank you for opening this issue.

My initial observation is that the PDF file is actually malformed.

The PDF version shows as 1.2 (an older PDF format). After the PDF version there's an illegal line of data: ?\x08\x00\x04\x00 ... -?\x08\x00

This line isn't valid PDF data, as far as I can tell. The parser reads the first line with undefined results (basically anything might happen).
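To see this kind of damage for yourself, it can help to look at the raw bytes at the start of the file. Below is a minimal sketch using only the Ruby standard library. The helper name `pdf_prefix_report` is hypothetical and not part of CombinePDF, and the "plausible" check is a rough heuristic, not what the parser actually does.

```ruby
# Hypothetical helper (not part of CombinePDF): report the PDF version from the
# header and show the line that follows it, so stray bytes are easy to spot.
def pdf_prefix_report(data)
  bytes = data.dup.force_encoding(Encoding::BINARY)
  lines = bytes.split(/\r\n|\r|\n/n, 3) # header, following line, rest
  header = lines[0].to_s
  second = lines[1].to_s

  {
    # nil when the "%PDF-x.y" header is missing
    version: header[/\A%PDF-(\d\.\d)/n, 1],
    # escaped representation, safe to print to a terminal
    second_line: second.inspect,
    # Heuristic assumption: after the header we expect either a comment line
    # ("%...") or the start of an object definition ("1 0 obj").
    second_line_plausible: second.start_with?("%") || second.match?(/\A\d+ \d+ obj/n)
  }
end
```

For example, `pdf_prefix_report(File.binread("err.pdf"))` would return the version string and an escaped copy of the second line for the sample file attached to this issue.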

The line you pointed out (line 81) tests that no uncaught errors occurred and no "dangling data" was read.

In this case, the error message is correct, the PDF really is malformed.
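The check on line 81 can be pictured roughly as follows. This is a simplified sketch and not the library's code: the array is a made-up stand-in for `@parsed`, assuming (as described in this issue) that every successfully parsed object is a Hash and anything else is leftover data.

```ruby
# Made-up stand-in for the parser's @parsed array: one stray Integer followed
# by two Hash objects (the keys are illustrative only).
parsed = [0, { Type: :Catalog }, { Type: :Pages }]

# Same shape as the test quoted from parser.rb: collect everything that is
# not a Hash.
dangling = parsed.select { |i| !i.is_a?(Hash) }

# A non-empty result means "dangling data" was left behind by the parser.
puts "dangling data: #{dangling.inspect}" unless dangling.empty?
# prints: dangling data: [0]
```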

I don't know if I can solve this without breaking valid PDF files, but I will tinker around and see what I can do.

Thanks again for opening this issue.

Kindly, Bo.

IrineuFilho commented 5 years ago

Thanks for the attention.

Since I need to work with these PDFs, I'm going to add this before line 81 of parser.rb:

if @parsed[0] == 0
    # Drop only the stray leading 0; Array#delete(0) would remove every element equal to 0.
    @parsed.shift
end

All PDFs generated in version 1.2 come from scanners, so I'm sure that my workaround will not change the contents of the documents.
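When removing a stray leading element like this, note that in Ruby `Array#delete(0)` removes every element equal to `0`, while `Array#shift` and `Array#delete_at(0)` remove only the first element. A small sketch with made-up arrays (the `:obj` symbol stands in for a parsed object) shows the difference:

```ruby
# Made-up data: a stray leading 0 plus another 0 further along.
a = [0, :obj, 0]
b = a.dup
c = a.dup

a.delete(0)    # removes ALL elements equal to 0
b.shift        # removes only the first element
c.delete_at(0) # also removes only the element at index 0

p a # prints [:obj]
p b # prints [:obj, 0]
p c # prints [:obj, 0]
```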

Regards,

Irineu

adnanmirza1 commented 3 years ago

@IrineuFilho @boazsegev please help. I'm facing this issue too. I'm trying to parse a PDF stored in Active Storage via url_for, and I get this error: CombinePDF::ParsingError Exception: Unknown PDF parsing error - malformed PDF file?

I'm using the following code:

url = url_for(Candidate.last.pdf)
pdf = CombinePDF.parse(Net::HTTP.get_response(URI.parse(url)).body)
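One thing that may be worth checking here (an assumption, since the thread never confirms the cause): `Net::HTTP.get_response` does not follow redirects, and if the URL answers with a redirect, the response body is not a PDF at all. Below is a hedged sketch using only the standard library. `looks_like_pdf?` and `fetch_following_redirects` are hypothetical helper names, not part of CombinePDF or Rails.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: true when the data starts with the "%PDF-" header.
def looks_like_pdf?(data)
  data.to_s.b.start_with?("%PDF-")
end

# Hypothetical helper: GET a URL, following up to `limit` redirects.
# Assumes the Location header holds an absolute URL.
def fetch_following_redirects(url, limit = 5)
  raise ArgumentError, "too many redirects" if limit.zero?

  response = Net::HTTP.get_response(URI.parse(url))
  case response
  when Net::HTTPSuccess
    response.body
  when Net::HTTPRedirection
    fetch_following_redirects(response["location"], limit - 1)
  else
    raise "unexpected response: #{response.code}"
  end
end
```

With these in place, the body could be checked with `looks_like_pdf?(body)` before it is handed to `CombinePDF.parse(body)`, which makes it easier to tell a bad download apart from a genuinely malformed PDF.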

Subtletree commented 3 years ago

@adnanmirza1 Do you get the same error with this?

CombinePDF.parse(Candidate.last.pdf.download)

adnanmirza1 commented 3 years ago

Hey, with this: CombinePDF.parse(Candidate.last.pdf.download)

I get: NoMethodError Exception: undefined method `download' for #

adnanmirza1 commented 3 years ago

Hey, @Subtletree With this: CombinePDF.parse(Candidate.last.pdf.download)

I get,

NoMethodError Exception: undefined method `download' for #

Subtletree commented 3 years ago

@adnanmirza1 Hmm, I think download should be defined: https://edgeguides.rubyonrails.org/active_storage_overview.html#downloading-files

adnanmirza1 commented 3 years ago

Hey! So I solved it myself by using a temp file, like this:

@cans.pdf_doc.open(tmpdir: "app/assets/pdfs") do |file|
    pdf = CombinePDF.new
    a = File.basename(file)
    pdf << CombinePDF.load("app/assets/pdfs/#{a}")
    pdf.save "combined.pdf"
end
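Since the block receives a temporary file object, rebuilding the path from `File.basename` should give the same result as asking the file for its path directly. The sketch below illustrates that with the standard library's `Tempfile` standing in for the Active Storage file (an assumption; no Rails or CombinePDF is involved here).

```ruby
require "tempfile"
require "tmpdir"

same = nil

Dir.mktmpdir do |dir|
  # The temp file stands in for the file yielded by the attachment's `open` block.
  Tempfile.create(["sample", ".pdf"], dir) do |file|
    # Path rebuilt from the directory and the base name, as in the workaround.
    rebuilt = File.join(dir, File.basename(file))
    same = (rebuilt == file.path)
  end
end

puts same # prints true
```

Under that assumption, calling `CombinePDF.load(file.path)` inside the block would avoid writing the directory name twice.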

Btw, thanks @Subtletree for the help.

ionosphere commented 3 years ago

Hi, I face the same error (Unknown PDF parsing error - malformed PDF file?) under Ubuntu 18.04 / Ruby 2.6.6 / Rails 5.0.7.

boazsegev commented 3 years ago

Hi @ionosphere ,

Thank you for your comment.

The error message you're seeing is a generic parsing error message. If you want me to dig into it, please open a new issue and attach (or send me privately) a PDF that raises this error along with the example code that raises the error (if special).

Kindly, Boaz Segev.