Closed IrineuFilho closed 3 years ago
Hi @IrineuFilho ,
Thank you for opening this issue.
My initial observation is that the PDF file is actually malformed.
The PDF version shows as 1.2 (an older PDF format). After the PDF version there's an illegal line of data: ?\x08\x00\x04\x00 ... -?\x08\x00
This line isn't valid PDF data, as far as I can tell. The parser reads the first line with undefined results (basically anything might happen).
The line you pointed out (line 81) tests that no uncaught errors occurred and no "dangling data" was read.
In this case, the error message is correct, the PDF really is malformed.
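To look at the header of a suspect file by hand, something like the sketch below might help. `pdf_header_info` is a hypothetical helper, not part of CombinePDF, and the sample bytes only imitate the line quoted above. It relies on the fact that a comment line in a PDF starts with `%`, which is what usually follows the version line in files that contain binary data.

```ruby
# Hypothetical helper (not part of CombinePDF): split off the first two
# lines of raw PDF bytes so they can be inspected by hand.
def pdf_header_info(bytes)
  data = bytes.dup.force_encoding(Encoding::BINARY)
  first_line, second_line = data.split(/\r\n|\r|\n/, 3)
  second_line = second_line.to_s
  {
    version: first_line.to_s[/\A%PDF-(\d\.\d)/, 1],      # nil if no header
    second_line: second_line,
    second_line_is_comment: second_line.start_with?("%")
  }
end

# Made-up sample imitating the malformed header described above.
sample = "%PDF-1.2\n?\x08\x00\x04\x00\n1 0 obj\n".b
info = pdf_header_info(sample)
puts info[:version]
puts info[:second_line].inspect
puts info[:second_line_is_comment]
```

For a real file, the bytes could come from `File.binread(path)`.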
I don't know if I can solve this without breaking valid PDF files, but I will tinker around and see what I can do.
Thanks again for opening this issue.
Kindly, Bo.
thanks for the attention,
Since I need to work with these PDFs, I'm going to add this before line 81 of parser.rb:
if @parsed[0] == 0
  @parsed.delete(0)
end
All PDFs generated in version 1.2 come from scanners, so I'm sure that my workaround will not change the contents of the documents.
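One detail worth keeping in mind with this workaround: `Array#delete(0)` removes every element equal to 0, not only the first one, while `Array#shift` drops just the leading element. The sketch below uses a made-up stand-in for `@parsed` (the Hash contents are hypothetical, and whether a stray 0 can appear anywhere but the front is an assumption) to show the difference.

```ruby
# Stand-in data: the real @parsed array holds parsed PDF objects as Hashes.
parsed = [0, { type: :page }, 0, { type: :catalog }]

with_delete = parsed.dup
with_delete.delete(0) if with_delete[0] == 0   # removes EVERY element equal to 0

with_shift = parsed.dup
with_shift.shift if with_shift[0] == 0         # removes only the leading 0

puts with_delete.inspect
puts with_shift.inspect
```

If only the leading zero should go, `shift` is the narrower change; anything else left over would still be caught by the check on line 81.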
Regards,
Irineu
@IrineuFilho @boazsegev please help. Hey, I'm facing the same issue. I'm trying to parse a PDF stored in Active Storage, fetched with url_for, and I get this error: CombinePDF::ParsingError Exception: Unknown PDF parsing error - malformed PDF file?
I'm using the following syntax:
url = url_for(Candidate.last.pdf)
pdf = CombinePDF.parse(Net::HTTP.get_response(URI.parse(url)).body)
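One thing that may be worth ruling out here: `Net::HTTP.get_response` does not follow redirects, so if the URL answers with a redirect, the body handed to the parser is not the PDF at all. Whether that is what happens in this case is an assumption; the thread does not confirm it. A minimal guard, with `ensure_pdf_bytes!` as a hypothetical helper name (not part of CombinePDF), could look like this:

```ruby
# Hypothetical guard (not part of CombinePDF): raise a descriptive error
# when the downloaded bytes do not start with a PDF header.
# Simplification: this sketch only accepts a header at byte 0.
def ensure_pdf_bytes!(body)
  data = body.to_s.b
  return data if data.start_with?("%PDF-")

  raise ArgumentError, "not a PDF, body starts with: #{data[0, 40].inspect}"
end

# Intended use (needs the combine_pdf gem and a real HTTP response):
#   pdf = CombinePDF.parse(ensure_pdf_bytes!(response.body))
```

With this in place, a redirect or error page shows up as an error that quotes the first bytes of the body instead of the generic parsing error.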
@adnanmirza1 Do you get the same error with this?
CombinePDF.parse(Candidate.last.pdf.download)
Hey, with this:
CombinePDF.parse(Candidate.last.pdf.download)
I get: NoMethodError Exception: undefined method `download' for #
@adnanmirza1 Hmm, I think download should be defined: https://edgeguides.rubyonrails.org/active_storage_overview.html#downloading-files
Hey! So I solved it myself by using a temp file, like this:
@cans.pdf_doc.open(tmpdir: "app/assets/pdfs") do |file|
  pdf = CombinePDF.new
  a = File.basename(file)
  pdf << CombinePDF.load("app/assets/pdfs/#{a}")
  pdf.save "combined.pdf"
end
Btw thanks @Subtletree for help.
Hi, I get the same error, "Unknown PDF parsing error - malformed PDF file?", under Ubuntu 18.04 / Ruby 2.6.6 / Rails 5.0.7.
Hi @ionosphere ,
Thank you for your comment.
The error message you're seeing is a generic parsing error message. If you want me to dig into it, please open a new issue and attach (or send me privately) a PDF that raises this error along with the example code that raises the error (if special).
Kindly, Boaz Segev.
Hi
I have a problem loading some types of PDF using version 1.0.16 (Ruby 2.6.2, Mac OS Mojave in dev or Ubuntu 14.04 in prod). I think it happens when the PDF is created by a multifunction printer.
When I try to load one, I get this error: "CombinePDF::ParsingError (Unknown PDF parsing error - malformed PDF file?)".
Checking the code, more specifically parser.rb, line 81, the block
(@parsed.select { |i| !i.is_a?(Hash) })
is not empty, but when I print the @parsed variable, the first element of the array is just 0 (zero). I created a fork and, before the block mentioned above, I check whether the first element of @parsed is 0. If it is, I remove the first element and everything works normally. err.pdf
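To illustrate what that check sees, here is a small sketch on a made-up stand-in for `@parsed`. The Hash keys are hypothetical and the real array holds the parsed PDF objects; only the `select` expression is taken from the line quoted above.

```ruby
# Stand-in for @parsed after parsing: one stray 0 followed by object Hashes.
parsed = [0, { id: 1 }, { id: 2 }]

# Same expression as the check described above: keep everything that is
# not a Hash. Anything left over counts as "dangling" data.
dangling = parsed.select { |i| !i.is_a?(Hash) }
malformed = !dangling.empty?

puts dangling.inspect   # prints [0]
puts malformed          # prints true
```

So a single leftover 0 at the front is enough for the file to be reported as malformed.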
What is happening to cause this behavior? I'm sending a sample PDF with this problem.
Thanks for your attention.