galkahana / HummusJS

Node.js module for high performance creation, modification and parsing of PDF files and streams
http://www.pdfhummus.com

Unable to start parsing PDF file #179

Open SAanish opened 6 years ago

SAanish commented 6 years ago

var hummus = require('hummus');
var pdfReader = hummus.createReader(sourcePath);
var pageNumber = pdfReader.getPagesCount();

galkahana commented 6 years ago

Maybe the path is wrong? Maybe it's not a PDF? This is fairly basic stuff.
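Both questions can be checked before the file reaches the parser. The following is a minimal sketch, not part of HummusJS: `checkPdfFile` is a hypothetical helper name, and it assumes the file is small enough to read into memory in one go.

```javascript
const fs = require('fs');

// Hypothetical helper (not a HummusJS API): reports whether the path exists
// and where the "%PDF-" marker sits in the file.
function checkPdfFile(filePath) {
  if (!fs.existsSync(filePath)) {
    return { ok: false, reason: 'file not found' };
  }
  const data = fs.readFileSync(filePath);
  const offset = data.indexOf('%PDF-'); // byte offset, or -1 if absent
  if (offset === -1) {
    return { ok: false, reason: 'no %PDF- header' };
  }
  if (offset > 0) {
    return { ok: false, reason: 'data before %PDF- header', offset: offset };
  }
  return { ok: true, reason: 'header at byte 0', offset: 0 };
}
```

Calling `checkPdfFile(sourcePath)` before `hummus.createReader(sourcePath)` separates a wrong path from a file that is not a clean PDF.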

Jackychans commented 6 years ago

I ran into the same issue with this PDF file. Please help.

The path is correct. The only possible cause is the PDF itself: tempDoc.pdf

Looking forward to any advice.

yogalink commented 6 years ago

Hello, I ran into the same error 👍

In my case it happened only with PDF version 1.3; however, as Jackychans shows, it also happens with 1.7.

Same situation: the path and data are correct, and the error comes from hummus.createReader() on Node.js.

galkahana commented 6 years ago

You'll need to send the PDF if you want it debugged.

galkahana commented 6 years ago

@Jackychans tempDoc.pdf starts with a header that is not part of the PDF. Remove everything before %PDF-1.7 (keeping %PDF-1.7 itself) and the file should parse fine.
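The same cleanup can be done in code instead of by hand. This is a minimal sketch under the assumption that the whole file has been read into a Node.js `Buffer`; `stripBeforePdfHeader` is a hypothetical helper name, not something HummusJS provides.

```javascript
// Hypothetical helper (not a HummusJS API): returns the buffer starting at
// the first "%PDF-" marker, dropping any bytes that come before it.
function stripBeforePdfHeader(buffer) {
  const start = buffer.indexOf('%PDF-');
  if (start === -1) {
    throw new Error('No %PDF- marker found');
  }
  // subarray() returns a view on the same memory; nothing is copied.
  return start === 0 ? buffer : buffer.subarray(start);
}
```

One way to use it is to read the file with `fs.readFileSync`, write the cleaned buffer to a new file with `fs.writeFileSync`, and pass that new path to `hummus.createReader` as in the original snippet.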

Jackychans commented 6 years ago

Thanks @galkahana for the response, although it wasn't very fast, hehe.

I found the problem in the file header just after posting the issue here.

Again, thanks

zerobytes commented 4 years ago

> @Jackychans tempDoc.pdf has got a header which is not PDF. remove all the part up to %PDF-1.7 (not including) and the file should parse fine.

You say the header is not PDF, yet any PDF reader opens the file normally. So I would expect the lib to either ignore the parts it doesn't care about or strip them, since it is nearly impossible to predict what a file will contain that hummus doesn't accept, given that the file works everywhere else.

Let's say I go to Google Docs and generate a file, and it comes with something extra in its header. It will open anywhere except in my program, because hummus somehow does not support it.

Solid-Metal commented 3 years ago

Same here, I got this error with 4 different PDFs...

untrustedlifeswanleap commented 3 years ago

I have had this error with every PDF I've tested, and they all have properly formatted headers. I think something is wrong with the currently released version of hummus.

FranklinThaker commented 2 years ago

Hummus rejects some PDFs because they don't conform to the PDF standard. Check your PDF here -> https://www.pdfen.com/pdf-a-validator We may have to convert the PDF to a standard-conforming one in a catch block whenever Hummus throws this parsing error.
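The catch-block idea could look roughly like this. It is a sketch only: `parseWithRepair` is a hypothetical wrapper, and both `parse` and `repair` are callbacks supplied by the caller, since this thread does not name a specific conversion tool. The error text it matches is taken from this issue's title.

```javascript
// Hypothetical wrapper: tries to parse, and on the parsing error from this
// issue, repairs the file once and tries again. Other errors are rethrown.
function parseWithRepair(filePath, parse, repair) {
  try {
    return parse(filePath);
  } catch (err) {
    const message = String(err && err.message);
    if (!message.includes('Unable to start parsing PDF file')) {
      throw err;
    }
    // repair() is assumed to return the path of a converted copy.
    return parse(repair(filePath));
  }
}
```

With HummusJS, `parse` could be `function (p) { return hummus.createReader(p); }`, and `repair` would wrap whatever converter is available and return the path of its output.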

FranklinThaker commented 2 years ago

Finally, I've created a solution here. https://stackoverflow.com/questions/69039978/hummus-recipe-npm-typeerror-unable-to-start-parsing-pdf-file/69040034#69040034