Hopding / pdf-lib

Create and modify PDF documents in any JavaScript environment
https://pdf-lib.js.org
MIT License

PDFDocument undefined #1148

Open fharper opened 2 years ago

fharper commented 2 years ago

What were you trying to do?

I'm loading an existing PDF from a base64 string.

How did you attempt to do it?

Using this code:

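const { PDFDocument } = require('pdf-lib');

// `base64` holds the base64-encoded contents of the PDF
let pdfDocument = await PDFDocument.load(base64, {
    ignoreEncryption: true
});
await pdfDocument.save({updateFieldAppearances: false});

console.log("PDF: " + pdfDocument.PDFDocument);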

What actually happened?

I get undefined for two documents only. If I open them in a viewer and save them again, they work, so there is definitely something wrong with the original files.

The documents are not protected, I can view them in multiple PDF viewers, the MIME type is correct, I get no header error messages, and they don't seem corrupted (I tried more than one tool). I also tried the equivalent in Python with a library I know, and it works.

They are PDF 1.7 files with no AcroForm or XFA forms. The only thing the two documents have in common is that they were generated by "HP Exstream Version 9.0.107 64-bit". I'm also looking into that side to see what could be different.
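The header and trailer checks described above can be repeated on the exact base64 string that is passed to pdf-lib, which rules out damage introduced during encoding. The following is only a rough sketch using Node's built-in Buffer; `inspectPdfBase64` is a hypothetical helper name, not part of pdf-lib, and it only looks at the first and last 1024 bytes.

```javascript
// Sketch: basic structural checks on a base64-encoded PDF (Node's Buffer only).
// `inspectPdfBase64` is a hypothetical helper, not a pdf-lib API.
function inspectPdfBase64(base64) {
  const bytes = Buffer.from(base64, "base64");
  // latin1 maps each byte to exactly one character, so byte offsets are preserved.
  const head = bytes.subarray(0, 1024).toString("latin1");
  const tail = bytes
    .subarray(Math.max(0, bytes.length - 1024))
    .toString("latin1");

  const header = head.match(/%PDF-(\d\.\d)/);
  return {
    byteLength: bytes.length,
    version: header ? header[1] : null,
    // A non-zero value means there are extra bytes in front of the header.
    headerOffset: header ? header.index : -1,
    hasStartXref: tail.includes("startxref"),
    hasEofMarker: tail.includes("%%EOF"),
  };
}
```

A `headerOffset` other than 0, or a missing `startxref` or `%%EOF`, would be worth comparing against one of the re-saved copies that load correctly.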

Here is the error I get:

(node:5229) UnhandledPromiseRejectionWarning: Error: Expected instance of PDFDict, but got instance of undefined
    at new UnexpectedObjectTypeError (/Users/fharper/Dropbox/Mac (3)/Documents/code/mindee/devrel/sdks/testing/node.js/node_modules/pdf-lib/cjs/core/errors.js:38:24)
    at PDFContext.lookup (/Users/fharper/Dropbox/Mac (3)/Documents/code/mindee/devrel/sdks/testing/node.js/node_modules/pdf-lib/cjs/core/PDFContext.js:95:15)
    at PDFCatalog.PDFDict.lookup (/Users/fharper/Dropbox/Mac (3)/Documents/code/mindee/devrel/sdks/testing/node.js/node_modules/pdf-lib/cjs/core/objects/PDFDict.js:65:48)
    at PDFCatalog.Pages (/Users/fharper/Dropbox/Mac (3)/Documents/code/mindee/devrel/sdks/testing/node.js/node_modules/pdf-lib/cjs/core/structures/PDFCatalog.js:14:21)
    at Cache.PDFDocument.computePages [as populate] (/Users/fharper/Dropbox/Mac (3)/Documents/code/mindee/devrel/sdks/testing/node.js/node_modules/pdf-lib/cjs/api/PDFDocument.js:28:27)
    at Cache.access (/Users/fharper/Dropbox/Mac (3)/Documents/code/mindee/devrel/sdks/testing/node.js/node_modules/pdf-lib/cjs/utils/Cache.js:13:31)
    at PDFDocument.getPages (/Users/fharper/Dropbox/Mac (3)/Documents/code/mindee/devrel/sdks/testing/node.js/node_modules/pdf-lib/cjs/api/PDFDocument.js:479:31)
    at PDFDocument.getPageCount (/Users/fharper/Dropbox/Mac (3)/Documents/code/mindee/devrel/sdks/testing/node.js/node_modules/pdf-lib/cjs/api/PDFDocument.js:463:35)
    at PDFDocument.<anonymous> (/Users/fharper/Dropbox/Mac (3)/Documents/code/mindee/devrel/sdks/testing/node.js/node_modules/pdf-lib/cjs/api/PDFDocument.js:1253:52)
    at step (/Users/fharper/Dropbox/Mac (3)/Documents/code/mindee/devrel/sdks/testing/node.js/node_modules/tslib/tslib.js:141:27)
(Use `node --trace-warnings ...` to show where the warning was created)
(node:5229) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 5)
(node:5229) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
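The warnings above are printed because the rejected promise is never caught. One way to both silence them and see which call actually throws is to run `load` and `save` as two separately caught steps. This is a minimal sketch, not a definitive implementation: `loadThenSave` is a hypothetical helper, and `PDFDocumentClass` is assumed to be pdf-lib's `PDFDocument`, passed in as an argument so the helper does not depend on how the library is imported.

```javascript
// Sketch: run load and save as separate, individually caught steps.
// `loadThenSave` is a hypothetical helper; `PDFDocumentClass` is assumed to be
// pdf-lib's PDFDocument (or anything with a compatible static `load`).
async function loadThenSave(PDFDocumentClass, base64) {
  let doc;
  try {
    doc = await PDFDocumentClass.load(base64, { ignoreEncryption: true });
  } catch (err) {
    return { failedAt: "load", error: err };
  }
  try {
    const bytes = await doc.save({ updateFieldAppearances: false });
    return { failedAt: null, doc, bytes };
  } catch (err) {
    // Keep `doc` so it can still be inspected after a failed save.
    return { failedAt: "save", error: err, doc };
  }
}
```

With pdf-lib installed, it could be called as `loadThenSave(require('pdf-lib').PDFDocument, base64)`; the `failedAt` field then shows whether the document was rejected while parsing or only when it was written back out.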

I also got these errors, but from what I've read in other issues, I shouldn't worry about them:

Trying to parse invalid object: {"line":34,"column":6,"offset":9389})
Invalid object ref: 10 0 R
Trying to parse invalid object: {"line":77,"column":6,"offset":13551})
Invalid object ref: 17 0 R
Trying to parse invalid object: {"line":114,"column":6,"offset":15684})
Invalid object ref: 23 0 R
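Since the files cannot be shared, one thing that can be checked locally is whether the objects named in these messages (10, 17, and 23) are defined in the raw bytes at all, and where. The sketch below is a rough diagnostic using only Node's Buffer; `findObjectDefinitions` is a hypothetical helper, not a pdf-lib API. It only sees objects written in plain form (`N G obj`), so objects stored inside compressed object streams will not show up.

```javascript
// Sketch: check whether "N G R" references have a matching "N G obj" in the raw bytes.
// `findObjectDefinitions` is a hypothetical helper, not a pdf-lib API.
function findObjectDefinitions(pdfBytes, refs) {
  // latin1 keeps one character per byte, so match positions are byte offsets.
  const text = Buffer.from(pdfBytes).toString("latin1");
  const report = {};
  for (const ref of refs) {
    const m = /^(\d+) (\d+) R$/.exec(ref);
    if (!m) {
      report[ref] = null; // not in "N G R" form
      continue;
    }
    // The leading group stops "10 0 obj" from matching inside "110 0 obj".
    const pattern = new RegExp(`(?:^|[^0-9])${m[1]}\\s+${m[2]}\\s+obj\\b`, "g");
    const offsets = [];
    let hit;
    while ((hit = pattern.exec(text)) !== null) {
      // Report the offset of the object number itself, not the preceding delimiter.
      offsets.push(hit.index + hit[0].search(/\d/));
    }
    report[ref] = offsets;
  }
  return report;
}
```

For example, `findObjectDefinitions(fs.readFileSync(path), ["10 0 R", "17 0 R", "23 0 R"])` (where `path` is a placeholder) returns the byte offsets of each definition, which can be compared with the `offset` values printed in the log to inspect the bytes around the point where the parser gave up.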

What did you expect to happen?

Being able to access the PDFDocument object to manipulate the PDF file.

How can we reproduce the issue?

Unfortunately, I cannot share the PDFs since they contain private information from a customer. I also cannot edit them to remove or hide that information: once I save a document, it is effectively a new file and the problem no longer happens.

With those documents, I get this error all the time. Here I'm looking more for pointers on how to find the issues, or ideas of what could be wrong with these PDFs, since I unfortunately cannot share them...

Version

1.17.1

What environment are you running pdf-lib in?

Node

Additional Notes

Is there anything else I should check that could help identify what is wrong with these PDFs? Are there reasons other than the ones I listed (and checked, based on other issues) that could cause this problem? Any help appreciated :)

fharper commented 2 years ago

I debugged the pdf-lib code directly and got this error: Exception has occurred: Error: /Fl stream encoding not supported

I need to investigate a bit more to see whether it's a real encoding that isn't supported or an encoding error in the file itself (I'm not a PDF expert).
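As far as I know, the PDF specification lists `/Fl` as an abbreviation of `/FlateDecode` for inline images, so one possibility (an assumption, not confirmed for these files) is that the generator wrote the short name in ordinary stream dictionaries as well. A quick way to check is to count which `/Filter` names occur in the raw bytes. The sketch below uses only Node's Buffer; `countFilterNames` is a hypothetical helper, not a pdf-lib API, and it will miss filters declared inside compressed object streams.

```javascript
// Sketch: count the /Filter names that appear in the raw bytes of a PDF.
// `countFilterNames` is a hypothetical helper, not a pdf-lib API.
function countFilterNames(pdfBytes) {
  const text = Buffer.from(pdfBytes).toString("latin1");
  const counts = {};
  // Matches both "/Filter /Name" and "/Filter [/Name1 /Name2]".
  const filterEntry = /\/Filter\s*(\[[^\]]*\]|\/[A-Za-z0-9]+)/g;
  let entry;
  while ((entry = filterEntry.exec(text)) !== null) {
    for (const name of entry[1].match(/\/[A-Za-z0-9]+/g) || []) {
      counts[name] = (counts[name] || 0) + 1;
    }
  }
  return counts;
}
```

If `/Fl` shows up in a failing file while a re-saved copy only contains `/FlateDecode`, that would point to the abbreviated filter name rather than to corrupted stream data.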

arvo95 commented 2 years ago

Hello, @fharper!

I have come across a similar issue, but it is even weirder: this error appears only during Cypress tests when mocking an API call for a PDF file. Have you found out what was causing it? I thought compression might be the issue (it causes more of those Invalid object ref errors), but it also happens with uncompressed PDFs.

fharper commented 2 years ago

@arvo95 I wasn't able to dedicate more time to that, sorry.