Hopding / pdf-lib

Create and modify PDF documents in any JavaScript environment
https://pdf-lib.js.org
MIT License

getFields() results in Expected instance of PDFDict, but got instance of PDFInvalidObject #1406

Open sootsnoot opened 1 year ago

sootsnoot commented 1 year ago

What were you trying to do?

I have an html5 form generated from this pdf by this online converter.

I'm using this converter tool so that I can host the form on my website with error checking from the jQuery form-validator plugin, and so that users can fill out the form fields and submit the form back to my site. The form has buttons to submit and to download the pdf with the fields they filled in. Everything works fine, but the pdf they download still has fillable fields in it. I don't want users to be able to change the pdf directly; I want them to go back to the website to make changes and download again to pick them up.

The JavaScript code that creates the pdf to be downloaded is part of what the idrsolutions tool generates. I modified that code to hook in pdf-lib, which takes the in-memory pdf created by the idrsolutions code and draws text over the signature fields in custom handwriting fonts that users can select on the form. This also works fine, after much trial and error getting the x and y coordinates right. So I've hooked up pdf-lib correctly, and it is basically able to handle the pdf created by the idrsolutions code.

I was thinking I needed to iterate over the fields in the pdf using form.getFields() and set each one to read-only using PDFField.enableReadOnly(). But the call to form.getFields() produced this error in the Chrome DevTools console:

pdf-lib.js:8197 Uncaught (in promise) Error: Expected instance of PDFDict, but got instance of PDFInvalidObject
    at new UnexpectedObjectTypeError (pdf-lib.js:8197:28)
    at PDFContext.lookup (pdf-lib.js:16318:19)
    at PDFArray.lookup (pdf-lib.js:15451:47)
    at PDFAcroForm.getFields (pdf-lib.js:28962:35)
    at PDFAcroForm.getAllFields (pdf-lib.js:28981:29)
    at PDFForm.getFields (pdf-lib.js:35505:43)

Then I read that flattening the pdf would make all fields readonly, so I tried calling form.flatten(). But that produced the same error, as flatten() calls getFields() internally:

    at PDFForm.getFields (pdf-lib.js:35505:43)
    at PDFForm.updateFieldAppearances (pdf-lib.js:35955:31)
    at PDFForm.flatten (pdf-lib.js:35867:22)
    at OSH.initPDF (init-combineddriving.js:779:16)
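The two approaches described above (marking each field read-only, or flattening the whole form) can be sketched as one helper. This is only a minimal sketch, not a fix: `lockForm` is a hypothetical name, and `form` is assumed to be the object returned by `pdfDoc.getForm()`. On a pdf like the one in this report, both branches would be expected to hit the same lookup error, which the helper reports instead of letting it propagate.

```javascript
// Hypothetical helper (not part of pdf-lib): makes a form non-editable,
// either by flattening it or by marking every field read-only.
// `form` is assumed to be a pdf-lib PDFForm, e.g. from pdfDoc.getForm().
function lockForm(form, { flatten = true } = {}) {
  try {
    if (flatten) {
      // flatten() calls getFields() internally, so it fails the same way
      // when the field list cannot be resolved.
      form.flatten();
      return { ok: true, mode: 'flatten' };
    }
    const fields = form.getFields();
    for (const field of fields) {
      field.enableReadOnly();
    }
    return { ok: true, mode: 'readonly', count: fields.length };
  } catch (err) {
    // Report the failure so the caller can decide how to proceed.
    return { ok: false, error: err.message };
  }
}
```

Returning a result object rather than throwing lets the calling code fall back to another path, for example serving the pdf unchanged.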

The pdf that gets downloaded after these errors looks fine, but it still contains fillable fields. I also downloaded the pdftk tool and ran pdftk downloaded.pdf dump_data_fields. It had no complaints and dumped out what looked like all of the fillable fields (output attached) with their names, types, and values.

ADSentry-fieldnames.txt

How did you attempt to do it?

I'm afraid I provided all of this information in the previous answer.

What actually happened?

Again, answered in first question.

What did you expect to happen?

I expected that the downloaded PDF would not have any fillable fields.

How can we reproduce the issue?

Clearly the path names need to be changed, both for the require of pdf-lib.js, and for the pdf files to process, in order for you to run the code in your environment. I've uploaded pdflibtest.zip, which contains the testcase code pdflibtest.js, as well as two pdf files to process, and a third pdf file which is produced by running pdflibtest.js on combineddriving-good.pdf.

  1. combineddriving-good.pdf is the original pdf downloaded from https://americandrivingsociety.org/content.aspx?page_id=22&club_id=548049&module_id=413590. When this file is used, everything is fine, and the testcase code downloads a file combineddriving-flattened.pdf with the fillable fields flattened.
  2. combineddriving-bad.pdf is the pdf generated from the html5 form produced by https://www.idrsolutions.com/online-pdf-to-html5-converter, after filling in some fields. This pdf looks fine in a pdf reader, pdftk can list all the fields in it, and you can see the signatures in custom fonts that were produced using pdf-lib. But feeding it to the testcase code produces this output:

Hello world!
Trying to parse invalid object: {"line":1918,"column":627,"offset":135069})
Invalid object ref: 54 0 R
C:\xampp1826\htdocs\WWW\public\js\OSH-dev\ADS-forms\pdf-lib.js:8197
    _this = _super.call(this, msg) || this;
    ^
Error: Expected instance of PDFDict, but got instance of PDFInvalidObject
    at new UnexpectedObjectTypeError (C:\xampp1826\htdocs\WWW\public\js\OSH-dev\ADS-forms\pdf-lib.js:8197:28)
    at PDFContext.lookup (C:\xampp1826\htdocs\WWW\public\js\OSH-dev\ADS-forms\pdf-lib.js:16318:19)
    at PDFArray.lookup (C:\xampp1826\htdocs\WWW\public\js\OSH-dev\ADS-forms\pdf-lib.js:15451:47)
    at PDFAcroForm.getFields (C:\xampp1826\htdocs\WWW\public\js\OSH-dev\ADS-forms\pdf-lib.js:28962:35)
    at PDFAcroForm.getAllFields (C:\xampp1826\htdocs\WWW\public\js\OSH-dev\ADS-forms\pdf-lib.js:28981:29)
    at PDFForm.getFields (C:\xampp1826\htdocs\WWW\public\js\OSH-dev\ADS-forms\pdf-lib.js:35505:43)
    at PDFForm.updateFieldAppearances (C:\xampp1826\htdocs\WWW\public\js\OSH-dev\ADS-forms\pdf-lib.js:35955:31)
    at PDFForm.flatten (C:\xampp1826\htdocs\WWW\public\js\OSH-dev\ADS-forms\pdf-lib.js:35867:22)
    at loadPdf (C:\xampp1826\htdocs\WWW\public\images\ADS-forms\Pristine\pdflibtest.js:10:16)

Note that the first error ("Trying to parse invalid object") indicates a problem with the structure of the pdf file generated directly by the idrsolutions "Save as" button. It occurs whether or not any pdf-lib drawing code is used, yet pdf viewers as well as pdftk have no problem dealing with the file.
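To narrow down which entry in the form's field list is affected (the output above names "54 0 R"), one could walk the list and report every reference that does not resolve to a dictionary. The following is a rough diagnostic sketch under stated assumptions, not pdf-lib's own code: `findBrokenFieldRefs` is a hypothetical name, `fieldsArray` is assumed to expose `size()` and `get(i)` the way pdf-lib's PDFArray does, `lookup` is assumed to behave like `pdfDoc.context.lookup(ref)`, and `isDict` is assumed to be a caller-supplied check such as `obj => obj instanceof PDFDict`.

```javascript
// Hypothetical diagnostic (not part of pdf-lib): list the entries of a
// Fields-like array whose references do not resolve to a dictionary.
// All three parameters are supplied by the caller; see the assumptions
// in the paragraph above.
function findBrokenFieldRefs(fieldsArray, lookup, isDict) {
  const broken = [];
  for (let i = 0; i < fieldsArray.size(); i++) {
    const ref = fieldsArray.get(i);
    const target = lookup(ref);
    if (!isDict(target)) {
      // Record the position and a printable form of the reference.
      broken.push({ index: i, ref: String(ref) });
    }
  }
  return broken;
}
```

Passing the lookup and the type check as functions keeps the helper independent of how the document was loaded, which makes it easy to try against different builds of the library.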

This is not really a blocking problem for me, as I believe I can bypass the idrsolutions code entirely and use pdf-lib to insert the field values directly into the original pdf, which pdf-lib can load without error. I just thought that this was a problem that pdf-lib should be able to handle, given that pdftk does.
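The workaround mentioned above, filling the original pdf directly and then flattening it, might look roughly like this. It is a sketch under assumptions: `fillAndFlatten` is a hypothetical name, `form` is assumed to come from `pdfDoc.getForm()` on the original pdf, and every key in `values` is assumed to name a text field (other field types would need their own setters).

```javascript
// Hypothetical helper (not part of pdf-lib): copy submitted values into
// text fields of the original pdf, then flatten so no fillable fields remain.
// Assumes every key in `values` is the name of a text field.
function fillAndFlatten(form, values) {
  for (const [name, value] of Object.entries(values)) {
    form.getTextField(name).setText(String(value));
  }
  form.flatten();
  return Object.keys(values).length;
}

// Possible usage with pdf-lib (field names here are made up):
//   const pdfDoc = await PDFDocument.load(originalPdfBytes);
//   fillAndFlatten(pdfDoc.getForm(), { 'Driver Name': 'Jane Doe' });
//   const flattenedBytes = await pdfDoc.save();
```

Because this path never loads the converter's output, it avoids the invalid object altogether rather than working around it.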

pdflibtest.zip

Version

https://unpkg.com/pdf-lib@1.17.1/dist/pdf-lib.js

What environment are you running pdf-lib in?

Browser

Checklist

Additional Notes

I'm running in a browser, though the SSCCE is obviously for node.

cstayyab commented 1 year ago

Can reproduce on latest version with a slightly different error:

Uncaught (in promise) Error: Expected instance of PDFDict, but got instance of PDFNull
    at new UnexpectedObjectTypeError (lib.js?ts=1690293969440:127441:28)
    at PDFContext.lookup (lib.js?ts=1690293969440:135562:19)
    at PDFArray.lookup (lib.js?ts=1690293969440:134695:47)
    at PDFAcroForm.getFields (lib.js?ts=1690293969440:148206:35)
    at PDFAcroForm.getAllFields (lib.js?ts=1690293969440:148225:29)
    at PDFForm.getFields (lib.js?ts=1690293969440:154749:43)
    at PDFForm.flatten (lib.js?ts=1690293969440:155113:31)
    at Object.<anonymous> (lib.js?ts=1690293969440:159350:19)
    at Function.each (lib.js?ts=1690293969440:51:5347)
    at Object.<anonymous> (lib.js?ts=1690293969440:159315:8)

cstayyab commented 1 year ago

@sootsnoot Found any solution to this? If so, please let me know as well.

ericpias commented 11 months ago

I have encountered the exact same exception on the fillable pdf that I am attaching. I am just calling getFields() on a PDFForm. I have run it through several online pdf validators and it is fine according to them (https://www.pdf-online.com/osa/validate.aspx for one example). The exception is:

Uncaught (in promise) Error: Expected instance of PDFDict, but got instance of PDFNull
    at new UnexpectedObjectTypeError (errors.ts:31:1)
    at PDFContext.lookup (PDFContext.ts:166:1)
    at PDFArray.lookup (PDFArray.ts:107:1)
    at PDFAcroForm.getFields (PDFAcroForm.ts:39:1)
    at PDFAcroForm.getAllFields (PDFAcroForm.ts:61:1)
    at PDFForm.getFields (PDFForm.ts:142:1)

pdf-lib 1.17.1

Seems like a pdf-lib bug but unsure. Any ideas?

signature_example.pdf

Thanks.