dictadata / pdf-data-parser

Parse, search and stream PDF tabular data using Node.js with Mozilla's PDF.js library.
MIT License

Promise-related crashes (pdf.js) #1

Closed by Matojeje 6 months ago

Matojeje commented 6 months ago

Hello, I tried running the CLI on multiple systems and no matter the input file, the output always looks something like this:

TypeError: Promise.withResolvers is not a function
    at new PDFDocumentLoadingTask (file:///home/pi/Downloads/node-v20.10.0-linux-armv6l/lib/node_modules/pdf-data-parser/node_modules/pdfjs-dist/build/pdf.mjs:2957:32)
    at Module.getDocument (file:///home/pi/Downloads/node-v20.10.0-linux-armv6l/lib/node_modules/pdf-data-parser/node_modules/pdfjs-dist/build/pdf.mjs:2771:16)
    at PdfDataParser.parse (/home/pi/Downloads/node-v20.10.0-linux-armv6l/lib/node_modules/pdf-data-parser/lib/PdfDataParser.js:75:34)
node:internal/process/promises:289
            triggerUncaughtException(err, true /* fromPromise */);
            ^

TypeError: Promise.withResolvers is not a function
    at new PDFDocumentLoadingTask (file:///home/pi/Downloads/node-v20.10.0-linux-armv6l/lib/node_modules/pdf-data-parser/node_modules/pdfjs-dist/build/pdf.mjs:2957:32)
    at Module.getDocument (file:///home/pi/Downloads/node-v20.10.0-linux-armv6l/lib/node_modules/pdf-data-parser/node_modules/pdfjs-dist/build/pdf.mjs:2771:16)
    at PdfDataParser.parse (/home/pi/Downloads/node-v20.10.0-linux-armv6l/lib/node_modules/pdf-data-parser/lib/PdfDataParser.js:75:34)

Node.js v20.10.0

Running this on different Node versions or operating systems only changes the displayed file paths; the stack trace stays the same.

The problem seems to be upstream in the pdf.js library (namely this bit of code), where this pull request created this issue.

I temporarily got around it by manually editing pdf-data-parser's package.json to require a version earlier than pdfjs-dist@4.1.392, which is where this issue started to appear.
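Another possible stopgap, sketched here as an assumption and not tested against pdfjs-dist, would be to define Promise.withResolvers before the library is loaded on Node versions that lack it. The name `withResolversShim` is hypothetical and is not part of pdf-data-parser or pdf.js, and the non-legacy build may rely on other features that this does not cover.

```javascript
// Hypothetical shim, not part of pdf-data-parser or pdf.js.
// Returns a new promise together with the functions that settle it.
// Unlike the native method, it always builds a plain Promise and ignores `this`.
function withResolversShim() {
  let resolve;
  let reject;
  const promise = new Promise((res, rej) => {
    resolve = res;
    reject = rej;
  });
  return { promise, resolve, reject };
}

// Install the shim only when the runtime does not already provide the method.
if (typeof Promise.withResolvers !== "function") {
  Promise.withResolvers = withResolversShim;
}
```

This would have to run before the first call into pdfjs-dist, since the stack trace shows the failure happening when getDocument constructs its loading task.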

drewletcher commented 6 months ago

Fixed the issue by using the legacy version of pdf.js.

In lib/PdfDataParser.js:

    const { getDocument } = await import("pdfjs-dist/legacy/build/pdf.mjs");

Odd that it worked when testing on Windows 11, but when I installed the package on Rocky Linux I ran into the same issue: that Node.js version does not support Promise.withResolvers yet.

Published pdf-data-parser 1.2.9 with the fix.
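For readers who would rather keep the standard build where it works, a variation on the fix above could pick the entry point by feature detection. This is only a sketch and not what 1.2.9 does (the published fix always imports the legacy build). The helper names `pickPdfjsEntry` and `loadGetDocument` are hypothetical, and the two module paths are the ones that appear in the stack trace and in the fix.

```javascript
// Hypothetical helper: choose which pdfjs-dist build to import.
// Falls back to the legacy build when Promise.withResolvers is missing.
function pickPdfjsEntry(hasWithResolvers = typeof Promise.withResolvers === "function") {
  return hasWithResolvers
    ? "pdfjs-dist/build/pdf.mjs"
    : "pdfjs-dist/legacy/build/pdf.mjs";
}

// Hypothetical loader: dynamically import the chosen build and return getDocument.
// Assumes pdfjs-dist is installed; the import only happens when this is called.
async function loadGetDocument() {
  const { getDocument } = await import(pickPdfjsEntry());
  return getDocument;
}
```

Always importing the legacy build, as the published fix does, is simpler and avoids behaving differently across Node versions.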