electrovir / pdf-text-reader

Dead simple pdf text reader
https://electrovir.github.io/pdf-text-reader/
Creative Commons Zero v1.0 Universal

ESM support #11

Closed: HasmH closed this issue 2 months ago

HasmH commented 2 months ago

Documentation: https://nodejs.org/api/esm.html#mandatory-file-extensions

Originally posted by @AaronSterlingGENEICD in https://github.com/electrovir/pdf-text-reader/issues/10#issuecomment-2099444014

Using Node v22:

 node
Welcome to Node.js v22.1.0.
Type ".help" for more information.
> import('pdf-text-reader')
Promise {
  <pending>,
  [Symbol(async_id_symbol)]: 25,
  [Symbol(trigger_async_id_symbol)]: 6
}
> Uncaught:
Error [ERR_MODULE_NOT_FOUND]: Cannot find module '/home/hasamh/dev/open-source/zzz/node_modules/pdf-text-reader/dist/read-pdf' imported from /home/hasamh/dev/open-source/zzz/node_modules/pdf-text-reader/dist/index.js
    at finalizeResolution (node:internal/modules/esm/resolve:264:11)
    at moduleResolve (node:internal/modules/esm/resolve:924:10)
    at defaultResolve (node:internal/modules/esm/resolve:1148:11)
    at ModuleLoader.defaultResolve (node:internal/modules/esm/loader:541:12)
    at ModuleLoader.resolve (node:internal/modules/esm/loader:510:25)
    at ModuleLoader.getModuleJob (node:internal/modules/esm/loader:240:38) {
  code: 'ERR_MODULE_NOT_FOUND',
  url: 'file:///home/hasamh/dev/open-source/zzz/node_modules/pdf-text-reader/dist/read-pdf'
}
>

To reproduce:

npm init -y
npm install pdf-text-reader  # installs v5.0.0
# then dynamically import pdf-text-reader, e.g. import('pdf-text-reader') in the Node REPL as above

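The ERR_MODULE_NOT_FOUND above follows from the mandatory-file-extensions rule in the linked Node.js documentation: a relative ESM import must name the file exactly, so a dist/index.js that (presumably) imports './read-pdf' fails, while './read-pdf.js' would load. The sketch below reproduces that in a throwaway package. It assumes Node 18 or later, run as a plain CommonJS script, and the readPdfText export is a hypothetical stand-in rather than the library's actual code.

```javascript
// Sketch: reproduce Node's "mandatory file extensions" rule for ESM imports.
// Assumption: run as a CommonJS script (e.g. `node esm-extension-demo.js`) on Node >= 18.
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');
const { pathToFileURL } = require('node:url');

// Build a throwaway ESM package mirroring the dist/ layout from the error:
// index.js re-exports everything from read-pdf.js using the given specifier.
function makePackage(specifier) {
    const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'esm-ext-'));
    fs.writeFileSync(path.join(dir, 'package.json'), JSON.stringify({ type: 'module' }));
    // Hypothetical export, standing in for the real module contents.
    fs.writeFileSync(path.join(dir, 'read-pdf.js'), 'export const readPdfText = () => "ok";\n');
    fs.writeFileSync(path.join(dir, 'index.js'), `export * from '${specifier}';\n`);
    return pathToFileURL(path.join(dir, 'index.js')).href;
}

// Dynamically import the package entry point and report success or the error code.
async function tryImport(specifier) {
    try {
        const mod = await import(makePackage(specifier));
        return `loaded: ${mod.readPdfText()}`;
    } catch (error) {
        return `failed: ${error.code}`;
    }
}

async function main() {
    console.log(await tryImport('./read-pdf'));    // extensionless: failed: ERR_MODULE_NOT_FOUND
    console.log(await tryImport('./read-pdf.js')); // explicit extension: loaded: ok
}

main();
```

Unlike CommonJS require(), the ESM resolver does no extension probing, which is why a build that works under TypeScript's resolution can still fail at runtime.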
electrovir commented 2 months ago

Whoops, sorry! I'm still new to ESM support in Node.js.

electrovir commented 2 months ago

My TS setup is surprisingly resilient to this error, but I was able to write a test, based on your repro case, that fails: https://github.com/electrovir/pdf-text-reader/blob/c33bcfc4d698552bb064f233e3c5774292e106e6/src/esm-support.test.ts#L6

I've now fixed the bug so that test passes, and published the fix in v5.0.1. Please try that version out!

AaronSterlingGENEICD commented 2 months ago

Thanks very much. I ran into an issue with the new version of pdfjs-dist that I'll write here in case it helps someone.

pdfjs-dist v4 uses Promise.withResolvers, which Node does not support until version 22. AWS Lambda will not add a Node 22 runtime until November 2024. To keep using this library on supported runtimes, we downgraded to pdf-text-reader@4 and added one line to read-pdf.js:
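For context, Promise.withResolvers() returns a fresh promise together with its resolve and reject functions. A minimal sketch of an equivalent helper, falling back to the classic executor pattern when the built-in is missing (as on Node versions before 22), might look like this. The helper name withResolvers is hypothetical, and this thread does not establish whether such a fallback is enough to run pdfjs-dist v4 on older runtimes.

```javascript
// Hypothetical helper: use the built-in Promise.withResolvers when present,
// otherwise build the same { promise, resolve, reject } shape by hand.
function withResolvers() {
    if (typeof Promise.withResolvers === 'function') {
        return Promise.withResolvers();
    }
    let resolve;
    let reject;
    // The executor runs synchronously, so resolve/reject are assigned before we return.
    const promise = new Promise((res, rej) => {
        resolve = res;
        reject = rej;
    });
    return { promise, resolve, reject };
}

// Usage: settle the promise from outside its executor.
const { promise, resolve } = withResolvers();
promise.then((value) => console.log(`resolved with ${value}`)); // prints "resolved with 42"
resolve(42);
```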

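// Excerpt from the compiled dist/read-pdf.js in pdf-text-reader@4. pdfjs_dist_1 and
// path_1 are that file's existing TypeScript-emitted require() bindings for
// 'pdfjs-dist' and 'path'.
async function readPdfPages({ data, filePath, password, pathToPdfJsDistNodeModule, progressCallback, url, }) {
    const documentLoadingTask = (0, pdfjs_dist_1.getDocument)({
        data,
        isEvalSupported: false, // <-- new line here: don't let pdf.js evaluate strings as JavaScript
        url: url || filePath,
        useSystemFonts: true,
        password,
        standardFontDataUrl: pathToPdfJsDistNodeModule
            ? (0, path_1.join)(pathToPdfJsDistNodeModule, 'standard_fonts')
            : undefined,
    });
    // ... rest of readPdfPages unchanged ...
}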

That workaround (setting isEvalSupported: false) is the mitigation suggested in the pdf.js vulnerability report (CVE-2024-4367), so I believe everything is fine from a security perspective.

Thanks again for addressing this so quickly. We'll upgrade to v5 as soon as we can.

electrovir commented 2 months ago

I ran into the same Node version issue. I can add an options input to readPdfPages so you can pass in isEvalSupported without modifying the source code.
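One way such an options input could work is sketched here: the caller's pdf.js options are spread over the library's defaults, and the resulting object would then be passed to getDocument(...) as in the read-pdf.js excerpt earlier in the thread. The names buildDocumentParams and pdfJsOptions are hypothetical illustrations, not pdf-text-reader's actual API.

```javascript
// Hypothetical sketch: merge caller-supplied pdf.js options into the getDocument
// parameters so callers can set isEvalSupported without patching the library.
const path = require('node:path');

function buildDocumentParams({ data, filePath, password, pathToPdfJsDistNodeModule, url, pdfJsOptions = {} }) {
    return {
        data,
        url: url || filePath,
        useSystemFonts: true,
        password,
        standardFontDataUrl: pathToPdfJsDistNodeModule
            ? path.join(pathToPdfJsDistNodeModule, 'standard_fonts')
            : undefined,
        // Caller-supplied options are spread last, so they override the defaults above.
        ...pdfJsOptions,
    };
}

const params = buildDocumentParams({ filePath: 'example.pdf', pdfJsOptions: { isEvalSupported: false } });
console.log(params.isEvalSupported, params.url); // prints "false example.pdf"
```

Spreading the caller's options last lets them override any default; a library that needs some defaults to stay fixed could spread them first instead.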

electrovir commented 2 months ago

Actually, it looks like I already have that option, but it isn't being used. Whoops!

electrovir commented 2 months ago

I'll fix it here: https://github.com/electrovir/pdf-text-reader/issues/12