TypeError: _this.catalog.Pages(...).traverse is not a function

schester44 commented 2 years ago

What were you trying to do?

I am trying to load a 90-page PDF with pdf-lib.

How did you attempt to do it?

Here is a simple reproduction of the issue

const { PDFDocument } = require("pdf-lib");
const fs = require("fs");

const fileWithError = fs.readFileSync("./policy-doc-test.pdf");

async function main() {
  const parentPDFDoc = await PDFDocument.load(fileWithError);

  console.log(parentPDFDoc.getPageCount());
}

main();

What actually happened?

I get the error TypeError: _this.catalog.Pages(...).traverse is not a function whenever I call any API that has to traverse the pages. This includes getPageCount, save, etc.

What did you expect to happen?

I expected these calls to complete without throwing.

How can we reproduce the issue?

Run the above code snippet using node

Version

1.17.1

What environment are you running pdf-lib in?

Node

Checklist

[X] My report includes a Short, Self Contained, Correct (Compilable) Example.
[X] I have attached all PDFs, images, and other files needed to run my SSCCE.

Additional Notes

The snippet above reproduces the issue. The document is a somewhat sensitive PDF, so I'd prefer not to attach it here publicly. I can send the PDF via DM or email.

Some more context:

This is a 90-page document (3.8 MB). Opening it in Acrobat also produces an error. I'm not sure whether that is related, but I suspect it could be.

Here's the fun part: re-exporting this file and opening it with pdf-lib works as expected, so Acrobat is doing something that fixes the issue. I'm just not sure what, and unfortunately re-exporting through Acrobat isn't an option given the task.

Here to see if anyone knows what may be going on and how to potentially fix this issue. Thanks!

kausthubmayuram commented 2 years ago

Facing the same issue. My PDF is around 60 to 70 pages

alecimackay commented 2 years ago

Same issue here. Is there any time frame for when this will be looked into or fixed?

gmayc commented 1 year ago

We are experiencing the same issue. Any news on this? Glad to help any way I can.

kubarozycki commented 1 year ago

Same issue, any chance to fix this soon?

StarNumber12046 commented 1 year ago

no fix

msquitieri commented 1 year ago

We are also experiencing this issue with a specific PDF.

bspot commented 1 year ago

I also stumbled over this.

In my case, the reason was that the /Pages dict doesn't have /Type set to /Pages. That caused the PDF parser to instantiate the object as a plain PDFDict instead of a PDFPageTree.

I was successful with the following workaround:

  const { PDFDocument, PDFName, PDFPageTree } = require('pdf-lib')

  const pdfDoc = await PDFDocument.load(bytes)

  // Find the reference to the page tree
  const pagesRef = pdfDoc.catalog.get(PDFName.of('Pages'))

  // Get the page tree. Here it is a plain PDFDict.
  const oldPageTree = pdfDoc.context.indirectObjects.get(pagesRef)

  // Create a PDFPageTree with the same content, via the static factory.
  const newPageTree = PDFPageTree.fromMapWithContext(oldPageTree.dict, oldPageTree.context)

  // Set the correct `Type`.
  newPageTree.dict.set(PDFName.of('Type'), PDFName.of('Pages'))

  // Replace the PDFDict with the PDFPageTree in the document.
  pdfDoc.context.indirectObjects.set(pagesRef, newPageTree)

  // Save fixed document
  ...
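
To illustrate the cause described in this comment, here is a simplified model of a parser that chooses a wrapper class from a dictionary's /Type entry. The class and function names (PlainDict, PageTree, wrapDict) are stand-ins invented for this sketch and are not pdf-lib's source. It only shows why a missing /Type can end in a "traverse is not a function" error.

```javascript
// Stand-in classes for illustration only; these are NOT pdf-lib's classes.
class PlainDict {
  constructor(entries) {
    this.entries = entries;
  }
}

class PageTree extends PlainDict {
  // Only the page-tree wrapper knows how to walk its kids.
  traverse(visit) {
    for (const kid of this.entries.Kids ?? []) visit(kid);
  }
}

// Pick the wrapper class from the dictionary's /Type entry.
// Without /Type, the object falls through to the plain dictionary.
function wrapDict(entries) {
  return entries.Type === "Pages" ? new PageTree(entries) : new PlainDict(entries);
}

const wellFormed = wrapDict({ Type: "Pages", Kids: ["page1", "page2"] });
const missingType = wrapDict({ Kids: ["page1", "page2"] });

console.log(typeof wellFormed.traverse);  // "function"
console.log(typeof missingType.traverse); // "undefined"
```

Calling missingType.traverse(...) in this model throws a TypeError of the same kind as the one in the issue title, even though the dictionary still carries its Kids.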

chebum commented 11 months ago

In my case the PDFDocument.catalog property was initialised with a PDFDict instead of a PDFCatalog. So here is my workaround for the bug:

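import { PDFCatalog, PDFDict, PDFDocument } from "pdf-lib";

const doc = await PDFDocument.load(bytes, { ignoreEncryption: true });
// If the catalog was parsed as a plain PDFDict, rebuild it as a PDFCatalog from the same entries.
if (!(doc.catalog instanceof PDFCatalog) && ((doc.catalog as any) instanceof PDFDict)) {
    (doc as any).catalog = PDFCatalog.fromMapWithContext((doc.catalog as any).dict, doc.context);
}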

nvutri commented 8 months ago

For me it wasn't working because the Catalog was pointing to the wrong object. I did this to manually point the Catalog to a PDFPageTree:

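const pdfLib = require("pdf-lib");

let pdfPageTree;

// Find the first PDFPageTree among the document's indirect objects.
for (const obj of pdfDoc.context.indirectObjects.values()) {
  if (obj instanceof pdfLib.PDFPageTree) {
    pdfPageTree = obj;
    break;
  }
}

// Replace the catalog with a new one that points at that page tree.
pdfDoc.catalog = pdfLib.PDFCatalog.withContextAndPages(pdfDoc.context, pdfPageTree);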
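
The workarounds in this thread address different failure modes, so it may help to check which one applies before picking a fix. The sketch below is a hypothetical helper, not part of pdf-lib. It assumes that pdfLib is the module object returned by require("pdf-lib") and that pdfDoc came from PDFDocument.load. It relies only on catalog.get, context.lookup and instanceof checks.

```javascript
// Hypothetical helper (not part of pdf-lib). `pdfLib` is assumed to be the
// module object returned by require("pdf-lib").
function diagnosePageTree(pdfDoc, pdfLib) {
  const { PDFCatalog, PDFPageTree, PDFName } = pdfLib;

  // Case 1: the catalog itself was parsed as a plain dictionary.
  if (!(pdfDoc.catalog instanceof PDFCatalog)) {
    return "catalog-is-not-a-PDFCatalog";
  }

  // Resolve the catalog's /Pages entry (usually an indirect reference).
  const pagesEntry = pdfDoc.catalog.get(PDFName.of("Pages"));
  const pages = pdfDoc.context.lookup(pagesEntry);

  // Case 2: /Pages is missing or resolves to something other than a page tree.
  if (!(pages instanceof PDFPageTree)) {
    return "pages-is-not-a-PDFPageTree";
  }

  return "ok";
}
```

A call such as diagnosePageTree(pdfDoc, require("pdf-lib")) returns one of the three strings. The first result suggests the PDFCatalog workaround above, and the second suggests one of the page-tree workarounds.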

Hopding / pdf-lib