Hopding / pdf-lib

Create and modify PDF documents in any JavaScript environment
https://pdf-lib.js.org
MIT License

Ignoring parsing unreferenced objects #1641

Open rambo-panda opened 5 months ago

rambo-panda commented 5 months ago

What were you trying to do?

I want to get the number of pages in a PDF, but the count I am getting differs from what is shown in the PDF preview.

How did you attempt to do it?

var buf = await fetch("https://17zy-oto-export.oss-cn-beijing.aliyuncs.com/saaszy-2b589de8650eccdfddf32b8ec2c65fda/resource/record/production/2024/06/24/2021%E5%B0%8F%E5%AD%A6%E4%BA%8C%E5%B9%B4%E7%BA%A7%E6%9C%9F%E6%9C%AB%E8%AF%AD%E6%96%87%E6%B0%B4%E5%B9%B3%E6%B5%8B%E8%AF%95%E5%8D%B71_1719221446240425384.pdf").then(r => r.arrayBuffer());

var pdf = await require("pdf-lib").PDFDocument.load(buf);

console.log(pdf.getPageCount()); // prints 1, but the PDF actually has 2 pages

What actually happened?

getPageCount() returned 1, but the PDF actually has 2 pages.

What did you expect to happen?

I expected getPageCount() to return 2.

While debugging, I found that the parser also reads an unreferenced object, and that object overwrites the correct Pages value.

[Screenshot: unreferenced object]

[Screenshot: correct page information]
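
To make the suspected mechanism concrete, here is a minimal toy model of the overwrite described above. It is not pdf-lib code. The object number "2 0", the byte offsets, the xref map, and the function names parseSequentially and parseViaXref are all made up for illustration. It assumes the parser registers every object it finds in file order, so a stale definition that appears later in the file replaces the one the cross-reference table points to.

```javascript
// Toy model, not pdf-lib code: each entry stands for one "N G obj" body found in the file.
// The offsets and the object number "2 0" are hypothetical.
const objectsInFileOrder = [
  { offset: 100, ref: "2 0", value: { Type: "Pages", Count: 2 } }, // listed in the xref table
  { offset: 900, ref: "2 0", value: { Type: "Pages", Count: 1 } }, // stale, not in the xref table
];

// Hypothetical xref table: object reference -> byte offset of the live definition.
const xref = new Map([["2 0", 100]]);

// Strategy A: register every object in file order, so the last definition wins.
function parseSequentially(objects) {
  const table = new Map();
  for (const obj of objects) table.set(obj.ref, obj.value);
  return table;
}

// Strategy B: keep only definitions whose offset matches the xref entry.
function parseViaXref(objects, xrefTable) {
  const table = new Map();
  for (const obj of objects) {
    if (xrefTable.get(obj.ref) === obj.offset) table.set(obj.ref, obj.value);
  }
  return table;
}

const sequentialCount = parseSequentially(objectsInFileOrder).get("2 0").Count;
const xrefCount = parseViaXref(objectsInFileOrder, xref).get("2 0").Count;
console.log(sequentialCount, xrefCount); // 1 2
```

In this model, skipping objects that the cross-reference table does not point to yields the expected count of 2. Whether the same approach fits pdf-lib's real parser is an open question for the maintainers.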

How can we reproduce the issue?

var buf = await fetch("https://17zy-oto-export.oss-cn-beijing.aliyuncs.com/saaszy-2b589de8650eccdfddf32b8ec2c65fda/resource/record/production/2024/06/24/2021%E5%B0%8F%E5%AD%A6%E4%BA%8C%E5%B9%B4%E7%BA%A7%E6%9C%9F%E6%9C%AB%E8%AF%AD%E6%96%87%E6%B0%B4%E5%B9%B3%E6%B5%8B%E8%AF%95%E5%8D%B71_1719221446240425384.pdf").then(r => r.arrayBuffer());

var pdf = await require("pdf-lib").PDFDocument.load(buf);

console.log(pdf.getPageCount());

Version

1.17.1

What environment are you running pdf-lib in?

Node

Additional Notes

No response