empira / PDFsharp-1.5

A .NET library for processing PDF
MIT License

Incorrect number of pages found in a particular pdf. #76

Status: Closed (zaryk closed this issue 1 year ago)

zaryk commented 5 years ago

Reporting an Issue Here

Incorrect number of pages found in a particular pdf.

More Info: Pages aren't being detected in a specific PDF. PDFsharp returns only 1 page, but the file has 26 pages. I have tried debugging, and the issue appears to be related to the retrieval of the iref. From what I am seeing, 8 0 R is the one page that is retrieved. I believe 123 0 R holds the additional pages, with Kids nested inside Kids, but its iref value appears to be null, so it is removed during removetrailing. As a result, PageCount is 1 and the number of page elements is 1.
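To make the "Kids inside of Kids" description concrete, here is a minimal sketch of counting pages in a nested page tree. It is a toy model in Python, not PDFsharp's code: the dictionary layout, the `count_pages` helper, the "root" key and every object number other than 8 and 123 are assumptions made up for illustration. It only shows that when a reference such as 123 0 R cannot be resolved, the whole subtree under it disappears from the count.

```python
def count_pages(objects, ref):
    """Count leaf Page nodes reachable from ref; unresolved refs contribute 0."""
    node = objects.get(ref)
    if node is None:
        # Dangling reference: there is nothing to descend into.
        return 0
    if node["Type"] == "Page":
        return 1
    # Intermediate Pages node: sum over its Kids (no cycle guard in this sketch).
    return sum(count_pages(objects, kid) for kid in node.get("Kids", []))


# Hypothetical tree: only the numbers 8 and 123 come from the report.
TREE = {
    "root": {"Type": "Pages", "Kids": ["8 0", "123 0"]},
    "8 0": {"Type": "Page"},
    "123 0": {"Type": "Pages", "Kids": ["124 0", "127 0"]},
    "124 0": {"Type": "Pages", "Kids": ["125 0", "126 0"]},
    "125 0": {"Type": "Page"},
    "126 0": {"Type": "Page"},
    "127 0": {"Type": "Page"},
}

# Same tree with object 123 missing, as if its reference could not be resolved.
BROKEN = {key: value for key, value in TREE.items() if key != "123 0"}

print(count_pages(TREE, "root"))    # full tree
print(count_pages(BROKEN, "root"))  # subtree under 123 0 is lost
```

In this toy tree the first call counts every leaf, while the second only sees the single page at 8 0, which mirrors the symptom described above.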

Expected Behavior

Should find 26 pages. When saving, should still have 26 pages.

Actual Behavior

Finds 1 page. If saved, will only have 1 page.

Steps to Reproduce the Behavior

Providing IssueSubmission.

GDI: IssueSubmission.zip

PDFXplorer layout: 1-8-2019 12-00-38 pm

Using podofobrowser as an example of what PDFsharp is finding: 1-8-2019 12-02-41 pm

Using podofobrowser as an example of what PDFsharp is finding after saving in Adobe Reader: 1-8-2019 12-02-46 pm

bhanuteja-dev commented 5 years ago

I have a 190-page PDF that I need to merge into another PDF, but at the line Dim pages As PdfDocument = PdfReader.Open(memStream, PdfDocumentOpenMode.Import) I get a PageCount of 1.

memStream Length is 1169442.

I am using 1.50 stable version.

ThomasHoevel commented 1 year ago

Still occurs with PDFsharp 6.0.0-preview-2. Must be investigated. Thanks for the feedback.

ThomasHoevel commented 1 year ago

The document is not "comme il faut". The "/Pages" object with ID "3 0" exists twice: one version references "8 0 R" only, and the other references both "8 0 R" and "123 0 R".

Up to version 6.0.0-preview-2, PDFsharp used the version that references "8 0 R" only, and finding only 1 page was the consequence. Starting with 6.0.0-preview-3 (ETA not yet set), PDFsharp uses the other "/Pages" object and now finds 26 pages for the file discussed here.

PDF files should not have duplicate objects with the same ID, so IMHO both 1 page and 26 pages are correct results. But coming up with 26 pages is more logical for the file we have here.

Thank you for providing the PDF file. Understanding and fixing the issue was a bit difficult.
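For anyone who wants to check whether their own file has the same problem, the following is a rough sketch of a raw-byte scan for object IDs that are defined more than once. It is written in Python and is not part of PDFsharp. The helper name `find_duplicate_objects` and the `SAMPLE` bytes are made up for illustration: the sample only mimics the two "3 0" variants described above, and its /Count values are invented. Note two limitations: a scan like this cannot see objects stored inside compressed object streams, and it will also flag objects that were redefined by an incremental update.

```python
import re
from collections import defaultdict

# Matches "<object number> <generation> obj" headers in the raw file bytes.
OBJ_HEADER = re.compile(rb"(?<![0-9])(\d+)\s+(\d+)\s+obj\b")


def find_duplicate_objects(data: bytes) -> dict:
    """Return {(number, generation): [byte offsets]} for IDs defined more than once."""
    offsets = defaultdict(list)
    for match in OBJ_HEADER.finditer(data):
        key = (int(match.group(1)), int(match.group(2)))
        offsets[key].append(match.start())
    return {key: found for key, found in offsets.items() if len(found) > 1}


# Synthetic sample (not the file from this issue): object "3 0" is defined twice.
SAMPLE = (
    b"%PDF-1.4\n"
    b"3 0 obj\n<< /Type /Pages /Kids [8 0 R] /Count 1 >>\nendobj\n"
    b"8 0 obj\n<< /Type /Page /Parent 3 0 R >>\nendobj\n"
    b"3 0 obj\n<< /Type /Pages /Kids [8 0 R 123 0 R] /Count 26 >>\nendobj\n"
)

for (number, generation), found in find_duplicate_objects(SAMPLE).items():
    print(f"object {number} {generation} is defined {len(found)} times at offsets {found}")
```

To run it against a real file, read the bytes with `open(path, "rb").read()` and pass them to `find_duplicate_objects`. A hit is only a hint that the file deserves a closer look, not proof that it is broken.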