UglyToad / PdfPig

Read and extract text and other content from PDFs in C# (port of PDFBox)
https://github.com/UglyToad/PdfPig/wiki
Apache License 2.0

Unhandled exception. UglyToad.PdfPig.Core.PdfDocumentFormatException: Could not find the object 4 0 with type DictionaryToken #648

Closed. yjagota closed this issue 9 months ago.

yjagota commented 1 year ago

Hello. I just found an exception that is very similar to...

UglyToad.PdfPig.Core.PdfDocumentFormatException
  HResult=0x80131500
  Message=Could not find the object number 4 0 with type DictionaryToken instead, it was found with type ObjectToken.
  Source=UglyToad.PdfPig
  StackTrace:
   at UglyToad.PdfPig.Parser.Parts.DirectObjectFinder.Get[T](IndirectReference reference, IPdfTokenScanner scanner)
   at UglyToad.PdfPig.Parser.Parts.DirectObjectFinder.Get[T](IToken token, IPdfTokenScanner scanner)
   at UglyToad.PdfPig.Content.ResourceStore.LoadFontDictionary(DictionaryToken fontDictionary, InternalParsingOptions parsingOptions)
   at UglyToad.PdfPig.Content.ResourceStore.LoadResourceDictionary(DictionaryToken resourceDictionary, InternalParsingOptions parsingOptions)
   at UglyToad.PdfPig.Parser.PageFactory.Create(Int32 number, DictionaryToken dictionary, PageTreeMembers pageTreeMembers, NamedDestinations namedDestinations, InternalParsingOptions parsingOptions)
   at UglyToad.PdfPig.Content.Pages.GetPage(Int32 pageNumber, NamedDestinations namedDestinations, InternalParsingOptions parsingOptions)
   at UglyToad.PdfPig.PdfDocument.GetPage(Int32 pageNumber)
   at Dexttra.Parser.Pdf.PdfParser..ctor(String fileName) in D:\Projects\Dexttra\Dexttra.Parser\Pdf\PdfParser.cs:line 25
   at Dexttra.Admin.ViewModels.PdfParsingViewModel.DocumentLoaded(Object sender, EventArgs args) in D:\Projects\Dexttra\Dexttra.Admin\ViewModels\PdfParsingViewModel.cs:line 230

Version: PdfPig 0.1.8, Windows 11 (build 22621.1848)

I've attached the PDF that causes the error.

Thanks & Regards, Yogesh

CRYSTAL REAY ROAD.pdf

bo100nka commented 10 months ago

I have a similar problem, but with the following UglyToad.PdfPig.Core.PdfDocumentFormatException message instead: Could not find the object number 4 0 with type StreamToken instead, it was found with type ObjectToken.

Windows 11, PdfPig version 0.1.8, Visual Studio 2022 with a .NET 8 project.

The PDF is a bank statement containing sensitive data, so I can't upload it here. I hope the message narrows down the problem, though.

   at UglyToad.PdfPig.PdfDocument.GetPage(Int32 pageNumber)
   at UglyToad.PdfPig.PdfDocument.<GetPages>d__31.MoveNext()

yjagota commented 9 months ago

Could somebody please look into this? It has been 7 months since I posted this issue.

@everharder If you could help, that would be great, as you resolved #208.

BobLd commented 9 months ago

@yjagota Are you sure that the issue is still there? Can you try with the latest nightly release (0.1.9-alpha-20240121-04fc8)?

yjagota commented 9 months ago

Hello @BobLd, thanks for replying.

No, I tested this using version 0.1.8. As there was no reference or comment on this issue, I assumed it had not been fixed. I have now tested with the nightly 0.1.9-alpha-20240121-04fc8, and the issue is resolved.

Much appreciated. Regards.

BobLd commented 9 months ago

@yjagota No problem at all. I just ran a test and saw that there were no more issues.

I'm going to close the issue, but feel free to reopen it or create a new one if needed.