Closed M4nju closed 1 year ago
The exception that you are getting seems to suggest that your PDF document is malformed. Since you can't share the PDF, could you test the document with this build: UglyToad.PdfPig.0.1.5-alpha001.zip
I should clarify that it should still throw an exception; I'm more interested in the error message.
I am really sorry that I didn't respond, I somehow missed the GitHub email. I will test it today and come back to you with the error message.
@InusualZ Okay, I wasn't able to import the NuGet package you attached here, as it is missing the reference to PdfPig.Core. So I downloaded the source code from this repository and built it. I now get some additional error messages that may help. I will also try to investigate whether I can find the issue myself.
The error message stays the same, with this stack trace:
at UglyToad.PdfPig.Parser.Parts.DirectObjectFinder.Get[T](IndirectReference reference, IPdfTokenScanner scanner) in E:\Desktop\PdfPig-0.1.5-alpha002\src\UglyToad.PdfPig\Parser\Parts\DirectObjectFinder.cs:line 79
at UglyToad.PdfPig.Parser.Parts.DirectObjectFinder.Get[T](IToken token, IPdfTokenScanner scanner) in E:\Desktop\PdfPig-0.1.5-alpha002\src\UglyToad.PdfPig\Parser\Parts\DirectObjectFinder.cs:line 91
at UglyToad.PdfPig.Parser.PageFactory.Create(Int32 number, DictionaryToken dictionary, PageTreeMembers pageTreeMembers, Boolean clipPaths) in E:\Desktop\PdfPig-0.1.5-alpha002\src\UglyToad.PdfPig\Parser\PageFactory.cs:line 137
at UglyToad.PdfPig.Content.Pages.GetPage(Int32 pageNumber, Boolean clipPaths) in E:\Desktop\PdfPig-0.1.5-alpha002\src\UglyToad.PdfPig\Content\Pages.cs:line 66
at UglyToad.PdfPig.PdfDocument.GetPage(Int32 pageNumber) in E:\Desktop\PdfPig-0.1.5-alpha002\src\UglyToad.PdfPig\PdfDocument.cs:line 169
at ConsoleApp9.Program.Main(String[] args) in C:\Users\david\source\repos\ConsoleApp9\ConsoleApp9\Program.cs:line 17
OK, so I had a more in-depth look at where the first error occurs. Inside PdfTokenScanner, in the method MoveNext, this else branch is entered:
The start of readTokens looks like this:
And the end looks like this (I cannot show the full data, as it may contain sensitive information, but it looks like everything else is just the FlateDecode stream data, with no PDF tags in between):
It seems a bit strange to me that it reads the whole stream in as tokens.
OK, so the method TryReadStream fails to read the stream at this point: the actual byte at the start of the stream is 32, i.e. a space. If I add this whitespace to the if condition, there is no error anymore, but the resulting PdfPig page has no text in it. So I guess the stream inside the PDF is broken?
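For anyone following along, here is a minimal sketch of the check being discussed. It is not PdfPig's code; the function name `stream_data_start` and the `lenient` flag are made up for illustration. The PDF specification requires the `stream` keyword to be followed by CR LF or a single LF, so a space (byte 32) directly after the keyword makes a strict reader reject the stream. The lenient mode below assumes the stray spaces sit between the keyword and a normal end-of-line marker, which may or may not be the case in the invoice from this issue.

```python
STREAM_KEYWORD = b"stream"


def stream_data_start(buf: bytes, keyword_pos: int, lenient: bool = False) -> int:
    """Return the offset of the first stream data byte, or -1 if malformed.

    keyword_pos is the offset of the 's' of the 'stream' keyword in buf.
    """
    end_of_keyword = keyword_pos + len(STREAM_KEYWORD)
    if buf[keyword_pos:end_of_keyword] != STREAM_KEYWORD:
        return -1

    pos = end_of_keyword
    if lenient:
        # Assumption for illustration: tolerate stray spaces (byte 32)
        # between the keyword and the end-of-line marker.
        while pos < len(buf) and buf[pos] == 0x20:
            pos += 1

    # The keyword must be followed by CR LF or LF, never by CR alone.
    if buf[pos:pos + 2] == b"\r\n":
        return pos + 2
    if buf[pos:pos + 1] == b"\n":
        return pos + 1
    return -1
```

Skipping the space only moves the data offset. If the `/Length` entry or the compressed data is also off, the stream can still decode to nothing, which would match the empty page described above.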
In this picture, the exception that you are getting is not the same as the first one. Are you sure that you tested the same document?
Could you please try again with this one: UglyToad.PdfPig.0.1.5-alpha001.zip
If the package doesn't work again, could you try setting a breakpoint here and print the type (name) of temp?
If that doesn't work, could you try setting a breakpoint here and print the type (name) of token?
Also, which Microsoft service is that invoice from? I tested one from a TestSubscription that I have in Azure and it seems to work. G000000000.pdf
@InusualZ The resulting error message stays the same. The windows with the red X are just the visualisation of the warnings, which I didn't see before because I was using the release build.
The NuGet package didn't work again because of the missing dependencies.
So I set a breakpoint, and it did not go into the method "T Get
Here are the values: reference = {7 0}, typeof(T).Name = "StreamToken"
My guess is that because the stream isn't read properly, it cannot find the object.
Too difficult to fix without a file
Hello there,
I get an exception when trying to get the first page of a specific document. The error message is "Could not find the object number 7 0 with type StreamToken.". I updated to the newest prerelease of PdfPig (0.1.5-alpha001), but sadly the issue still exists.
I am not able to upload the document, because it contains sensitive information. I redacted the file, but then it doesn't throw the exception anymore. If I can give you any other information that would help you to detect the issue, please ask. It's a very specific case in only one PDF file (a Microsoft invoice).
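One piece of information that can be shared without exposing the document is the handful of raw bytes that follow the `stream` keyword of the failing object (7 0 here). The sketch below is an assumption-laden helper, not part of PdfPig: the name `bytes_after_stream_keyword` is hypothetical, it takes the first matching `N G obj` header in the file (so it ignores incremental updates), and it will not find objects stored inside object streams.

```python
import re


def bytes_after_stream_keyword(pdf: bytes, obj_num: int, gen: int = 0, count: int = 4):
    """Return the byte values right after the object's 'stream' keyword.

    Returns None if the object header or the keyword is not found.
    """
    # Match e.g. b"7 0 obj" but not the tail of b"17 0 obj".
    header = re.compile(rb"(?<!\d)%d\s+%d\s+obj\b" % (obj_num, gen))
    match = header.search(pdf)
    if match is None:
        return None

    # Only look inside this object, i.e. up to the next 'endobj'.
    end = pdf.find(b"endobj", match.end())
    if end == -1:
        end = len(pdf)

    keyword = pdf.find(b"stream", match.end(), end)
    if keyword == -1:
        return None

    start = keyword + len(b"stream")
    return list(pdf[start:start + count])
```

Reading the file with `open(path, "rb").read()` and calling `bytes_after_stream_keyword(data, 7)` gives a short list of byte values. A well-formed stream starts with `10` (LF) or `13, 10` (CR LF); a leading `32` means a space follows the keyword.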
Stack-Trace:
Used Source-Code:
Any information would help, thanks.
Kind Regards, Manju