empira / PDFsharp

PDFsharp and MigraDoc Foundation for .NET 6 and .NET Framework
https://docs.pdfsharp.net/

Opening PDF from Google spreadsheet fails with a "Name required" exception #143

Open rainerbossert opened 1 month ago

rainerbossert commented 1 month ago

Problem: Opening a PDF document that was downloaded from Google Spreadsheets as PDF (with the settings left as they are) fails when it is opened with PdfReader.

I attached the issue template solution with two NUnit tests. One of them fails; the other one passes with a document that was created locally via "Print to PDF".

Issue.zip

Stack:
at PdfSharp.Internal.ParserDiagnostics.ThrowParserException(String message)
at PdfSharp.Pdf.IO.Parser.ReadDictionary(PdfDictionary dict, Boolean includeReferences)
at PdfSharp.Pdf.IO.Parser.ReadXRefTableAndTrailer(PdfCrossReferenceTable xrefTable)
at PdfSharp.Pdf.IO.Parser.ReadTrailer()
at PdfSharp.Pdf.IO.PdfReader.OpenFromStream(Stream stream, String password, PdfDocumentOpenMode openMode, PdfPasswordProvider passwordProvider, PdfReaderOptions options)
at PdfSharp.Pdf.IO.PdfReader.Open(Stream stream, String password, PdfDocumentOpenMode openMode, PdfPasswordProvider passwordProvider, PdfReaderOptions options)
at PdfSharp.Pdf.IO.PdfReader.Open(Stream stream, PdfDocumentOpenMode openMode, PdfReaderOptions options)
at

StLange commented 1 month ago

This is a bug in PDFsharp that I introduced recently. When I optimized the parser earlier this year, I overlooked the fact that the parts of an object reference (two numbers followed by an R) may be separated by any white-space characters; I coded for spaces only.

Most PDF producers use spaces, as in 123 0 R, but it seems that Google uses LF instead. Very rare, but valid.

123
0
R

That crashes the parser.
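To illustrate the point for readers who do not know the PDF syntax: the sketch below parses an indirect reference while accepting any of the six white-space characters the PDF specification defines (NUL, tab, LF, form feed, CR, space). It is written in Python for brevity and is not the PDFsharp code or its fix; the names `parse_reference` and `_REF` are made up for this example, and the check on what may follow the `R` is a simplification.

```python
import re

# The six PDF white-space characters: NUL, HT, LF, FF, CR, SP.
_WS = rb"[\x00\t\n\x0c\r ]+"

# Object number, generation number, keyword R, separated by white space.
# The lookahead is a simplification: it only rejects a letter or digit
# directly after the R (for example "RG").
_REF = re.compile(rb"(\d+)" + _WS + rb"(\d+)" + _WS + rb"R(?![A-Za-z0-9])")


def parse_reference(data: bytes):
    """Return (object_number, generation) if data starts with an
    indirect reference, otherwise None."""
    match = _REF.match(data)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    print(parse_reference(b"123 0 R"))      # separated by spaces
    print(parse_reference(b"123\n0\nR"))    # separated by line feeds
```

A parser that only accepts the space character between the tokens would reject the second input, which is the situation described above.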

We will fix this immediately and provide the fix in the next preview release and the source code in the wiki.

rainerbossert commented 1 month ago

Thanks very much - I appreciate your work.

ThomasHoevel commented 1 month ago

Those who want to fix the issue with PDFsharp 6.1.1 will find the modified code here: https://github.com/empira/PDFsharp/wiki/PDFsharp-6.1.1-does-not-parse-object-references-with-line-feeds

This will be fixed in PDFsharp 6.2.0 Preview 1, which is coming soon.