empira / PDFsharp-1.5

A .NET library for processing PDF
MIT License

PdfReaderException: Unexpected character '0x007d' in PDF stream. #82

Closed by icnocop 1 year ago

icnocop commented 5 years ago

Hi.

Thank you for PDFsharp.

I get an exception when trying to open a specific PDF file for reading:

using (PdfDocument pdfDocument = PdfReader.Open("test.pdf"))
{
}

Expected Behavior

I expected to be able to open the PDF for reading.

Actual Behavior

An exception occurs when trying to open the PDF for reading.

Steps to Reproduce the Behavior

  1. Download and extract PDFSharpTest.zip to a directory of your choice.
  2. Open PDFSharpTest.sln in Visual Studio 2017.
  3. Build the solution in the Debug | Any CPU configuration.
  4. Run the TestMethod1 unit test.
  5. Notice the exception:
    PdfSharp.Pdf.IO.PdfReaderException: Unexpected character '0x007d' in PDF stream. The file may be corrupted. If you think this is a bug in PDFsharp, please send us your PDF file.

    Stack Trace:

    at PdfSharp.Internal.ParserDiagnostics.HandleUnexpectedCharacter(Char ch)
    at PdfSharp.Pdf.IO.Lexer.ScanNextToken()
    at PdfSharp.Pdf.IO.Parser.ParseObject(Symbol stop)
    at PdfSharp.Pdf.IO.Parser.ReadDictionary(PdfDictionary dict, Boolean includeReferences)
    at PdfSharp.Pdf.IO.Parser.ReadObject(PdfObject pdfObject, PdfObjectID objectID, Boolean includeReferences, Boolean fromObjecStream)
    at PdfSharp.Pdf.IO.PdfReader.Open(Stream stream, String password, PdfDocumentOpenMode openmode, PdfPasswordProvider passwordProvider)
    at PdfSharp.Pdf.IO.PdfReader.Open(String path, String password, PdfDocumentOpenMode openmode, PdfPasswordProvider provider)
    at PdfSharp.Pdf.IO.PdfReader.Open(String path)
    at PDFSharpTest.UnitTest1.TestMethod1() in C:\Users\rami.abughazaleh\Documents\Visual Studio 2017\Projects\PDFSharpTest\UnitTest1.cs:line 15

Any ideas?

The sample PDF is included in the attached zip.

I'm running on Windows 10 64-bit Version 1809 (OS Build 17763.292) and using Visual Studio 2017 Enterprise Version 15.9.8.

I'm using PDFsharp v1.51.5185-beta. I also tried PDFsharp v1.50.5147, with the same result.

I can open the PDF using Adobe Acrobat Reader DC v2019.010.20098 without issues.

Thank you!
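
For reference, the quoted value in the message is a character code in hexadecimal. A minimal sketch for decoding it, using only the .NET base class library; the `ErrorCharDecoder` class and `DecodeCharCode` helper are hypothetical names for illustration and are not part of PDFsharp:

```csharp
using System;
using System.Globalization;

public static class ErrorCharDecoder
{
    // Hypothetical helper (not a PDFsharp API): converts a code such as
    // "0x007d", as printed in the exception message, into its character.
    public static char DecodeCharCode(string hexCode)
    {
        // Strip the optional "0x" prefix before parsing the hex digits.
        string digits = hexCode.StartsWith("0x", StringComparison.OrdinalIgnoreCase)
            ? hexCode.Substring(2)
            : hexCode;

        ushort codeUnit = ushort.Parse(
            digits, NumberStyles.AllowHexSpecifier, CultureInfo.InvariantCulture);
        return (char)codeUnit;
    }
}
```

`ErrorCharDecoder.DecodeCharCode("0x007d")` returns `'}'`, so the lexer stopped at a closing curly brace while reading a dictionary.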

icnocop commented 5 years ago

I've attached a unit test that does the following:

  1. Open an existing PDF file using File.Open, which returns a FileStream.
  2. Copy the FileStream to a MemoryStream.
  3. Call PdfReader.Open on the MemoryStream, which returns a PdfDocument.
  4. Call PdfDocument.Save on a new FileStream for a newly created file on disk.
  5. Try to open the new file with PdfReader.Open.

Step 5 throws PdfReaderException: Unexpected character '0x007d' in PDF stream.

PDFSharpMemoryStreamTest.zip

        [TestMethod]
        public void TestMethod1()
        {
            string inputFilePath = @"test.pdf";

            // open the input PDF file
            using (FileStream fileStream = File.Open(inputFilePath, FileMode.Open))
            {
                // copy the file stream to the memory stream
                using (MemoryStream memoryStream = new MemoryStream())
                {
                    fileStream.CopyTo(memoryStream);

                    // move to the first position in the memory stream
                    memoryStream.Position = 0;

                    // open the memory stream as a PDF
                    using (PdfDocument pdfDocument = PdfReader.Open(memoryStream))
                    {
                        // create the output PDF file
                        using (var outputFileStream = new FileStream("output.pdf", FileMode.Create))
                        {
                            // save the PDF to the output PDF file stream
                            pdfDocument.Save(outputFileStream);
                        }
                    }
                }
            }

            // try to open the output PDF file
            // Unexpected character '0x007b' in PDF stream. The file may be corrupted. If you think this is a bug in PDFsharp, please send us your PDF file.
            using (PdfDocument pdfDocument = PdfReader.Open("output.pdf"))
            {
            }
        }

The unit test attached to the first post is effectively Step 5 on its own: the file it tries to open is similar to the new PDF file created in Step 4.

Any ideas?

Thank you!
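
When a round trip like the one above fails, a cheap first check is whether the saved file was written out completely. The following is a minimal sketch under stated assumptions, not a PDF validator: it only tests that the data begins with the `%PDF-` header and contains the `%%EOF` marker within its last 1024 bytes. `PdfSanityCheck` and `LooksStructurallyComplete` are hypothetical names, and the code uses no PDFsharp API. A file can pass this check and still fail in the parser, which may well be the case for the file in this issue.

```csharp
using System;
using System.IO;
using System.Text;

public static class PdfSanityCheck
{
    // Hypothetical helper: true if the data begins with "%PDF-" and
    // contains "%%EOF" somewhere in its last 1024 bytes.
    public static bool LooksStructurallyComplete(byte[] data)
    {
        // ISO-8859-1 maps every byte to exactly one char, so binary
        // stream content cannot break the text search.
        Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");

        int windowLength = Math.Min(data.Length, 1024);
        string head = latin1.GetString(data, 0, windowLength);
        string tail = latin1.GetString(data, data.Length - windowLength, windowLength);

        return head.StartsWith("%PDF-", StringComparison.Ordinal)
            && tail.Contains("%%EOF");
    }

    // Convenience overload that reads the whole file into memory first.
    public static bool LooksStructurallyComplete(string path)
    {
        return LooksStructurallyComplete(File.ReadAllBytes(path));
    }
}
```

Calling `PdfSanityCheck.LooksStructurallyComplete("output.pdf")` after Step 4 and before Step 5 separates a truncated write from a file that is complete on disk but contains something the parser rejects.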

sintmetmijter commented 3 years ago

I have the same issue when merging two documents: reading one of the documents fails with the same error.

I worked around it by having Foxit PhantomPDF rewrite the affected PDFs as PDF/E documents.

ThomasHoevel commented 1 year ago

I opened the file with Adobe Reader and tried File / Save as, but it reported an error. So the file is corrupted.