empira / PDFsharp-1.5

A .NET library for processing PDF
MIT License

InvalidOperationException when attempting to save a valid PdfDocument #74

Closed: dkoehler69 closed this issue 1 year ago

dkoehler69 commented 6 years ago

I discovered a bug in the PdfDocument.Save functions.

Expected Behavior

Saving without exception.

Actual Behavior

An InvalidOperationException("Cannot save a PDF document with no pages.") is thrown when saving a freshly opened PdfDocument, even though it contains pages.

This behaviour only occurs in the compiled program or when the code runs uninterrupted in the debugger (I am using VS 2017). When stepping through the code line by line (F10), the exception is not thrown.

Steps to Reproduce the Behavior

Calling the following Test() function with a valid PDF byte array throws the InvalidOperationException at the pdfDoc.Save(pdfFilePath) call. All preceding checks pass.

    public static void Test(this byte[] pdf, string pdfFilePath)
    {
        if (pdf == null) { throw new ArgumentNullException(nameof(pdf)); }

        PdfDocument pdfDoc;
        try
        {
            pdfDoc = PdfReader.Open(pdf.ToPdfMemoryStream(), PdfDocumentOpenMode.Import);
        }
        catch (FormatException)
        {
            MessageBox.Show("Error: Invalid or empty PDF.");
            return;
        }

        pdfDoc.Save(pdfFilePath);
    }

    public static MemoryStream ToPdfMemoryStream(this byte[] pdf)
    {
        if (pdf == null) { throw new ArgumentNullException(nameof(pdf)); }

        // Copy every page of the input into a new document.
        PdfDocument outputDocument = new PdfDocument();
        using (MemoryStream inputStream = new MemoryStream(pdf))
        {
            try
            {
                PdfDocument inputDocument = PdfReader.Open(inputStream, PdfDocumentOpenMode.Import);
                foreach (PdfPage page in inputDocument.Pages)
                {
                    outputDocument.AddPage(page);
                }
            }
            catch (InvalidOperationException)
            {
                throw new FormatException("Not a valid PDF document.");
            }
        }
        if (outputDocument.PageCount == 0) { throw new FormatException("PDF is empty"); }

        // Write the copied document to a memory stream and hand it to the caller.
        MemoryStream outputStream = new MemoryStream();
        outputDocument.Save(outputStream);
        return outputStream;
    }

Workaround

Insert the following line immediately before pdfDoc.Save(pdfFilePath): if (pdfDoc.PageCount == 0) { throw new FormatException("PDF is empty"); }
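In context, the workaround might look like the sketch below. The class name PdfSaveWorkaround and the method name SaveWithPageCountCheck are hypothetical and exist only for illustration; the PDFsharp calls are the ones already used in this report. Whether Save succeeds on a document opened in Import mode may depend on the PDFsharp version.

```csharp
using System;
using System.IO;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;

// Hypothetical helper class, not part of PDFsharp.
public static class PdfSaveWorkaround
{
    public static void SaveWithPageCountCheck(Stream pdfStream, string pdfFilePath)
    {
        if (pdfStream == null) { throw new ArgumentNullException(nameof(pdfStream)); }

        PdfDocument pdfDoc = PdfReader.Open(pdfStream, PdfDocumentOpenMode.Import);

        // Workaround: reading PageCount before Save is what the report
        // found to prevent the "no pages" exception.
        if (pdfDoc.PageCount == 0) { throw new FormatException("PDF is empty"); }

        pdfDoc.Save(pdfFilePath);
    }
}
```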

Possible Fix

The exception is thrown in the void DoSave(PdfWriter writer) method of the PdfDocument class when the following condition is met: if (_pages == null || _pages.Count == 0). The bug is caused by the _pages field not yet being initialised at that point. This also explains the workaround: calling PdfDocument.PageCount triggers the initialisation of _pages.

A possible fix is to replace _pages with Pages in the offending line, since accessing the property triggers the initialisation. Here is a compact version of the fix that avoids attempting the initialisation twice: if ((Pages?.Count ?? 0) == 0)
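The mechanism can be illustrated with a self-contained sketch that does not use PDFsharp at all. LazyDocument and all of its members are hypothetical stand-ins, written under the assumption that Pages is a lazily initialised property backed by the _pages field; this is not PDFsharp's actual source. If that assumption holds, it would also be consistent with the step-by-step debugging behaviour, since a debugger that evaluates properties while stepping would initialise the field as a side effect; the report does not confirm this.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a document with a lazily initialised page list.
public class LazyDocument
{
    private readonly int _pagesInFile;  // number of pages stored in the file
    private List<string> _pages;        // stays null until Pages is first read

    public LazyDocument(int pagesInFile) { _pagesInFile = pagesInFile; }

    // Builds the page list on first access.
    public List<string> Pages
    {
        get
        {
            if (_pages == null)
            {
                _pages = new List<string>();
                for (int i = 1; i <= _pagesInFile; i++) { _pages.Add("Page " + i); }
            }
            return _pages;
        }
    }

    public int PageCount => Pages.Count;

    // Mirrors the reported check: reads the field directly, so it fails
    // whenever nothing has touched Pages or PageCount beforehand.
    public void SaveUsingField()
    {
        if (_pages == null || _pages.Count == 0)
        {
            throw new InvalidOperationException("Cannot save a PDF document with no pages.");
        }
    }

    // Mirrors the proposed fix: reads the property, which initialises the field.
    public void SaveUsingProperty()
    {
        if ((Pages?.Count ?? 0) == 0)
        {
            throw new InvalidOperationException("Cannot save a PDF document with no pages.");
        }
    }
}
```

In this sketch, SaveUsingField throws on a fresh LazyDocument even when it has pages, stops throwing once PageCount has been read, and SaveUsingProperty succeeds either way.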

ThomasHoevel commented 1 year ago

Files opened without the Modify flag cannot be saved.
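For illustration, opening a document for modification and saving it might look like the sketch below. It assumes that the "Modify flag" refers to PdfDocumentOpenMode.Modify; the class name PdfModifyExample and the method name OpenAndSave are hypothetical.

```csharp
using System;
using System.IO;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;

// Hypothetical helper class, not part of PDFsharp.
public static class PdfModifyExample
{
    public static void OpenAndSave(Stream input, Stream output)
    {
        if (input == null) { throw new ArgumentNullException(nameof(input)); }
        if (output == null) { throw new ArgumentNullException(nameof(output)); }

        // Open with Modify instead of Import so that the document may be saved.
        PdfDocument pdfDoc = PdfReader.Open(input, PdfDocumentOpenMode.Modify);

        // Passing false keeps the output stream open for the caller.
        pdfDoc.Save(output, false);
    }
}
```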

dkoehler69 commented 1 year ago

The workaround demonstrates that the PDF can be saved just by evaluating PdfDocument.PageCount before calling PdfDocument.Save(). Calling PdfDocument.PageCount initialises _pages, so the if (_pages == null || _pages.Count == 0) check no longer fails. To me, this seems to indicate a flaw in the whole process.

(A caveat: my bug report is from 2018. I have not checked whether it has become obsolete in the meantime.)

ThomasHoevel commented 1 year ago

The bug report will be obsolete with PDFsharp 6.0.0-preview-3, which is coming next month or so.