empira / PDFsharp-1.5

A .NET library for processing PDF
MIT License

PDFSharp 1.50: Invalid predictor in array #117

Closed: misaelitox closed this issue 1 year ago

misaelitox commented 4 years ago

Reporting an Issue Here

Expected Behavior

Merging two PDFs should complete successfully.

Actual Behavior

This throws an exception with the message "Invalid predictor in array".

Steps to Reproduce the Behavior

I attached the file "3595_SS 092 Bldg 300 skewed beam to beam connection.pdf", which seems to have been produced by iTextSharp 4.1.6. Every time I tried merging any other PDF with this one, the exception occurred. I'm not sure whether the tool used to generate the PDF causes an incompatibility, or whether the cause is file corruption or the file's signature; the exception gives no further details on why this file is not valid.

Attachments: 3595_SS 092 Bldg 300 skewed beam to beam connection.pdf; Screen Shot 2020-02-24 at 5 27 01 PM

misaelitox commented 4 years ago

I can attach the other files I'm using, which are valid. The exception occurs only when merging files with the one attached to this issue (3595_SS 092 Bldg 300 skewed beam to beam connection.pdf).

timk259 commented 4 years ago

This problem appears to be due to an incomplete implementation of data decompression in PDFsharp, not a corrupt file. My knowledge of the PDF format is limited, but PDF files apparently support data compression similar to that used for .png image files. The compressor can apply "predictors" to the data to improve compression, and the reader (here, PDFsharp) must then reverse them during decompression.

There are two problems:

  1. For the predictor type specified in the dictionary that appears before the data stream, PDFsharp only seems to support the PNG predictor types (10 through 15), not types 1 and 2 as listed in the PDF specification, though I have not noticed an issue because of that.
  2. Within the data stream, each row carries a row-level predictor in the range 0 to 4 (see footnote [1]); PDFsharp checks for type 2 but not the other types. The example files I've seen (including one in a forum post [2]) contain valid row-level predictor values other than 2, which leads to the "Invalid predictor in array" error.

I prepared my own fix for this issue which seems to work but I don't necessarily have sufficient files to test it thoroughly, so hopefully someone else can develop an official fix based on this information.

[1] https://en.wikipedia.org/wiki/Portable_Network_Graphics#Filtering
[2] https://forum.pdfsharp.net/viewtopic.php?f=3&t=3713
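
To illustrate the row-level handling described in the second problem, here is a minimal Python sketch of reversing PNG row filters for all five types (0 to 4). It is not PDFsharp's code and not the fix mentioned above. The names `undo_png_predictor`, `row_bytes`, and `bytes_per_pixel` are hypothetical, and the input is assumed to be stream data that has already been inflated, where each row starts with one filter-type byte.

```python
def paeth(a: int, b: int, c: int) -> int:
    """Paeth predictor as defined for PNG: pick the neighbour closest to a + b - c."""
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    if pb <= pc:
        return b
    return c


def undo_png_predictor(data: bytes, row_bytes: int, bytes_per_pixel: int = 1) -> bytes:
    """Reverse PNG row filters on already-inflated data.

    row_bytes is the length of one row excluding its leading filter-type byte
    (an assumption of this sketch; a real reader derives it from the stream's
    decode parameters).
    """
    stride = row_bytes + 1
    if len(data) % stride != 0:
        raise ValueError("data length is not a multiple of the row length")

    prev = bytearray(row_bytes)  # the row above the first row is all zeros
    out = bytearray()
    for start in range(0, len(data), stride):
        ftype = data[start]
        if ftype > 4:
            raise ValueError(f"invalid row filter type {ftype}")
        row = bytearray(data[start + 1 : start + stride])
        for i in range(row_bytes):
            a = row[i - bytes_per_pixel] if i >= bytes_per_pixel else 0   # left
            b = prev[i]                                                   # above
            c = prev[i - bytes_per_pixel] if i >= bytes_per_pixel else 0  # above-left
            if ftype == 0:      # None
                pred = 0
            elif ftype == 1:    # Sub
                pred = a
            elif ftype == 2:    # Up
                pred = b
            elif ftype == 3:    # Average
                pred = (a + b) // 2
            else:               # Paeth
                pred = paeth(a, b, c)
            row[i] = (row[i] + pred) & 0xFF
        out += row
        prev = row
    return bytes(out)
```

A decoder that accepts only filter type 2 would reject any row whose first byte is 0, 1, 3, or 4, even though those values are valid, which matches the behaviour described above.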

dcbazscott commented 2 years ago

I experienced the same problem today with a file one of my users uploaded. If I open it in Acrobat Reader and save it, the re-saved copy works fine, but PDFsharp cannot read the original. Hopefully the attached code helps diagnose the issue.

PDFsharp-IssueSubmission.zip
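
For a quick check of which predictor types a file declares, a rough Python sketch along the following lines may help. It only scans the raw bytes for `/Predictor` entries in plain-text dictionaries, so it can miss entries stored inside compressed object streams, and it says nothing about the row-level filter bytes inside the compressed data. The function names are hypothetical.

```python
import re
from collections import Counter

# Matches "/Predictor <number>" in uncompressed dictionary text.
PREDICTOR_RE = re.compile(rb"/Predictor\s+(\d+)")


def describe_predictor(value: int) -> str:
    """Map a /Predictor value to its meaning in the PDF specification."""
    if value == 1:
        return "no prediction"
    if value == 2:
        return "TIFF predictor 2"
    if 10 <= value <= 15:
        return "PNG prediction"
    return "not defined by the PDF specification"


def count_predictors(pdf_bytes: bytes) -> Counter:
    """Count each /Predictor value that appears in the raw file bytes."""
    return Counter(int(m.group(1)) for m in PREDICTOR_RE.finditer(pdf_bytes))


def report(path: str) -> None:
    """Print one line per distinct /Predictor value found in the file."""
    with open(path, "rb") as handle:
        counts = count_predictors(handle.read())
    for value, n in sorted(counts.items()):
        print(f"/Predictor {value}: {n} occurrence(s), {describe_predictor(value)}")
```

Calling `report("some-file.pdf")` on the original and on the copy re-saved by Acrobat Reader would show whether the declared predictor types differ between the two.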

KlaskSkovby commented 2 years ago

In reply to timk259's comment above:

Can you post your fix?

ThomasHoevel commented 1 year ago

The bug should no longer occur with PDFsharp 6.0.0-preview-2 or later.