empira / PDFsharp-1.5

A .NET library for processing PDF
MIT License

unable to open document using PDFsharp #99

Closed arunaktsp closed 1 year ago

arunaktsp commented 5 years ago

serachbalePDf.pdf

We are unable to open the attached document. PDFsharp throws the following error:

Unexpected token 'n' in PDF stream. The file may be corrupted. If you think this is a bug in PDFsharp, please send us your PDF file.

We tried to debug the issue using the PDFsharp source code. Our analysis is below.

The PDF file contains two Xref tables and trailers. The byte position given in the last StartXref did not point to the beginning of an Xref table. Since it was incorrect, we fixed it by removing the invalid Xref entry.

PDFsharp was then able to read the correct Xref table, because the position given in the other StartXref was correct.
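The StartXref check described above can be reproduced outside PDFsharp. The following is a minimal Python sketch, not PDFsharp code: the function name `check_startxref_offsets` is hypothetical, and it assumes a classic cross-reference table that begins with the `xref` keyword. Files that use cross-reference streams would be reported as not pointing at a table.

```python
import re


def check_startxref_offsets(data: bytes):
    """Return a list of (offset, points_at_xref) for every startxref in the file.

    Hypothetical helper for illustration only. It assumes classic
    cross-reference tables, which start with the keyword "xref".
    """
    results = []
    for match in re.finditer(rb"startxref\s+(\d+)", data):
        offset = int(match.group(1))
        # A valid pointer lands exactly on the "xref" keyword.
        points_at_xref = data[offset:offset + 4] == b"xref"
        results.append((offset, points_at_xref))
    return results
```

Calling it on the raw file bytes, for example `check_startxref_offsets(open(path, "rb").read())`, lists each StartXref value together with a flag showing whether it lands on an Xref table.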

The first entry in the Xref table gives the total number of entries in the table. PDFsharp loops over each entry and validates the position it points to.

Each Xref entry gives the start position of an indirect object. Up to the 8th entry, the start positions were correct. When evaluating the 9th entry, PDFsharp looks for the 9th indirect object. An indirect object can be positioned anywhere in the PDF stream.

There were no objects 9 to 17 in the PDF stream, and since no indirect object was found at those positions we received an "invalid entry in XRef table" error. There were also duplicate indirect objects in the file.
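The entry validation described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the PDFsharp implementation: `find_bad_xref_entries` is a hypothetical name, the table is assumed to be a classic one (`xref` keyword, subsection headers, 20-byte entries), and `xref_pos` is assumed to be a correct StartXref value.

```python
import re

# Subsection header: first object number and entry count on one line.
HEADER = re.compile(rb"\s*(\d+) (\d+)[ \t]*[\r\n]+")
# One table entry: 10-digit offset, 5-digit generation, 'n' (in use) or 'f' (free).
ENTRY = re.compile(rb"(\d{10}) (\d{5}) ([nf])\s*")
# Start of an indirect object: "<number> <generation> obj".
OBJ = re.compile(rb"(\d+)\s+(\d+)\s+obj")


def find_bad_xref_entries(data: bytes, xref_pos: int):
    """Return object numbers whose in-use entry does not point at that object."""
    if data[xref_pos:xref_pos + 4] != b"xref":
        raise ValueError("offset does not point at an 'xref' keyword")
    pos = xref_pos + 4
    bad = []
    while True:
        header = HEADER.match(data, pos)
        if header is None:
            break  # reached "trailer" or data that is not a subsection header
        first, count = int(header.group(1)), int(header.group(2))
        pos = header.end()
        for obj_num in range(first, first + count):
            entry = ENTRY.match(data, pos)
            if entry is None:
                bad.append(obj_num)  # the entry line itself is malformed
                return bad
            pos = entry.end()
            if entry.group(3) == b"f":
                continue  # free entries do not point at an object
            offset, gen = int(entry.group(1)), int(entry.group(2))
            target = OBJ.match(data, offset)
            if target is None or (int(target.group(1)), int(target.group(2))) != (obj_num, gen):
                bad.append(obj_num)
    return bad
```

For a file like the one described here, such a check would be expected to report the entries whose positions do not contain the corresponding indirect object.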

We worked around this by skipping this validation in the PDFsharp code. Now we are facing an error when reading the trailer record stream.

The attached PDF file has the corrupt Xref table issue.

ds-wb commented 1 year ago

_6FA0XEI9P.PDF PDFsharp version 1.50.5147

With the attached file, I get the following error, which looks very similar: Unexpected token 'e' in PDF stream. The file may be corrupted. If you think this is a bug in PDFsharp, please send us your PDF file.

As far as I understand, it is caused by the top-right picture, which contains a PNG with an alpha channel. At least, if I replace it with a non-alpha version, the error goes away. Is there any way to get rid of the error in PDFsharp, or at least to ignore it? Our software adds a watermark to existing PDFs, and I cannot ask users to take care of the alpha channel issue. Our software is expected to work with any PDF source, as you can guess.

ThomasHoevel commented 1 year ago

our software is expected to work with any PDF source as you can guess...

Sure. I opened this PDF with Adobe Reader and then selected File / Save as. Adobe Reader reported an error. I assume the file is corrupt and PDFsharp does not attempt to open any corrupt file.

ThomasHoevel commented 1 year ago

unable to open the attached document. It is throwing below error,

Unexpected token 'n' in PDF stream. The file may be corrupted. If you think this is a bug in PDFsharp, please send us your PDF file.

I also get this error. I do not get an error when I open the file with Adobe Reader, then use File / Save as and open the resulting file with PDFsharp. So I assume the file is corrupt and PDFsharp will not be modified to handle this corruption in the near future.