empira / PDFsharp

PDFsharp and MigraDoc Foundation for .NET 6 and .NET Framework
https://docs.pdfsharp.net/

Corrupted File Error #140

Closed: rhit-olingejj closed this issue 2 months ago

rhit-olingejj commented 2 months ago

Reporting an Issue Here

Expected Behavior

After PdfReader.Open is called on the PDF file, the file is expected to open for reading.

Actual Behavior

PdfReader.Open throws an error in the attached program. It would be nice to have some kind of repair functionality, or the ability to open this PDF, since it opens just fine in PDF viewer programs.

Steps to Reproduce the Behavior

  1. Run the attached solution and it will error out. The ZIP is attached, along with the PDF files that it errors out on: PDFsharp.IssueSubmissionTemplate.zip, TestUser%20%202%202023W2.pdf, TestUser202023W2.pdf

ThomasHoevel commented 2 months ago

Tried to open the PDF with Adobe Reader and got this:

Corrupt

If the file does not open with Adobe Reader, then I assume there is something wrong with the file.

It looks like some sort of archive file containing several files. It should open fine with PDFsharp if your code removes the extra headers and trailers before sending the contents to PDFsharp.

StLange commented 2 months ago

A valid PDF file starts with %PDF-x.y, e.g. %PDF-1.5. Your file starts with

PK ô0òXœb4kV? V?  FormW2_TestUser2_782477.pdf%PDF-1.5

(open it in Notepad++). The first PDF part ends in line 1929 with %%EOF. In the next line a new PDF file begins:

PK ô0òX”§ë¾@? @?  FormW2_estUser2_782478.pdf%PDF-1.5

Your file seems to be some kind of concatenation of 7 PDF files. I have never seen this before. It is interesting that some browsers can open it but Adobe Reader cannot. I tried to extract the first two PDF file parts with Notepad++, but Adobe Reader still cannot open the extracted files.

What tool produces this file?
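
For anyone who wants to run the header check described above without a text editor, here is a minimal sketch in Python. The helper name `sniff_container` is made up for illustration and is not part of PDFsharp. It only looks at the first bytes: `%PDF-` for a PDF, and `PK` followed by 0x03 0x04, which is the signature of a ZIP local file header.

```python
def sniff_container(data: bytes) -> str:
    """Classify a byte string by its leading signature (hypothetical helper)."""
    # A PDF file is expected to begin with "%PDF-" followed by the version.
    if data.startswith(b"%PDF-"):
        return "pdf"
    # A ZIP local file header begins with 'P', 'K', 0x03, 0x04.
    if data.startswith(b"PK\x03\x04"):
        return "zip"
    return "unknown"
```

Reading the first few bytes of the downloaded file (for example with `open(path, "rb").read(8)`) and passing them to this function should be enough to tell the two cases apart before handing the file to a PDF library.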

rhit-olingejj commented 2 months ago

This is produced using Tax1099 through their GeneratePDF API. I will see if some pre-processing alleviates the issue.

jhouzvicka commented 2 months ago

Actually, both of the attached PDF files are ZIP files.
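
If the API response really is a ZIP archive, the pre-processing mentioned earlier could be as simple as unpacking it and keeping the members that look like PDFs. Below is a rough sketch in Python using only the standard library; the function name `extract_pdfs` is hypothetical, and it assumes the whole response fits in memory. A .NET program would presumably do the equivalent with System.IO.Compression before calling PdfReader.Open on each extracted document, but that part is not shown here.

```python
import io
import zipfile


def extract_pdfs(data: bytes) -> dict[str, bytes]:
    """Return {member name: content} for each ZIP member starting with %PDF-.

    Hypothetical helper; raises zipfile.BadZipFile if data is not a ZIP.
    """
    pdfs = {}
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip directory entries
            payload = archive.read(info)
            # Keep only members that carry the PDF header.
            if payload.startswith(b"%PDF-"):
                pdfs[info.filename] = payload
    return pdfs
```

Each value in the returned dictionary is then a standalone PDF byte string that can be written to disk or passed on to whatever PDF library is in use.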