UglyToad / PdfPig

Read and extract text and other content from PDFs in C# (port of PDFBox)
https://github.com/UglyToad/PdfPig/wiki
Apache License 2.0
1.59k stars, 226 forks

Specific JPEG image is incorrectly added to PDF #710

Open · cremor opened 9 months ago

cremor commented 9 months ago

When I add a specific JPEG file to a new PDF via PdfPageBuilder.AddJpeg(), the resulting PDF seems to be invalid.

Most PDF readers still show the resulting PDF correctly. But Adobe Acrobat Reader sometimes (it seems to depend on the zoom factor) shows an empty page and/or an error message saying that there is not enough data for the image. Wondershare PDFelement is the weirdest one: it does show the page, but the image has a red background which shouldn't be there.

This only affects a single JPEG that I know of. I can't share it publicly because it contains private data, but I can send it to you via email (provided you don't share it).

I've tested PdfPig 0.1.8 and 0.1.9-alpha-20230914-d59d2.

cremor commented 6 months ago

@EliotJones Did you get my email with the image file?

EliotJones commented 6 months ago

Hi @cremor, I did, thanks. Unfortunately I'm deep in burnout, so I don't have much time for the project, but your message has reminded me to take a look next time I do.

cremor commented 6 months ago

@EliotJones I can confirm that this is fixed in 0.1.9-alpha-20240116-4e63e, thanks!

It also partly fixed another problematic JPEG image that I recently encountered. Previously that other image was completely broken. Now it is readable, but inverted (black is white and white is black).

Could you please have a look at that image too? I'll also send it to you via email.

EliotJones commented 5 months ago

I've looked at the second file you sent over and I'm stumped. The file has 4 as its number-of-components value. I can't find anywhere online what that actually represents; the only values in the spec are 1 and 3. Not sure I'll be able to fix this one.

cremor commented 5 months ago

Isn't that CMYK, according to the comment you added in that last fix commit? https://github.com/UglyToad/PdfPig/commit/90f7e4bda2b2d5a677933aa55d42ed0cbe67f3e2#diff-427d7713130e8ebdced2a48e93367b39f1118fcd73e544a4b317fc83eabc01af

https://compress-or-die.com/analyze also shows "Colorspace CMYK" for the image.
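
For anyone who wants to check a file locally instead of using an online analyzer, here is a minimal sketch in Python (standard library only, not PdfPig code) that reads the component count from the JPEG frame header and looks for an Adobe APP14 marker. The function names `inspect_jpeg`, `likely_pdf_colorspace` and `synthetic_header` are made up for this illustration, and the mapping from component count to a PDF color space is the usual convention, not something confirmed for the image in this issue. The sample bytes are a synthetic header, not the private image discussed above.

```python
import struct

# Start-of-frame markers (0xC0-0xCF, excluding DHT 0xC4, JPG 0xC8, DAC 0xCC).
SOF_MARKERS = {0xC0, 0xC1, 0xC2, 0xC3, 0xC5, 0xC6, 0xC7,
               0xC9, 0xCA, 0xCB, 0xCD, 0xCE, 0xCF}
# Markers that carry no length field: TEM, SOI, EOI, RST0-RST7.
STANDALONE = {0x01, 0xD8, 0xD9} | set(range(0xD0, 0xD8))


def inspect_jpeg(data: bytes) -> dict:
    """Return the component count and the Adobe transform flag (or None)."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    info = {"components": None, "adobe_transform": None}
    pos = 2
    while pos + 1 < len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:          # fill byte before a marker, skip it
            pos += 1
            continue
        pos += 2
        if marker in STANDALONE:
            continue
        if pos + 2 > len(data):
            raise ValueError("truncated segment header")
        (length,) = struct.unpack(">H", data[pos:pos + 2])  # includes itself
        payload = data[pos + 2:pos + length]
        if marker == 0xEE and payload[:5] == b"Adobe" and len(payload) >= 12:
            # APP14: "Adobe", version(2), flags0(2), flags1(2), transform(1)
            info["adobe_transform"] = payload[11]
        elif marker in SOF_MARKERS:
            if len(payload) < 6:
                raise ValueError("truncated frame header")
            # SOF: precision(1), height(2), width(2), component count(1)
            info["components"] = payload[5]
            return info
        elif marker == 0xDA:        # start of scan: no frame header was seen
            break
        pos += length
    return info


def likely_pdf_colorspace(components: int) -> str:
    """Conventional mapping only; an assumption, not PdfPig's actual logic."""
    return {1: "DeviceGray", 3: "DeviceRGB", 4: "DeviceCMYK"}.get(
        components, "unknown")


def synthetic_header(components: int, adobe_transform: int) -> bytes:
    """Build a header-only test input (no image data, not a viewable file)."""
    app14 = b"Adobe" + b"\x00\x64" + b"\x00\x00" + b"\x00\x00" \
        + bytes([adobe_transform])
    sof = b"\x08" + struct.pack(">HH", 16, 16) + bytes([components]) \
        + b"\x00" * (3 * components)
    return (b"\xff\xd8"
            + b"\xff\xee" + struct.pack(">H", len(app14) + 2) + app14
            + b"\xff\xc0" + struct.pack(">H", len(sof) + 2) + sof)


if __name__ == "__main__":
    info = inspect_jpeg(synthetic_header(components=4, adobe_transform=2))
    print(info, likely_pdf_colorspace(info["components"]))
```

A component count of 4 indicates a four-channel image (CMYK, or YCCK when the Adobe transform flag is 2). JPEGs written by Adobe tools commonly store four-channel data inverted, and a PDF that embeds such data unchanged as DeviceCMYK would typically need a Decode array of [1 0 1 0 1 0 1 0] to display correctly. That would be consistent with the "inverted" symptom described above, but whether it is the actual cause for this image is not confirmed in this thread.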