Open ilCosmico opened 2 years ago
Hello @ilCosmico, the inverted image in the PDF is an XObject that uses the 'Separation' colorspace, which is currently not supported. The associated RawBytes [which you have extracted] are a one-byte-per-pixel JPG. The Separation color is based on black, so the final image is inverted. I have working (but not pretty) code supporting this Separation colorspace. The not-pretty part is the support for reading JPG (on .NET Standard). The extracted PNG image is attached.
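To illustrate why the export comes out inverted, here is a minimal Python sketch. It is not PdfPig code. It assumes the simplest case described above: 8-bit samples, a Separation colorant that is plain black, and a linear tint, so a tint of 255 means full ink (black) while a grayscale viewer reads 255 as white. The function name `separation_black_to_gray` is hypothetical.

```python
def separation_black_to_gray(samples: bytes) -> bytes:
    """Convert 8-bit tint samples (255 = full black ink) to 8-bit gray (255 = white).

    Assumption: the colorant is black and the tint function is linear,
    so the conversion is a plain inversion of every sample.
    """
    return bytes(255 - s for s in samples)


# Tint 0 (no ink) becomes white, tint 255 (full ink) becomes black.
print(list(separation_black_to_gray(bytes([0, 255, 64]))))
```

Writing the raw samples straight into a grayscale bitmap skips this step, which matches the inverted picture reported in this issue. A real implementation would evaluate the tint function instead of assuming a linear one.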
@ilCosmico and @EliotJones better looking code now complete. Testing underway however still will be a while before able to check in.
Examples of images using the Separation colorspace are rare. After accumulating 30,000 PDFs from public links, I found 202 PDFs with at least one example image. From these 202 PDFs I created a single PDF which copies just the example pages with (at least one) Separation image (some pages have several).
The single (563 page 278MB) PDF named PdfWithSeveralSeparationImages20230308.PDF
can be found in the ZIP at: https://www.dropbox.com/s/0ec0y5hrtmk78tt/PdfWithSeveralSeparationImagesPageSources20230308.zip?dl=1
In the ZIP is
PdfWithSeveralSeparationImages20230308.PDF
DescriptionOfPdfWithSeveralSeparationImages20230308.txt A comma separated text file describing each page of the PDF. Columns:
1 PageNumber (within PdfWithSeveralSeparationImages20230308.PDF)
2 ImageNumber (the ordinal of the image on the page)
3 BitsPerComponent
4 Width
5 Height
6 RawByteSize
7 AltColorSpaceName
8 TintFunctionNumber
PdfWithSeveralSeparationImagesPageSources20230308.txt This describes each page (PdfWithSeveralSeparationImages20230308.PDF) and where the page was copied from. Comma separated text file. Columns:
use this together with the following and last file in the ZIP.
There are some 5640 example images using the Separation colorspace, which cover all the "Tint Functions" and many alternate colorspaces. Hope it helps someone.
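For anyone who wants to filter the example images, the description file could be loaded along these lines. This is only a sketch: the column names come from the list above, but it assumes the file has no header row and that every column except AltColorSpaceName is an integer. The sample row is invented for illustration and is not taken from the real file.

```python
import csv
import io

COLUMNS = [
    "PageNumber", "ImageNumber", "BitsPerComponent", "Width",
    "Height", "RawByteSize", "AltColorSpaceName", "TintFunctionNumber",
]
INT_COLUMNS = [c for c in COLUMNS if c != "AltColorSpaceName"]


def read_descriptions(text_file):
    """Read the comma separated description file into a list of dicts."""
    rows = []
    for record in csv.DictReader(text_file, fieldnames=COLUMNS):
        for name in INT_COLUMNS:
            # Assumption: these columns hold plain integers.
            record[name] = int(record[name].strip())
        record["AltColorSpaceName"] = record["AltColorSpaceName"].strip()
        rows.append(record)
    return rows


# Invented sample row, only to show the expected shape of the data.
sample = io.StringIO("1,1,8,100,200,20000,DeviceCMYK,2\n")
rows = read_descriptions(sample)
print(rows[0]["AltColorSpaceName"], rows[0]["TintFunctionNumber"])
```

With the real file, replace the `io.StringIO` sample with `open(path, newline="")`. Filtering on `TintFunctionNumber` then gives the pages that exercise one particular tint function type.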
Comments on this activity spilled over into #532. Copied here to put them into context.
The Separation colorspace itself makes use of a “Tint function”, which can be implemented as one of 4 function types:
0 Sampled function
2 Exponential interpolation function
3 Stitching function
4 PostScript calculator function
These are also well underway however testing these again will be significant.
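Of the four function types, type 2 (exponential interpolation) is the simplest to show. The PDF specification defines it as y_j = C0_j + x^N * (C1_j - C0_j) for each output component j. The Python sketch below is an illustration of that formula, not the PdfPig implementation. The function name and the default domain are assumptions, and the optional Range clipping is left out.

```python
def exponential_interpolation(x, c0, c1, n, domain=(0.0, 1.0)):
    """Evaluate a PDF type 2 function for a single input value x.

    c0 and c1 are the output values at x = 0 and x = 1, n is the exponent.
    The input is clipped to the domain first, as the specification requires.
    """
    low, high = domain
    x = min(max(x, low), high)
    return [a + (x ** n) * (b - a) for a, b in zip(c0, c1)]


# A tint function mapping a black separation onto a CMYK alternate space:
# tint 0 gives no ink, tint 1 gives full K.
print(exponential_interpolation(0.5, [0, 0, 0, 0], [0, 0, 0, 1], 1))
```

For a Separation colorspace the single input is the tint and the outputs are the components of the alternate colorspace, so the length of `c0` and `c1` follows the alternate space (4 for DeviceCMYK in the example).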
Have found 11 (public) “in the wild” example PDFs using separation colorspace [it's rare].
DCT (Discrete Cosine Transform) decoding, based on ITU-T T.81 section 4.5, has four distinct modes of operation with various coding processes:
Adobe Technical Note #5116 details additional decode handling (from inside a PDF), including support for the APP14 "Adobe" application segment, which hints at the colorspace transform to apply. The default is to use the YCC-to-RGB color transform. Byte 11 of the segment signals the color transform: 0 = no transform (CMYK for four components, RGB for three), 1 = YCbCr, 2 = YCCK.
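Reading that hint byte could look roughly like the Python sketch below. It is not the code discussed in this thread, and the function name `adobe_transform` is hypothetical. It walks the marker segments before the scan data, looks for APP14 (marker 0xFFEE) whose payload starts with "Adobe", and returns payload byte 11 without interpreting it. As a simplification it ignores fill bytes between markers and assumes every segment before the scan carries a length field.

```python
import struct


def adobe_transform(jpeg: bytes):
    """Return byte 11 of the APP14 'Adobe' segment, or None if it is absent."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream (missing SOI marker)")
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = jpeg[pos + 1]
        if marker in (0xDA, 0xD9):  # SOS or EOI: no more header segments
            return None
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        payload = jpeg[pos + 4:pos + 2 + length]
        if marker == 0xEE and payload[:5] == b"Adobe" and len(payload) >= 12:
            return payload[11]
        pos += 2 + length
    return None


# Synthetic stream: SOI, one APP14 'Adobe' segment with transform 2, EOI.
app14 = b"Adobe" + b"\x00\x64" + b"\x00\x00" + b"\x00\x00" + b"\x02"
stream = b"\xff\xd8" + b"\xff\xee" + struct.pack(">H", len(app14) + 2) + app14 + b"\xff\xd9"
print(adobe_transform(stream))
```

When the function returns None there is no Adobe hint, and the decoder has to fall back on the default transform or on the DCTDecode parameters in the PDF.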
8-bit only (16-bit or other depths require down/up sampling to 8-bit; yet to be implemented).
After all image post-processing is implemented, the final step will be translating the DIB (Device Independent Bitmap) to PNG for final export from the library.
@BobLd assigned writing the PDF Functions ("Tint functions") to himself silently and completed implementation and testing in secret. The Separation colorspace has been updated with the Tint function. Still not sure why. Separation use is so rare.
This issue (#484) as originally raised was a little broader: "Wrong color reading a picture". It was more about image export than just the Separation colorspace, so if we go back to the original issue there is still work to do in PngFromPdfImageFactory and ColorSpaceDetailsByteConverter to convert to RGB for the PNG rendering on export.
@BobLd are you going to do that part?
For the image export from the supplied PDF (test.pdf) to be a success, it also needs DecodeDCT (which was later raised separately by someone else as #532, so that seems like the place to put that work).
Perhaps someone would be kind enough to rename this issue back, please (now that Separation is done).
@BobLd
thank you for renaming
rename doesn't match issue raised.
original issue raised was about image export.
although not mentioned, I believe the author's intention was to ask about image.TryGetPng
three things are required:
For example: src\UglyToad.PdfPig\Images\Png\PngFromPdfImageFactory.cs Line 17
\src\UglyToad.PdfPig\Images\ColorSpaceDetailsByteConverter.cs
@BobLd I'm checking to see if you are going to complete that part: the changes to PngFromPdfImageFactory and ColorSpaceDetailsByteConverter?
@fnatzke
@BobLd assigned writing the PDF Functions ("Tint functions") to himself silently and completed implementation and testing in secret. The Separation colorspace has been updated with the Tint function. Still not sure why. Separation use is so rare.
Not sure I understand your comment above and how I should take it, there is nothing secret. Regarding the tint function in the Separation color space, this is the definition of a Separation color space. Now we can actually use the function.
@BobLd I'm checking to see if you are going to complete that part: the changes to PngFromPdfImageFactory and ColorSpaceDetailsByteConverter?
As mentioned earlier, I've created a discussion here https://github.com/UglyToad/PdfPig/discussions/574 and a project here https://github.com/UglyToad/PdfPig/projects/5 where we can coordinate contributions, as per your request in #532. The short answer to the question is yes.
Please take it this way. I wrote I was working on this issue. You did not. I spent many days working on this (on and off over 4 months). It was a very large amount of time wasted. For nothing. Perhaps this was code you had lying around and you just had to make some test cases and publish. Perhaps you had to write it from scratch. Either way, from August last year you were silent (then spent time in secret creating a PR). The fact that it works is irrelevant. It's about being decent.
@ilCosmico and @EliotJones better looking code now complete. Testing underway however still will be a while before able to check in.
@fnatzke any ETA about the release of the build containing the fix? Thanks in advance!
Hello @fnatzke, it's been a year since we last touched base about the testing phase. Any updates on that front? Time flies! 😅
Any news on this?
@ilCosmico I'm planning to add DCT support in a separate NuGet package shortly via JpegLibrary (hopefully in the next 2 weeks)
I already have a proof of concept here https://github.com/BobLd/PdfPig/tree/develop-caly
@BobLd nice to hear that! Will it be a new NuGet package referenced by PdfPig?
@ilCosmico the opposite, the new package references PdfPig.
I've released an initial version of the code here: https://github.com/BobLd/UglyToad.PdfPig.Filters.Dct.JpegLibrary. It's also available as a NuGet package (pre-release).
Have a look at the repo's README to understand how to use it. I'll simplify the use soon.
@BobLd thanks for the clarification. I tried it, and it works quite well. Do you have an ETA for the official release?
@ilCosmico thanks for the feedback. I first need to publish the official release of PdfPig, and the filter NuGet package will follow.
This will happen shortly.
The attached PDF contains a picture that is read with inverted colors, as shown below.
I started from your sample code and just added the code for saving a BMP file.
test.pdf