Open bitsgalore opened 1 month ago
Pdfimages:
pdfimages -list BKT-ecur002glas01_01.pdf
Output (edited down to one image at page 10):
page num type width height color comp bpc enc interp object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
10 9 image 1556 2400 icc 3 8 jpeg no 79 0 150 150 260K 2.4%
The value of `color` (`icc`) indicates an ICC profile.
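For scripting, a row of this listing can be split into named fields. The following is only a minimal sketch: the function name `parse_pdfimages_row` is made up for illustration, and it assumes whitespace-separated columns in the order of the header line above. Rows that do not have exactly 16 fields (inline images, for example, may be listed differently) would be rejected.

```python
# Column names as printed in the header line of `pdfimages -list`.
# "object" and "ID" are treated as two separate columns (two numbers per row).
COLUMNS = ["page", "num", "type", "width", "height", "color", "comp", "bpc",
           "enc", "interp", "object", "ID", "x-ppi", "y-ppi", "size", "ratio"]


def parse_pdfimages_row(line):
    """Split one data row of `pdfimages -list` into a dict keyed by column name."""
    values = line.split()
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(values)}")
    return dict(zip(COLUMNS, values))


row = parse_pdfimages_row(
    "10 9 image 1556 2400 icc 3 8 jpeg no 79 0 150 150 260K 2.4%")
print(row["color"], row["object"])  # icc 79
```

All values are kept as strings here; converting them to numbers is left to the caller.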
So let's extract the Image XObject that represents this image (using the `object` value, 79):
mutool show BKT-ecur002glas01_01.pdf 79 > 79.dat
Result:
79 0 obj
<<
/Width 1556
/BitsPerComponent 8
/Name /Im0
/Height 2400
/Subtype /Image
/Filter [ /DCTDecode ]
/Length 265866
/ColorSpace 77 0 R
/Type /XObject
>>
stream
...
endstream
endobj
Notice that ColorSpace is defined through a referenced object (77). So let's extract this object as well:
mutool show BKT-ecur002glas01_01.pdf 77 > 77.dat
Result:
77 0 obj
[ /ICCBased 78 0 R ]
endobj
As per 8.6.5.5 (ICCBased Colour Spaces) of ISO 32000-1, this indicates an ICCBased colour space, where the stream (defined by object 78) contains the ICC profile. So let's extract this:
mutool show BKT-ecur002glas01_01.pdf 78 > 78.dat
Result:
78 0 obj
<<
/Filter /ASCII85Decode
/N 3
/Alternate /DeviceRGB
/Length 513
>>
stream
...
endstream
endobj
We can then extract the ICC profile using:
mutool show -b -o 78-stream.dat BKT-ecur002glas01_01.pdf 78
Then use ExifTool to inspect its properties:
exiftool -X 78-stream.dat > 78-stream.xml
Result:
<?xml version='1.0' encoding='UTF-8'?>
<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
<rdf:Description rdf:about='78-stream.dat'
xmlns:et='http://ns.exiftool.org/1.0/' et:toolkit='Image::ExifTool 12.60'
xmlns:ExifTool='http://ns.exiftool.org/ExifTool/1.0/'
xmlns:System='http://ns.exiftool.org/File/System/1.0/'
xmlns:File='http://ns.exiftool.org/File/1.0/'
xmlns:ICC-header='http://ns.exiftool.org/ICC_Profile/ICC-header/1.0/'
xmlns:ICC_Profile='http://ns.exiftool.org/ICC_Profile/ICC_Profile/1.0/'>
<ExifTool:ExifToolVersion>12.60</ExifTool:ExifToolVersion>
<System:FileName>78-stream.dat</System:FileName>
<System:Directory>.</System:Directory>
<System:FileSize>560 bytes</System:FileSize>
<System:FileModifyDate>2024:09:26 14:10:21+00:00</System:FileModifyDate>
<System:FileAccessDate>2024:09:26 14:10:29+00:00</System:FileAccessDate>
<System:FileInodeChangeDate>2024:09:26 14:10:21+00:00</System:FileInodeChangeDate>
<System:FilePermissions>-rw-rw-r--</System:FilePermissions>
<File:FileType>ICC</File:FileType>
<File:FileTypeExtension>icc</File:FileTypeExtension>
<File:MIMEType>application/vnd.iccprofile</File:MIMEType>
<ICC-header:ProfileCMMType>Little CMS</ICC-header:ProfileCMMType>
<ICC-header:ProfileVersion>2.1.0</ICC-header:ProfileVersion>
<ICC-header:ProfileClass>Display Device Profile</ICC-header:ProfileClass>
<ICC-header:ColorSpaceData>RGB </ICC-header:ColorSpaceData>
<ICC-header:ProfileConnectionSpace>XYZ </ICC-header:ProfileConnectionSpace>
<ICC-header:ProfileDateTime>2000:08:11 19:51:59</ICC-header:ProfileDateTime>
<ICC-header:ProfileFileSignature>acsp</ICC-header:ProfileFileSignature>
<ICC-header:PrimaryPlatform>Microsoft Corporation</ICC-header:PrimaryPlatform>
<ICC-header:CMMFlags>Not Embedded, Independent</ICC-header:CMMFlags>
<ICC-header:DeviceManufacturer>none</ICC-header:DeviceManufacturer>
<ICC-header:DeviceModel></ICC-header:DeviceModel>
<ICC-header:DeviceAttributes>Reflective, Glossy, Positive, Color</ICC-header:DeviceAttributes>
<ICC-header:RenderingIntent>Perceptual</ICC-header:RenderingIntent>
<ICC-header:ConnectionSpaceIlluminant>0.9642 1 0.82491</ICC-header:ConnectionSpaceIlluminant>
<ICC-header:ProfileCreator>Little CMS</ICC-header:ProfileCreator>
<ICC-header:ProfileID>0</ICC-header:ProfileID>
<ICC_Profile:ProfileCopyright>Copyright 2000 Adobe Systems Incorporated</ICC_Profile:ProfileCopyright>
<ICC_Profile:ProfileDescription>Adobe RGB (1998)</ICC_Profile:ProfileDescription>
<ICC_Profile:MediaWhitePoint>0.95045 1 1.08905</ICC_Profile:MediaWhitePoint>
<ICC_Profile:MediaBlackPoint>0 0 0</ICC_Profile:MediaBlackPoint>
<ICC_Profile:RedTRC>(Binary data 14 bytes, use -b option to extract)</ICC_Profile:RedTRC>
<ICC_Profile:GreenTRC>(Binary data 14 bytes, use -b option to extract)</ICC_Profile:GreenTRC>
<ICC_Profile:BlueTRC>(Binary data 14 bytes, use -b option to extract)</ICC_Profile:BlueTRC>
<ICC_Profile:RedMatrixColumn>0.60974 0.31111 0.01947</ICC_Profile:RedMatrixColumn>
<ICC_Profile:GreenMatrixColumn>0.20528 0.62567 0.06087</ICC_Profile:GreenMatrixColumn>
<ICC_Profile:BlueMatrixColumn>0.14919 0.06322 0.74457</ICC_Profile:BlueMatrixColumn>
</rdf:Description>
</rdf:RDF>
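As a cross-check on the ExifTool output, a few of these header fields can also be read directly from the extracted stream, since the ICC profile header has a fixed 128-byte layout. This is a sketch only: `parse_icc_header` is a hypothetical helper, it reads just a handful of fields, and the example header below is synthetic (built to mirror the values reported above), not the real profile.

```python
import struct


def parse_icc_header(data):
    """Read a few fixed-offset fields from the 128-byte ICC profile header."""
    if len(data) < 128:
        raise ValueError("ICC header is 128 bytes; data is too short")
    if data[36:40] != b"acsp":
        raise ValueError("missing 'acsp' signature at offset 36")
    (size,) = struct.unpack(">I", data[0:4])  # declared profile size in bytes
    return {
        "size": size,
        # Major version in byte 8; minor and bugfix are the nibbles of byte 9.
        "version": f"{data[8]}.{data[9] >> 4}.{data[9] & 0x0F}",
        "class": data[12:16].decode("ascii"),       # e.g. 'mntr' = display
        "colorspace": data[16:20].decode("ascii"),  # e.g. 'RGB '
        "pcs": data[20:24].decode("ascii"),         # e.g. 'XYZ '
    }


# Synthetic header-only example; NOT the profile from the PDF.
header = bytearray(128)
header[0:4] = struct.pack(">I", 128)
header[8:10] = bytes([0x02, 0x10])  # version 2.1.0
header[12:16] = b"mntr"
header[16:20] = b"RGB "
header[20:24] = b"XYZ "
header[36:40] = b"acsp"
info = parse_icc_header(bytes(header))
print(info["version"], info["colorspace"])
```

For the real file this would be called as `parse_icc_header(open("78-stream.dat", "rb").read())`.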
Now let's have a look at the actual JPEG file that is embedded as part of object 79. First we extract the raw datastream from the Image XObject:
mutool show -be -o 79-stream.dat BKT-ecur002glas01_01.pdf 79
The resulting file `79-stream.dat` is actually a JPEG image, so let's analyse that with ExifTool:
exiftool -X 79-stream.dat > 79-stream.xml
Result:
<?xml version='1.0' encoding='UTF-8'?>
<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
<rdf:Description rdf:about='79-stream.dat'
xmlns:et='http://ns.exiftool.org/1.0/' et:toolkit='Image::ExifTool 12.60'
xmlns:ExifTool='http://ns.exiftool.org/ExifTool/1.0/'
xmlns:System='http://ns.exiftool.org/File/System/1.0/'
xmlns:File='http://ns.exiftool.org/File/1.0/'
xmlns:JFIF='http://ns.exiftool.org/JFIF/JFIF/1.0/'
xmlns:ICC-header='http://ns.exiftool.org/ICC_Profile/ICC-header/1.0/'
xmlns:ICC_Profile='http://ns.exiftool.org/ICC_Profile/ICC_Profile/1.0/'
xmlns:Composite='http://ns.exiftool.org/Composite/1.0/'>
<ExifTool:ExifToolVersion>12.60</ExifTool:ExifToolVersion>
<System:FileName>79-stream.dat</System:FileName>
<System:Directory>.</System:Directory>
<System:FileSize>266 kB</System:FileSize>
<System:FileModifyDate>2024:09:26 14:02:04+00:00</System:FileModifyDate>
<System:FileAccessDate>2024:09:26 14:02:04+00:00</System:FileAccessDate>
<System:FileInodeChangeDate>2024:09:26 14:02:04+00:00</System:FileInodeChangeDate>
<System:FilePermissions>-rw-rw-r--</System:FilePermissions>
<File:FileType>JPEG</File:FileType>
<File:FileTypeExtension>jpg</File:FileTypeExtension>
<File:MIMEType>image/jpeg</File:MIMEType>
<File:ImageWidth>1556</File:ImageWidth>
<File:ImageHeight>2400</File:ImageHeight>
<File:EncodingProcess>Baseline DCT, Huffman coding</File:EncodingProcess>
<File:BitsPerSample>8</File:BitsPerSample>
<File:ColorComponents>3</File:ColorComponents>
<File:YCbCrSubSampling>YCbCr4:4:4 (1 1)</File:YCbCrSubSampling>
<JFIF:JFIFVersion>1.01</JFIF:JFIFVersion>
<JFIF:ResolutionUnit>None</JFIF:ResolutionUnit>
<JFIF:XResolution>150</JFIF:XResolution>
<JFIF:YResolution>150</JFIF:YResolution>
<ICC-header:ProfileCMMType>Little CMS</ICC-header:ProfileCMMType>
<ICC-header:ProfileVersion>2.1.0</ICC-header:ProfileVersion>
<ICC-header:ProfileClass>Display Device Profile</ICC-header:ProfileClass>
<ICC-header:ColorSpaceData>RGB </ICC-header:ColorSpaceData>
<ICC-header:ProfileConnectionSpace>XYZ </ICC-header:ProfileConnectionSpace>
<ICC-header:ProfileDateTime>2000:08:11 19:51:59</ICC-header:ProfileDateTime>
<ICC-header:ProfileFileSignature>acsp</ICC-header:ProfileFileSignature>
<ICC-header:PrimaryPlatform>Microsoft Corporation</ICC-header:PrimaryPlatform>
<ICC-header:CMMFlags>Not Embedded, Independent</ICC-header:CMMFlags>
<ICC-header:DeviceManufacturer>none</ICC-header:DeviceManufacturer>
<ICC-header:DeviceModel></ICC-header:DeviceModel>
<ICC-header:DeviceAttributes>Reflective, Glossy, Positive, Color</ICC-header:DeviceAttributes>
<ICC-header:RenderingIntent>Perceptual</ICC-header:RenderingIntent>
<ICC-header:ConnectionSpaceIlluminant>0.9642 1 0.82491</ICC-header:ConnectionSpaceIlluminant>
<ICC-header:ProfileCreator>Little CMS</ICC-header:ProfileCreator>
<ICC-header:ProfileID>0</ICC-header:ProfileID>
<ICC_Profile:ProfileCopyright>Copyright 2000 Adobe Systems Incorporated</ICC_Profile:ProfileCopyright>
<ICC_Profile:ProfileDescription>Adobe RGB (1998)</ICC_Profile:ProfileDescription>
<ICC_Profile:MediaWhitePoint>0.95045 1 1.08905</ICC_Profile:MediaWhitePoint>
<ICC_Profile:MediaBlackPoint>0 0 0</ICC_Profile:MediaBlackPoint>
<ICC_Profile:RedTRC>(Binary data 14 bytes, use -b option to extract)</ICC_Profile:RedTRC>
<ICC_Profile:GreenTRC>(Binary data 14 bytes, use -b option to extract)</ICC_Profile:GreenTRC>
<ICC_Profile:BlueTRC>(Binary data 14 bytes, use -b option to extract)</ICC_Profile:BlueTRC>
<ICC_Profile:RedMatrixColumn>0.60974 0.31111 0.01947</ICC_Profile:RedMatrixColumn>
<ICC_Profile:GreenMatrixColumn>0.20528 0.62567 0.06087</ICC_Profile:GreenMatrixColumn>
<ICC_Profile:BlueMatrixColumn>0.14919 0.06322 0.74457</ICC_Profile:BlueMatrixColumn>
<Composite:ImageSize>1556x2400</Composite:ImageSize>
<Composite:Megapixels>3.7</Composite:Megapixels>
</rdf:Description>
</rdf:RDF>
This shows that the JPEG data contains an embedded ICC profile.
So, summarising: the ICC profile is defined twice here, once for the Image XObject that represents the image at the PDF level, and once at the level of the embedded JPEG. The ExifTool output above reports identical profile properties in both cases.
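ExifTool compares the profiles field by field. To check whether the two copies are byte-for-byte identical, the profile could be pulled out of the JPEG and compared with the PDF-level stream. In a JPEG, an ICC profile is stored in one or more APP2 segments that start with the identifier `ICC_PROFILE` followed by a null byte. The sketch below assumes a well-formed JPEG without padding bytes between segments; `extract_jpeg_icc` and `app_segment` are hypothetical helper names, and the test data is synthetic.

```python
import struct

ICC_TAG = b"ICC_PROFILE\x00"


def extract_jpeg_icc(data):
    """Return the ICC profile embedded in JPEG APP2 segments, or None."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    chunks = {}
    pos = 2
    while pos + 4 <= len(data) and data[pos] == 0xFF:
        marker = data[pos + 1]
        if marker in (0xDA, 0xD9):  # start of scan / end of image: stop
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + length]
        if marker == 0xE2 and payload.startswith(ICC_TAG):
            seq = payload[len(ICC_TAG)]               # 1-based chunk number
            chunks[seq] = payload[len(ICC_TAG) + 2:]  # skip seq + count bytes
        pos += 2 + length
    if not chunks:
        return None
    return b"".join(chunks[i] for i in sorted(chunks))


def app_segment(marker, payload):
    """Build a marker segment for the synthetic test data below."""
    return b"\xff" + bytes([marker]) + struct.pack(">H", len(payload) + 2) + payload


fake_profile = b"not-a-real-profile"
jpeg = (b"\xff\xd8"
        + app_segment(0xE0, b"JFIF\x00" + bytes(9))
        + app_segment(0xE2, ICC_TAG + b"\x01\x01" + fake_profile)
        + b"\xff\xd9")
print(extract_jpeg_icc(jpeg) == fake_profile)
```

For the files above, the comparison would be `extract_jpeg_icc(open("79-stream.dat", "rb").read()) == open("78-stream.dat", "rb").read()`. I have not run this against the actual PDF.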
Pdfimages:
pdfimages -list kort004mult01_01_50.pdf
Output (edited down to one image at page 5):
page num type width height color comp bpc enc interp object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
5 4 image 1961 2884 rgb 3 8 jpeg no 12 0 301 301 108K 0.7%
The value of `color` (`rgb`) indicates no ICC profile at the PDF Image XObject level. So let's have a look at the object using:
mutool show kort004mult01_01_50.pdf 12 > 12.dat
Result:
12 0 obj
<<
/BitsPerComponent 8
/ColorSpace /DeviceRGB
/Filter [ /DCTDecode ]
/Height 2884
/Length 110562
/Subtype /Image
/Type /XObject
/Width 1961
>>
stream
...
endstream
endobj
So the color space is defined as "DeviceRGB". Extract the object's stream data again:
mutool show -be -o 12-stream.dat kort004mult01_01_50.pdf 12
Analyse with ExifTool:
exiftool -X 12-stream.dat > 12-stream.xml
Result:
<?xml version='1.0' encoding='UTF-8'?>
<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
<rdf:Description rdf:about='12-stream.dat'
xmlns:et='http://ns.exiftool.org/1.0/' et:toolkit='Image::ExifTool 12.60'
xmlns:ExifTool='http://ns.exiftool.org/ExifTool/1.0/'
xmlns:System='http://ns.exiftool.org/File/System/1.0/'
xmlns:File='http://ns.exiftool.org/File/1.0/'
xmlns:JFIF='http://ns.exiftool.org/JFIF/JFIF/1.0/'
xmlns:ICC-header='http://ns.exiftool.org/ICC_Profile/ICC-header/1.0/'
xmlns:ICC_Profile='http://ns.exiftool.org/ICC_Profile/ICC_Profile/1.0/'
xmlns:Composite='http://ns.exiftool.org/Composite/1.0/'>
<ExifTool:ExifToolVersion>12.60</ExifTool:ExifToolVersion>
<System:FileName>12-stream.dat</System:FileName>
<System:Directory>.</System:Directory>
<System:FileSize>111 kB</System:FileSize>
<System:FileModifyDate>2024:09:26 15:32:52+00:00</System:FileModifyDate>
<System:FileAccessDate>2024:09:26 15:32:55+00:00</System:FileAccessDate>
<System:FileInodeChangeDate>2024:09:26 15:32:52+00:00</System:FileInodeChangeDate>
<System:FilePermissions>-rw-rw-r--</System:FilePermissions>
<File:FileType>JPEG</File:FileType>
<File:FileTypeExtension>jpg</File:FileTypeExtension>
<File:MIMEType>image/jpeg</File:MIMEType>
<File:ImageWidth>1961</File:ImageWidth>
<File:ImageHeight>2884</File:ImageHeight>
<File:EncodingProcess>Baseline DCT, Huffman coding</File:EncodingProcess>
<File:BitsPerSample>8</File:BitsPerSample>
<File:ColorComponents>3</File:ColorComponents>
<File:YCbCrSubSampling>YCbCr4:2:0 (2 2)</File:YCbCrSubSampling>
<JFIF:JFIFVersion>1.02</JFIF:JFIFVersion>
<JFIF:ResolutionUnit>inches</JFIF:ResolutionUnit>
<JFIF:XResolution>300</JFIF:XResolution>
<JFIF:YResolution>300</JFIF:YResolution>
<ICC-header:ProfileCMMType>Adobe Systems Inc.</ICC-header:ProfileCMMType>
<ICC-header:ProfileVersion>2.4.0</ICC-header:ProfileVersion>
<ICC-header:ProfileClass>Display Device Profile</ICC-header:ProfileClass>
<ICC-header:ColorSpaceData>RGB </ICC-header:ColorSpaceData>
<ICC-header:ProfileConnectionSpace>XYZ </ICC-header:ProfileConnectionSpace>
<ICC-header:ProfileDateTime>2007:03:02 10:07:41</ICC-header:ProfileDateTime>
<ICC-header:ProfileFileSignature>acsp</ICC-header:ProfileFileSignature>
<ICC-header:PrimaryPlatform>Unknown ()</ICC-header:PrimaryPlatform>
<ICC-header:CMMFlags>Not Embedded, Independent</ICC-header:CMMFlags>
<ICC-header:DeviceManufacturer></ICC-header:DeviceManufacturer>
<ICC-header:DeviceModel></ICC-header:DeviceModel>
<ICC-header:DeviceAttributes>Reflective, Glossy, Positive, Color</ICC-header:DeviceAttributes>
<ICC-header:RenderingIntent>Perceptual</ICC-header:RenderingIntent>
<ICC-header:ConnectionSpaceIlluminant>0.9642 1 0.82491</ICC-header:ConnectionSpaceIlluminant>
<ICC-header:ProfileCreator>basICColor GmbH</ICC-header:ProfileCreator>
<ICC-header:ProfileID>0</ICC-header:ProfileID>
<ICC_Profile:ProfileCopyright>Copyright (C) 2007 by Color Solutions, All Rights Reserved. License details can be found on: http://www.eci.org/eci/en/eciRGB.php</ICC_Profile:ProfileCopyright>
<ICC_Profile:ProfileDescription>eciRGB v2</ICC_Profile:ProfileDescription>
<ICC_Profile:MediaWhitePoint>0.9642 1 0.82491</ICC_Profile:MediaWhitePoint>
<ICC_Profile:RedTRC>(Binary data 1412 bytes, use -b option to extract)</ICC_Profile:RedTRC>
<ICC_Profile:GreenTRC>(Binary data 1412 bytes, use -b option to extract)</ICC_Profile:GreenTRC>
<ICC_Profile:BlueTRC>(Binary data 1412 bytes, use -b option to extract)</ICC_Profile:BlueTRC>
<ICC_Profile:RedMatrixColumn>0.65027 0.32028 0</ICC_Profile:RedMatrixColumn>
<ICC_Profile:GreenMatrixColumn>0.17804 0.60205 0.06783</ICC_Profile:GreenMatrixColumn>
<ICC_Profile:BlueMatrixColumn>0.13588 0.07767 0.75708</ICC_Profile:BlueMatrixColumn>
<Composite:ImageSize>1961x2884</Composite:ImageSize>
<Composite:Megapixels>5.7</Composite:Megapixels>
</rdf:Description>
</rdf:RDF>
This shows that the JPEG contains an embedded ICC profile.
So in this case the ICC profile is embedded only at the JPEG level, and not at the PDF (Image XObject) level.
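At the PDF level, the two examples differ only in the /ColorSpace entry: an indirect reference (`77 0 R`) in the first file, a plain name (`/DeviceRGB`) in the second. A rough way to tell these apart from the text printed by `mutool show` is sketched below. The function name `colorspace_entry` is made up, and the regular expressions assume the one-entry-per-line layout shown in the results above. A color space written as an inline array would fall through to "other".

```python
import re


def colorspace_entry(dict_text):
    """Classify the /ColorSpace entry in the text printed by `mutool show`."""
    ref = re.search(r"/ColorSpace\s+(\d+)\s+(\d+)\s+R\b", dict_text)
    if ref:
        # Indirect reference: the color space is defined in another object.
        return ("reference", int(ref.group(1)))
    name = re.search(r"/ColorSpace\s+/(\w+)", dict_text)
    if name:
        # Direct name such as DeviceRGB: no profile at the PDF level.
        return ("name", name.group(1))
    return ("other", None)


first = colorspace_entry("/Filter [ /DCTDecode ]\n/ColorSpace 77 0 R\n")
second = colorspace_entry("/ColorSpace /DeviceRGB\n/Filter [ /DCTDecode ]\n")
print(first, second)  # ('reference', 77) ('name', 'DeviceRGB')
```

A reference does not by itself prove an ICC profile is present; the referenced object still has to be inspected, as was done for object 77 in the first example.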
Using `kort004mult01_01_50.pdf` as an example again. Pdfimages output for one image:
page num type width height color comp bpc enc interp object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
5 4 image 1961 2884 rgb 3 8 jpeg no 12 0 301 301 108K 0.7%
And (again) the corresponding Image XObject:
12 0 obj
<<
/BitsPerComponent 8
/ColorSpace /DeviceRGB
/Filter [ /DCTDecode ]
/Height 2884
/Length 110562
/Subtype /Image
/Type /XObject
/Width 1961
>>
stream
...
endstream
endobj
Most of the properties reported by pdfimages follow directly from the Image XObject's dictionary entries (see ISO 32000-1, section 8.9.5.1):
Pdfimages property | Dictionary entry |
---|---|
width | Width |
height | Height |
color | ColorSpace |
bpc | BitsPerComponent |
enc | Filter |
interp | Interpolate |
object ID | (object number and generation number of the Image XObject, not a dictionary entry) |
It's not entirely clear to me what the `comp` (number of color components) value is based on, as there's no corresponding Image Dictionary entry. The same is true for the `x-ppi` and `y-ppi` values.
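One plausible explanation for `comp`, which I have not verified against the Pdfimages source, is that it follows from the color space itself: each device color space has a fixed number of components, and for an ICCBased color space the profile stream's /N entry gives the count (3 for object 78 in the first example). A sketch under that assumption, with a made-up function name:

```python
# Fixed component counts of the PDF device color spaces.
DEVICE_COMPONENTS = {"DeviceGray": 1, "DeviceRGB": 3, "DeviceCMYK": 4}


def component_count(colorspace, icc_n=None):
    """Number of color components implied by a PDF color space.

    colorspace: color space family name, e.g. "DeviceRGB" or "ICCBased".
    icc_n: value of /N in the ICC profile stream (ICCBased only).
    """
    if colorspace == "ICCBased":
        if icc_n is None:
            raise ValueError("ICCBased needs the /N value of the profile stream")
        return icc_n
    return DEVICE_COMPONENTS[colorspace]


print(component_count("DeviceRGB"), component_count("ICCBased", icc_n=3))
```

Both calls give 3, which matches the `comp` value Pdfimages reports for the two example images. Other color space families (Indexed, Separation, and so on) are not covered here.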
From the source code it seems that Pdfimages calculates x-ppi and y-ppi from the image dimensions relative to the page size (although it's not entirely clear to me what the code does exactly).
Also worth mentioning that in this case the reported `x-ppi` and `y-ppi` values are marginally different from the values in the JPEG header fields:
<JFIF:XResolution>300</JFIF:XResolution>
<JFIF:YResolution>300</JFIF:YResolution>
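If the ppi values are indeed derived from the size at which the image is drawn on the page, the arithmetic would be roughly: pixels divided by the drawn size in inches, where PDF user space units are 1/72 inch by default. The sketch below illustrates this. The drawn widths used (470.64 and 469.0 points) are hypothetical values chosen for illustration, not measured from the file, and this is my reading of the general idea rather than a transcription of what the Pdfimages code does.

```python
def ppi(pixels, drawn_points):
    """Pixels per inch for an image dimension drawn at `drawn_points` (1/72 inch)."""
    return pixels * 72.0 / drawn_points


# A 1961-pixel-wide image drawn 470.64 pt wide comes out at 300 ppi ...
print(round(ppi(1961, 470.64)))  # 300
# ... while the same image drawn slightly smaller (469 pt) rounds to 301 ppi.
print(round(ppi(1961, 469.0)))   # 301
```

So a small difference between the size on the page and the size implied by the JPEG's own 300 ppi setting would be enough to explain a reported value of 301.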
Several characteristics (resolution, ICC profiles) can be defined at either the image level (e.g. an ICC profile embedded in a JPEG) or the PDF object level, and the two definitions are not necessarily the same.
Might be helpful to do a detailed breakdown of a few examples to get a better grip on this. E.g.:
Examples could then be included in documentation, or a blog post.