Closed by 2bllk 7 months ago
Coincidentally, I noticed this yesterday with what I think was a macOS screenshot.
Thanks for all the details. Looks like an easy fix.
The code is here: https://github.com/johnwhitington/cpdf-source/blob/master/cpdfjpeg.ml
It's a transliteration into OCaml of the well-known snippet of code you quote, I believe.
Yes, it is the same code. The comment in your code has the same author as the site listed in the answer on Stack Overflow.
I tweaked the Python code mentioned earlier. The validation works correctly.
def validate_jpeg(data):
    # Need SOI (2 bytes) + marker (2) + length (2) + 5-byte identifier
    if len(data) < 11:
        return False
    i = 0
    if data[i] == 0xFF and data[i+1] == 0xD8:
        if data[i+2] == 0xFF and data[i+3] == 0xE0:
            if (data[i+6] == ord('J') and data[i+7] == ord('F') and data[i+8] == ord('I') and data[i+9] == ord('F') and data[i+10] == 0x00):
                return True  # Valid JFIF string
            else:
                return False  # Not a valid JFIF string
        else:
            if data[i+2] == 0xFF and data[i+3] == 0xE1:
                if (data[i+6] == ord('E') and data[i+7] == ord('x') and data[i+8] == ord('i') and data[i+9] == ord('f') and data[i+10] == 0x00):
                    return True  # Valid Exif string
                else:
                    return False  # Not a valid Exif string
            else:
                return False  # Neither a JFIF nor an Exif block
    else:
        return False  # Not a valid SOI header
def get_jpeg_size(data):
    """
    Gets the JPEG size from the array of data passed to the function, file reference: http://www.obrador.com/essentialjpeg/headerinfo.htm
    """
    # Check for valid JPEG image
    if not validate_jpeg(data):
        return False
    data_size = len(data)
    i = 4  # Keeps track of the position within the file
    # Retrieve the block length of the first block since the first block will not contain the image size
    block_length = data[i] * 256 + data[i+1]
    while i < data_size:
        i += block_length  # Increase the file index to get to the next block
        # A frame header needs 9 bytes from here; stop early rather than raise IndexError
        if i + 8 >= data_size:
            return False
        if data[i] != 0xFF:
            return False  # Check that we are truly at the start of another block
        if data[i+1] == 0xC0:  # 0xFFC0 is the "Start of frame" marker which contains the image size
            # The structure of the 0xFFC0 block is [0xFFC0][ushort length][uchar precision][ushort height][ushort width]
            height = data[i+5]*256 + data[i+6]
            width = data[i+7]*256 + data[i+8]
            return height, width
        else:
            i += 2  # Skip the block marker
            block_length = data[i] * 256 + data[i+1]  # Go to the next block
    return False  # If this point is reached then no size was found
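The loop above only stops at 0xFFC0 (baseline frames). As far as I know, progressive and other JPEG variants put the same height/width fields in a different SOFn marker (0xFFC2 for progressive), so here is a rough, self-contained variant that accepts any of them. The name jpeg_size and the SOF_MARKERS set are mine, not from cpdf or the Stack Overflow snippet, and the sketch assumes segments follow SOI back to back with no fill bytes.

```python
import struct

# SOFn markers carry the frame dimensions; 0xC4 (DHT), 0xC8 (JPG) and
# 0xCC (DAC) fall in the same numeric range but are not frame headers.
SOF_MARKERS = set(range(0xC0, 0xD0)) - {0xC4, 0xC8, 0xCC}

def jpeg_size(data):
    """Return (height, width) from the first SOFn segment, or None.

    Hypothetical helper for illustration; assumes no fill bytes between
    segments and does not care whether the file is JFIF or Exif.
    """
    if data[:2] != b"\xff\xd8":              # missing SOI marker
        return None
    i = 2
    while i + 4 <= len(data):
        if data[i] != 0xFF:                  # not at the start of a segment
            return None
        marker = data[i + 1]
        (length,) = struct.unpack(">H", data[i + 2:i + 4])
        if marker in SOF_MARKERS:
            if i + 9 > len(data):            # truncated frame header
                return None
            # [marker 2][length 2][precision 1][height 2][width 2]
            height, width = struct.unpack(">HH", data[i + 5:i + 9])
            return height, width
        i += 2 + length                      # length counts its own two bytes
    return None
```

Because it walks segments from offset 2 instead of checking a fixed signature, it should behave the same whether the first application segment is APP0/JFIF or APP1/Exif.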
Observation: loading an Exif JPEG into Adobe Acrobat and saving it as a PDF rewrites the JPEG file so that it has a JFIF header ahead of the Exif data:
ÿØÿà JFIF ÿá+Exif MM * ® ¶( 1 ¾2 Æ<
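The dump reads as an APP0/JFIF segment followed by an APP1/Exif segment. For anyone who wants to check their own files, a small sketch that lists the application segments at the start of a JPEG might look like this. The helper name app_segments is made up for illustration, and it assumes the APPn segments directly follow SOI with no fill bytes.

```python
def app_segments(data):
    """Return [(marker_byte, identifier)] for the APPn segments after SOI.

    Hypothetical helper; stops at the first segment that is not APP0..APP15.
    """
    if data[:2] != b"\xff\xd8":              # missing SOI marker
        return []
    found = []
    i = 2
    while i + 4 <= len(data) and data[i] == 0xFF and 0xE0 <= data[i + 1] <= 0xEF:
        length = data[i + 2] * 256 + data[i + 3]   # includes the 2 length bytes
        payload = data[i + 4:i + 2 + length]
        # The identifier is the NUL-terminated ASCII string opening the payload
        ident = payload.split(b"\x00", 1)[0].decode("ascii", "replace")
        found.append((data[i + 1], ident))
        i += 2 + length
    return found
```

On a file rewritten the way described above, I would expect it to report 0xE0 with JFIF first and 0xE1 with Exif second.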
In the manual, paragraph 17.3 «Make a PDF from a PNG or JPEG image» states the following:
When I tried to run this command with a random JPEG image, I got the following error:
This error occurs for any image that uses the Exif standard rather than JFIF. Most likely, the program code looks for the JFIF format signature in the file and reports an error if it does not find it.
When I changed the Exif application segment to a JFIF segment in the binary file of an image that failed to convert to PDF, the conversion was successful.
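I made that change by hand in a hex editor, but the same experiment can be sketched in a few lines of Python. The function name relabel_exif_as_jfif is hypothetical, and the result is only a diagnostic: the bytes after the identifier are still Exif data, so this does not produce a well-formed JFIF segment, it only satisfies a signature check.

```python
def relabel_exif_as_jfif(data):
    """Return a copy whose leading APP1/Exif header is relabelled APP0/JFIF.

    Diagnostic sketch only: the segment payload is left untouched.
    """
    if data[:4] != b"\xff\xd8\xff\xe1" or data[6:11] != b"Exif\x00":
        raise ValueError("data does not start with SOI + APP1/Exif")
    patched = bytearray(data)
    patched[3] = 0xE0             # APP1 marker (0xFFE1) -> APP0 marker (0xFFE0)
    patched[6:10] = b"JFIF"       # identifier at offset 0x06
    return bytes(patched)
```

The file length and every other byte stay the same, which is why a successful conversion afterwards points at the signature check rather than at the image data.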
Most likely the image validation in the utility code looks something like this:
In this code, validation is performed as follows:
1) the signature 0xFF 0xD8 0xFF 0xE0 is checked, where 0xFF 0xD8 is the Start of Image (SOI) segment marker and 0xFF 0xE0 is the APP0 marker of the JFIF application segment;
2) the APP0 segment identifier, which is the ASCII string JFIF, is checked at offset 0x06.
When I changed the values 0xFF 0xE1 (Exif segment APP1 marker) to 0xFF 0xE0 (JFIF segment APP0 marker) in the image, and also changed the value Exif to JFIF at offset 0x06, the image passed the validation in your program.
Please add the possibility to work with JPEG/Exif images. Most likely it will be enough to change the validation process.