bitsgalore opened 4 hours ago
Quick test shows results (and limitations!) are identical to ImageMagick / ExifTool. Still useful to include this.
Here's an example that shows why quality detection fails for some images. The example here is `kort004mult/300ppi-50/images/crap-008.jpg`.
Quantization tables:

```python
{0: array('B', [45, 31, 34, 39, 34, 28, 45, 39, 36, 39, 51, 48, 45, 53, 68, 113, 73, 68, 62, 62, 68, 139, 99, 105, 82, 113, 164, 144, 173, 170, 161, 144, 158, 156, 181, 204, 255, 221, 181, 193, 247, 195, 156, 158, 227, 255, 229, 247, 255, 255, 255, 255, 255, 176, 218, 255, 255, 255, 255, 255, 255, 255, 255, 255]),
 1: array('B', [48, 51, 51, 68, 59, 68, 133, 73, 73, 133, 255, 187, 158, 187, 187, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255])}
```
So we have a dictionary with 2 arrays (which means the `HASH_2` and `SUMS_2` lookup tables are used).
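For reference, these tables came out of Pillow's `Image.quantization` attribute; the same data can also be pulled straight from the file's DQT segments with the standard library alone. A rough sketch (simplified to baseline JPEGs, where all tables appear before the start-of-scan marker — the segment walk below makes that assumption):

```python
def read_dqt_tables(data: bytes) -> dict:
    """Walk JPEG segments and collect DQT (0xFFDB) quantization tables.
    Simplified sketch: stops at SOS/EOI, so it only sees tables that
    precede the entropy-coded data (true for baseline JPEGs)."""
    tables = {}
    i = 2  # skip the SOI marker (0xFFD8)
    while i + 4 <= len(data) and data[i] == 0xFF:
        marker = data[i + 1]
        if marker in (0xD9, 0xDA):  # EOI or SOS: stop
            break
        seglen = int.from_bytes(data[i + 2:i + 4], "big")
        if marker == 0xDB:
            payload = data[i + 4:i + 2 + seglen]
            j = 0
            while j < len(payload):
                precision, table_id = payload[j] >> 4, payload[j] & 0x0F
                step = 128 if precision else 64  # 16-bit vs 8-bit coefficients
                tables[table_id] = list(payload[j + 1:j + 1 + step])
                j += 1 + step
        i += 2 + seglen
    return tables

# Tiny synthetic example: SOI + one DQT segment (table 0, values 0..63) + EOI.
sample = (b"\xff\xd8\xff\xdb" + (67).to_bytes(2, "big")
          + b"\x00" + bytes(range(64)) + b"\xff\xd9")
print(read_dqt_tables(sample))  # {0: [0, 1, 2, ..., 63]}
```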
Added a print statement to see what happens in this loop:

```python
for i in range(100):
    print("i, qvalue, hash, qsum, sums: ", str(i), str(qvalue), str(hash[i]), str(qsum), str(sums[i]))
    if ((qvalue < hash[i]) and (qsum < sums[i])):
        continue
    if (((qvalue <= hash[i]) and (qsum <= sums[i])) or (i >= 50)):
        return i + 1
    break
```
Result:

```
i, qvalue, hash, qsum, sums: 0 513 1020 24028 32640
i, qvalue, hash, qsum, sums: 1 513 1015 24028 32635
i, qvalue, hash, qsum, sums: 2 513 932 24028 32266
i, qvalue, hash, qsum, sums: 3 513 848 24028 31495
i, qvalue, hash, qsum, sums: 4 513 780 24028 30665
i, qvalue, hash, qsum, sums: 5 513 735 24028 29804
i, qvalue, hash, qsum, sums: 6 513 702 24028 29146
i, qvalue, hash, qsum, sums: 7 513 679 24028 28599
i, qvalue, hash, qsum, sums: 8 513 660 24028 28104
i, qvalue, hash, qsum, sums: 9 513 645 24028 27670
i, qvalue, hash, qsum, sums: 10 513 632 24028 27225
i, qvalue, hash, qsum, sums: 11 513 623 24028 26725
i, qvalue, hash, qsum, sums: 12 513 613 24028 26210
i, qvalue, hash, qsum, sums: 13 513 607 24028 25716
i, qvalue, hash, qsum, sums: 14 513 600 24028 25240
i, qvalue, hash, qsum, sums: 15 513 594 24028 24789
i, qvalue, hash, qsum, sums: 16 513 589 24028 24373
i, qvalue, hash, qsum, sums: 17 513 585 24028 23946
```
So at i=17 `qsum` (24028) becomes greater than `sums[i]` (23946), which means the loop does not `continue`. Then we end up here:

```python
if (((qvalue <= hash[i]) and (qsum <= sums[i])) or (i >= 50)):
    return i + 1
```

Since both test conditions fail (`qsum > sums[i]`, and `i < 50`), the loop breaks and we end up with an undefined quality level!
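As a side note on where the constants in the trace come from: in the two-table branch of the heuristic, `qvalue` appears to be built from four fixed coefficient positions and `qsum` from the sum of all coefficients in both tables (my reading of the ImageMagick code — but the positions below do reproduce the traced values exactly):

```python
# Quantization tables of the example image, as extracted above.
qtable0 = [45, 31, 34, 39, 34, 28, 45, 39, 36, 39, 51, 48, 45, 53, 68, 113,
           73, 68, 62, 62, 68, 139, 99, 105, 82, 113, 164, 144, 173, 170, 161, 144,
           158, 156, 181, 204, 255, 221, 181, 193, 247, 195, 156, 158, 227, 255, 229, 247,
           255, 255, 255, 255, 255, 176, 218, 255, 255, 255, 255, 255, 255, 255, 255, 255]
qtable1 = [48, 51, 51, 68, 59, 68, 133, 73, 73, 133, 255, 187, 158, 187, 187] + [255] * 49

# Assumed two-table formula: four "corner" coefficients for qvalue,
# total of all coefficients for qsum.
qvalue = qtable0[2] + qtable0[53] + qtable1[0] + qtable1[63]
qsum = sum(qtable0) + sum(qtable1)
print(qvalue, qsum)  # 513 24028 -- matches the trace above
```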
Did some tests with ImageMagick-generated images:

```shell
convert -quality 20 wizard: wizard-20.jpg
convert -quality 15 wizard: wizard-15.jpg
convert -quality 10 wizard: wizard-10.jpg
```
Here the correct q value was reported down to q=15, but not for q=10. So failure to return a value here seems to indicate poor quality.
As an additional test, I lowered the i threshold of 50:

```python
if (((qvalue <= hash[i]) and (qsum <= sums[i])) or (i >= 50)):
```

to 5:

```python
if (((qvalue <= hash[i]) and (qsum <= sums[i])) or (i >= 5)):
```

In this case q is reported for the lower quality levels as well (and the values correspond to the IM creation values).
Which makes me wonder if we could simply drop this threshold altogether?
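To test that idea, here is a self-contained re-run of the loop, using only the `hash`/`sums` entries visible in the trace above (i = 0..17, which is as far as the loop gets for this image). With the original threshold of 50 it falls through without a value; with the threshold lowered it returns 18:

```python
# hash/sums entries for i = 0..17, copied from the loop trace above.
HASH_PREFIX = [1020, 1015, 932, 848, 780, 735, 702, 679, 660, 645,
               632, 623, 613, 607, 600, 594, 589, 585]
SUMS_PREFIX = [32640, 32635, 32266, 31495, 30665, 29804, 29146, 28599,
               28104, 27670, 27225, 26725, 26210, 25716, 25240, 24789,
               24373, 23946]

def estimate_quality(qvalue, qsum, hashes, sums, threshold=50):
    """Re-implementation of the loop above; returns None where the
    original falls through with an undefined quality level."""
    for i in range(len(hashes)):
        if qvalue < hashes[i] and qsum < sums[i]:
            continue
        if (qvalue <= hashes[i] and qsum <= sums[i]) or i >= threshold:
            return i + 1
        break
    return None

# qvalue/qsum of the example image, taken from the trace.
print(estimate_quality(513, 24028, HASH_PREFIX, SUMS_PREFIX))               # None
print(estimate_quality(513, 24028, HASH_PREFIX, SUMS_PREFIX, threshold=5))  # 18
```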
Possibly relevant here - Neil Krawetz on estimating JPEG quality with the "Approximate Ratios" method:

> since the JPEG Standard changes algorithms at quality values below 50%, this method can become unreliable with very low quality images.

But it's not clear to me if the IM method is based on "Approximate Ratios" or "Approximate Quantization Tables". In the latter case: does this method also become less reliable at lower qualities?
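For context, the algorithm change below 50% that Krawetz mentions is the switch in libjpeg's quality-to-scale-factor mapping (the IJG `jpeg_quality_scaling` function), sketched here:

```python
def jpeg_quality_scaling(quality: int) -> int:
    """Quality-to-scaling-percentage curve used by IJG libjpeg: the
    sample quantization tables are multiplied by this percentage / 100,
    and the formula switches branches at quality 50."""
    quality = max(1, min(100, quality))
    if quality < 50:
        return 5000 // quality   # hyperbolic branch for low qualities
    return 200 - quality * 2     # linear branch for quality >= 50

print(jpeg_quality_scaling(50))  # 100 (tables used as-is)
print(jpeg_quality_scaling(10))  # 500 (coefficients scaled up 5x)
```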
Update: as a test I uploaded the image from this ImageMagick issue to Neil Krawetz's fotoforensics service.
Result here:

> JPEG last saved at 18% quality (estimated)
The site mentions "Quality determined from the quantization tables that encoded the JPEG", and also shows 2 quantization tables that are identical to those extracted with Pillow. Re-calculating with the Python code (setting the lower threshold to 1) also results in a value of 18, so I suspect this uses the same method. In that case I think it would make sense to drop the 50% threshold.
Not clear if this is possible, but there are some interesting ideas in this thread:
https://stackoverflow.com/questions/4354543/determining-jpg-quality-in-python-pil
This reply points to a Python port of the ImageMagick JPEG quality heuristic:
https://gist.github.com/eddy-geek/c0f01dc5401dc50a49a0a821cdc9b3e8#file-jpg_quality_pil_magick-py