JohannesBuchner / imagehash

A Python Perceptual Image Hashing Module
BSD 2-Clause "Simplified" License

Clarification on pHash implementation #181

Closed HeeebsInc closed 2 years ago

HeeebsInc commented 2 years ago

Want to start by saying I love this repo! If it wasn't for your hard work I would not have been able to learn as much as I have.

I am currently working on optimizing the speed of this code and possibly offloading it to the GPU. As a first step, I have been experimenting with pure NumPy before porting it to CuPy.

1) I found that using OpenCV impacts the results substantially. I believe this is because of how Pillow performs anti-aliasing compared to OpenCV. Swapping Image.open for cv2.imread shows what I mean (see the quick check after the code below).

2) Based on the article you posted, it seems like your implementation is different. Can you explain why you used the median instead of the average? Why you performed two DCT transforms instead of just one? And why you chose to include the first element even though the article recommends against it? The article mentions the following: " (using only the 8x8 DCT low-frequency values and excluding the first term since the DC coefficient can be significantly different from the other values and will throw off the average). Thanks to David Starkweather for the added information about pHash. He wrote: "the dct hash is based on the low 2D DCT coefficients starting at the second from lowest, leaving out the first DC term. This excludes completely flat image information (i.e. solid colors) from being included in the hash description." "

For example, the article's approach looks roughly like this:

import numpy as np
import cv2
import scipy.fftpack

img_path = 'example.jpg'  # placeholder: path to any test image

original_img = cv2.imread(img_path)
resize_dim = 32  # maybe expose this directly? I know it is derived via highfreq_factor, but a plain value is easier to follow
low_pass_size = 8  # hash_size
small_img = cv2.resize(original_img, (resize_dim, resize_dim), interpolation=cv2.INTER_LANCZOS4)
small_img = cv2.cvtColor(small_img, cv2.COLOR_BGR2GRAY)
dct1 = scipy.fftpack.dct(small_img, axis=0)  # a single 1D DCT, as in the article
low_freq = dct1[:low_pass_size, :low_pass_size].flatten()  # keep the low-frequency 8x8 block
avg_val = low_freq[1:].mean()  # drop the first (DC) coefficient so it does not skew the average
low_freq_masked = np.where(low_freq >= avg_val, 1, 0)
print(hash(low_freq_masked.tobytes()))  # just for testing
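
For contrast, here is roughly what this repo's phash does as I read it (paraphrased, the exact code may differ by version), plus the quick check of the Pillow vs. OpenCV resize discrepancy I mentioned in point 1:

import numpy as np
import scipy.fftpack
import cv2
from PIL import Image

hash_size, highfreq_factor = 8, 4
img_size = hash_size * highfreq_factor  # 32, same as resize_dim above

# Roughly this repo's phash: Pillow resize, 2D DCT, median threshold, DC term included
pil_small = np.asarray(Image.open(img_path).convert('L').resize((img_size, img_size), Image.LANCZOS))
dct = scipy.fftpack.dct(scipy.fftpack.dct(pil_small, axis=0), axis=1)  # DCT along both axes, i.e. a 2D DCT
dctlowfreq = dct[:hash_size, :hash_size]  # includes the DC term at [0, 0]
bits = dctlowfreq > np.median(dctlowfreq)  # median, not mean

# Quick check of point 1: Pillow and OpenCV Lanczos resizes produce different pixels
cv2_gray = cv2.cvtColor(cv2.imread(img_path), cv2.COLOR_BGR2GRAY)
cv2_small = cv2.resize(cv2_gray, (img_size, img_size), interpolation=cv2.INTER_LANCZOS4)
print(np.abs(pil_small.astype(int) - cv2_small.astype(int)).mean())  # nonzero -> the hashes can differ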

I'm sure there are reasons why you chose to implement pHash the way you did, so by no means am I saying it is wrong. I'm more curious to understand your thought process, since I know you have far more experience with this.

Lastly, do you accept open-source contributions? Once I finish the CuPy implementation I would be more than happy to add support for it to this repository.

JohannesBuchner commented 2 years ago

For an OpenCV implementation, see https://github.com/JohannesBuchner/imagehash/pull/130.

phash_simple is the implementation from the article. For phash vs. phash_simple, see https://github.com/JohannesBuchner/imagehash/issues/13. Average vs. median is also discussed in the comments on the article.
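
Roughly (see the source for the exact code), phash_simple keeps the article's choices: one 1D DCT, drop the DC column, threshold on the mean. The function name below is just for illustration:

import numpy as np
import scipy.fftpack
from PIL import Image

def phash_simple_outline(image, hash_size=8, highfreq_factor=4):
    img_size = hash_size * highfreq_factor
    pixels = np.asarray(image.convert('L').resize((img_size, img_size), Image.LANCZOS))
    dct = scipy.fftpack.dct(pixels)                 # single 1D DCT (along the last axis)
    dctlowfreq = dct[:hash_size, 1:hash_size + 1]   # skip the first (DC) column
    return dctlowfreq > dctlowfreq.mean()           # mean, where phash uses the median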

Contributions are welcome. However, the policy of this repo is to keep the code as short and simple as possible, so that people can easily understand it and start hacking together more complex things. That is more important to me than code performance.

It may be interesting to add video hashing though, as discussed in the opencv issue. Not sure if it would deserve its own repo though.

HeeebsInc commented 2 years ago

these discussions are exactly what I was looking for. Thank you @JohannesBuchner

PathToLife commented 1 year ago

Ran some performance benchmarks on a set of 47 images. The goal was to see which algorithm gave the smallest difference vs. the PIL implementation, and what speed improvement is possible.

Summary

cv2_area gave the least difference vs default imagehash.phash implementation, but is substantially slower.

cv2_lanczos4 is 6-7 times faster than the default imagehash.phash (which also uses Lanczos resampling), but has an average Hamming distance of 7.3 vs. the default imagehash.phash.
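
For context, the hash difference between two hashes is the Hamming distance, which imagehash exposes through subtraction (the file name below is just a placeholder):

h_pil = imagehash.phash(Image.open('photo.jpg'))
h_cv2 = phash_faster(cv2.imread('photo.jpg'))  # phash_faster is defined in the appendix below
print(h_pil - h_cv2)                           # number of differing bits (Hamming distance)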

Here's the results

Running 6 interpolation algorithms on 47 unique images x 8 iterations
---------------------------------------------------------------------
 Benchname   |     Time     | HashDiff AVG | Images Hashed
cv2_area     |    4154ms    |     2.9      |     376     
pil_lanczos4 |    1452ms    |     0.0      |     376     
cv2_lanczos4 |    226ms     |     7.3      |     376     
cv2_linear   |    214ms     |     6.9      |     376     
cv2_nearest  |    217ms     |     9.2      |     376     
cv2_cubic    |    220ms     |     7.3      |     376     

Notes

pil_lanczos4 is the default imagehash.phash implementation

imagehash.__version__ = '4.3.1'
cv2.__version__ = '4.7.0'

Image reading was done as a preprocessing step (images were decoded before the timed runs)

Appendix

Benchmark code snippet (excluding the wrapper for running and printing). The cv2 implementation is modified from PR https://github.com/JohannesBuchner/imagehash/pull/130

import cv2
import scipy.fft
import imagehash
from imagehash import ImageHash
import numpy as np
from PIL import Image

def phash_faster(image, hash_size=8, highfreq_factor=4, interpolation=cv2.INTER_NEAREST):
    """
    Perceptual Hash computation.
    Implementation follows http://www.hackerfactor.com/blog/index.php?/archives/432-Looks-Like-It.html
    @image must be a Numpy array/OpenCV instance.
    """
    if hash_size < 2:
        raise ValueError("Hash size must be greater than or equal to 2")

    img_size = hash_size * highfreq_factor
    image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)  # cv2.imread returns BGR, so convert from BGR
    image = cv2.resize(image, (img_size, img_size), interpolation=interpolation)
    dct = scipy.fft.dct(scipy.fft.dct(image, axis=0), axis=1)
    dctlowfreq = dct[:hash_size, :hash_size]
    med = np.median(dctlowfreq)
    diff = dctlowfreq > med
    return ImageHash(diff)

def cv2_nearest(fp, img=None):
    img = cv2.imread(fp) if img is None else img
    return phash_faster(img, interpolation=cv2.INTER_NEAREST)

def cv2_area(fp, img=None):
    img = cv2.imread(fp) if img is None else img
    return phash_faster(img, interpolation=cv2.INTER_AREA)

def cv2_cubic(fp, img=None):
    img = cv2.imread(fp) if img is None else img
    return phash_faster(img, interpolation=cv2.INTER_CUBIC)

def cv2_lanczos4(fp, img=None):
    img = cv2.imread(fp) if img is None else img
    return phash_faster(img, interpolation=cv2.INTER_LANCZOS4)

def cv2_linear(fp, img=None):
    img = cv2.imread(fp) if img is None else img
    return phash_faster(img, interpolation=cv2.INTER_LINEAR)

def pil_lanczos4(fp, img=None):
    img = Image.open(fp) if img is None else img
    return imagehash.phash(img)

benches = {
    'cv2_area': cv2_area,
    'pil_lanczos4': pil_lanczos4,
    'cv2_lanczos4': cv2_lanczos4,
    'cv2_linear': cv2_linear,
    'cv2_nearest': cv2_nearest,
    'cv2_cubic': cv2_cubic,
}
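
A minimal driver along these lines (not the original wrapper, just a sketch) could look like the following; it preloads the decoded images so that file I/O stays out of the timed section, matching the note above:

import time

def run_benches(image_paths, iterations=8):
    # Preload both decoded forms so reading files is not part of the timed loop
    cv2_imgs = {fp: cv2.imread(fp) for fp in image_paths}
    pil_imgs = {fp: Image.open(fp) for fp in image_paths}
    baseline = {fp: pil_lanczos4(fp, pil_imgs[fp]) for fp in image_paths}
    for name, fn in benches.items():
        imgs = pil_imgs if name.startswith('pil') else cv2_imgs
        start = time.perf_counter()
        total_diff = hashed = 0
        for _ in range(iterations):
            for fp in image_paths:
                h = fn(fp, imgs[fp])
                total_diff += h - baseline[fp]  # ImageHash subtraction = Hamming distance
                hashed += 1
        elapsed_ms = (time.perf_counter() - start) * 1000
        print(f'{name:12s} | {elapsed_ms:7.0f}ms | {total_diff / hashed:4.1f} | {hashed}')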