JohannesBuchner / imagehash

A Python Perceptual Image Hashing Module
BSD 2-Clause "Simplified" License
3.18k stars, 328 forks

Any reason the ImageHash.hash's shape is (1, N^2) rather than (N^2)? #79

Open hj3yoo opened 6 years ago

hj3yoo commented 6 years ago

Looking at the code, all of the computations in ImageHash require flattening the hash first. When I profiled this (for hash_size=32), each flatten added 0.5 µs of overhead that could be avoided. That may sound small, but I have code that needs to subtract one hash from 10,000 stored hashes on every frame I process, and it adds more than 10 ms per frame. This forces me to copy and paste ImageHash's arithmetic into my own code instead of calling some_hash - other_hash.
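For illustration, here is a minimal sketch of the kind of hand-rolled arithmetic described above. It assumes the stored hashes are flattened once up front and kept as a single boolean matrix. The names `hamming_to_all`, `stored`, and `query` are hypothetical and not part of imagehash, and the data is random filler rather than real hashes.

```python
import numpy as np

def hamming_to_all(query, stored):
    """Hamming distance from one flat hash to every row of `stored`.

    query:  bool array of shape (n_bits,)
    stored: bool array of shape (n_hashes, n_bits), flattened once up front
    """
    # Broadcasting compares the query against each stored row,
    # so nothing is flattened per comparison.
    return np.count_nonzero(stored != query, axis=1)

# Hypothetical data: 10,000 stored hashes with hash_size=32 (1024 bits each).
rng = np.random.default_rng(0)
stored = rng.integers(0, 2, size=(10_000, 32 * 32)).astype(bool)

# Build a query that differs from stored[42] in exactly three bits.
query = stored[42].copy()
query[:3] = ~query[:3]

distances = hamming_to_all(query, stored)
```

This produces one distance per stored hash in a single NumPy call. Whether it is actually faster than looping over `some_hash - other_hash` would need to be profiled on the real workload.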

I don't see any reason not to store the flattened version of it in the first place. In other words:

def __init__(self, binary_array):
    self.hash = binary_array.flatten()

JohannesBuchner commented 6 years ago

Hmm, I think I did that because one could make hashes of shape (6,4) or (4,6) in principle. The functions allow one to compare these for convenience, because some databases may store only the hex string, and _binary_array_to_hex and hex_to_hash lose the shape information (they flatten things). Maybe we could make the .hash field flat in __init__ as you say and add a .shape property?

hj3yoo commented 6 years ago

It might be a better alternative. You can convert from the proposed fields (self.hash, self.shape) into the original field, too.
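A rough sketch of how the proposal discussed above might look, assuming a flat `.hash` plus a stored `.shape`. The class name `FlatImageHash` and the method `as_2d` are hypothetical, and `.shape` is a plain attribute here rather than a property. This is not the imagehash implementation, only an illustration of the idea.

```python
import numpy as np

class FlatImageHash:
    """Hypothetical sketch: keep the hash flat and remember its shape."""

    def __init__(self, binary_array):
        binary_array = np.asarray(binary_array, dtype=bool)
        self.shape = binary_array.shape      # remember the original layout
        self.hash = binary_array.flatten()   # flatten once, at construction

    def __sub__(self, other):
        # Hamming distance; both sides are already flat, so (4,6) and (6,4)
        # hashes remain comparable as long as the bit counts match.
        if self.hash.size != other.hash.size:
            raise TypeError("hashes must have the same number of bits")
        return int(np.count_nonzero(self.hash != other.hash))

    def as_2d(self):
        # Recover the original 2-D array from the two proposed fields.
        return self.hash.reshape(self.shape)

# Two hashes with the same number of bits but different shapes.
a = FlatImageHash(np.eye(4, 6, dtype=bool))
b = FlatImageHash(np.eye(6, 4, dtype=bool))
distance = a - b
original = a.as_2d()
```

Comparison no longer flattens anything, and `as_2d()` shows that the original 2-D field can be rebuilt from `self.hash` and `self.shape` when it is needed.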