JohannesBuchner / imagehash

A Python Perceptual Image Hashing Module
BSD 2-Clause "Simplified" License
3.28k stars 331 forks

Implemented the hash used in https://github.com/Naranbataar/Aspect #173

Closed KOLANICH closed 2 years ago

KOLANICH commented 2 years ago

@Naranbataar

JohannesBuchner commented 2 years ago

How does this work?

What's the difference to the existing DCT-based hash?

KOLANICH commented 2 years ago

It does a similar thing, composing low frequencies into a hash, but it does it differently (and, IMHO, a bit strangely), resulting in a different hash.

The tool splits an image into blocks and computes an average per block (effectively resizing the image to 8x8, although pixels are assigned to blocks in a somewhat strange way). It then flattens these averages into a linear 64-element array (IMHO also strange), computes a DCT, and folds the DCT coefficients into a hash.
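The pipeline described above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the Aspect tool's code and not an imagehash API. It assumes that pixel (y, x) goes to block (y * 8 // h, x * 8 // w), that each block uses a plain mean (the tool itself assigns and averages pixels differently), and that "folding" means thresholding each coefficient at the median, similar in spirit to what imagehash.phash does. The names block_means, dct_1d and aspect_like_hash are made up for this sketch.

```python
import math
import statistics
from typing import List, Sequence


def block_means(pixels: Sequence[Sequence[float]], n: int = 8) -> List[List[float]]:
    """Reduce a grayscale image (list of rows) to an n x n grid of block averages.

    Assumption: pixel (y, x) goes to block (y * n // h, x * n // w) and each
    block uses a plain arithmetic mean, unlike the original tool.
    """
    h, w = len(pixels), len(pixels[0])
    if h < n or w < n:
        raise ValueError("image must be at least n x n pixels")
    sums = [[0.0] * n for _ in range(n)]
    counts = [[0] * n for _ in range(n)]
    for y, row in enumerate(pixels):
        bi = y * n // h
        for x, px in enumerate(row):
            bj = x * n // w
            sums[bi][bj] += px
            counts[bi][bj] += 1
    return [[sums[i][j] / counts[i][j] for j in range(n)] for i in range(n)]


def dct_1d(values: Sequence[float]) -> List[float]:
    """Unnormalized 1D DCT-II: X[k] = sum_i x[i] * cos(pi / N * (i + 0.5) * k)."""
    size = len(values)
    return [
        sum(x * math.cos(math.pi / size * (i + 0.5) * k) for i, x in enumerate(values))
        for k in range(size)
    ]


def aspect_like_hash(pixels: Sequence[Sequence[float]]) -> int:
    """Hypothetical sketch of the described pipeline; returns a 64-bit integer."""
    grid = block_means(pixels)               # 8x8 block averages
    flat = [v for row in grid for v in row]  # row-major flattening to 64 values
    coeffs = dct_1d(flat)                    # 1D DCT over the flattened array
    # Assumption: fold by setting a bit for each coefficient above the median.
    med = statistics.median(coeffs)
    bits = 0
    for c in coeffs:
        bits = (bits << 1) | (1 if c > med else 0)
    return bits


if __name__ == "__main__":
    # Synthetic 32x32 gradient image, used only as example input.
    img = [[(x + 2 * y) % 256 for x in range(32)] for y in range(32)]
    print(f"{aspect_like_hash(img):016x}")
```

Note that the DCT here runs over the flattened 64-element array, so the last block of one row and the first block of the next row are treated as neighbours; a conventional perceptual hash would take a 2D DCT of the 8x8 grid instead.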

The only reason to have it is for compatibility to that tool.

JohannesBuchner commented 2 years ago

I think this is not a very compelling reason. There are lots of libraries out there (e.g. pHash) which do things just slightly differently. Getting all the versions into this library will lead to a mess. I'd be happy to add such a hash if there was a demonstration that the hash is substantially different in behaviour to the existing implementations.

KOLANICH commented 2 years ago

> I think this is not a very compelling reason. There are lots of libraries out there (e.g. pHash) which do things just slightly differently.

Then I wonder whether their hashes should be added too. Alternatively, I can implement it as a separate package.

> I'd be happy to add such a hash if there was a demonstration that the hash is substantially different in behaviour to the existing implementations.

I don't think it is. The only advantage I see in that hash is that its implementation within the tool, including the DCT code, is small and can easily be ported, for example, to JS.

JohannesBuchner commented 2 years ago

Perhaps an imagehash.compat subpackage, not imported by default, could be a way to support this.

KOLANICH commented 2 years ago

Or maybe not compat but just hashes, with all newly added hashes going into separate files within it in the future (and perhaps, at some point, breaking compatibility and moving the existing hashes out of the main package into it)?

KOLANICH commented 2 years ago

This "perceptual hash" is complete bullshit.

JohannesBuchner commented 2 years ago

OK, could you expand on why? You seem to have understood it more deeply, which could help other people.

KOLANICH commented 2 years ago

While examining the implementation details of that hash further (while rewriting it in JS), I noticed the line data2[sj + (si * 8)] = (data2[sj + (si * 8)] + px) / 2.0;, which can be simplified to accum = 0.5 * accum + 0.5 * px. That is an exponential moving average. While it is widely used for smoothing real-time signals, it assigns exponentially small weights to past values: block_ave[sj, si] = 0.5 * block[-1, -1] /* the last pixel of the last row in a block */ + 0.25 * block[-1, -2] + 0.125 * block[-1, -3] + ... + 2**(-9 * 1) * block[-2, -1] /* the last pixel of the second-to-last row of a block */ ... and so on.

So this formula effectively discards the contributions of all but the last few pixels in the last row of a block, which looks like nonsense. I guess the author intended to use a simple mean. Flattening the 2D matrix of averages before taking a 1D transform (the DCT) of it also looks like nonsense.
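The weighting argument above can be checked numerically. This is a small sketch, assuming an 8x8 block whose pixels are visited row by row and an accumulator that starts at 0; the real block size in the tool depends on the image size. Because the update is linear, feeding a one-hot sequence through it isolates the weight each pixel position receives. The function names are hypothetical.

```python
from typing import Iterable, List


def running_halving_average(values: Iterable[float], start: float = 0.0) -> float:
    """Apply accum = (accum + px) / 2.0 to each value in order.

    This is the update from the quoted line, equivalent to
    accum = 0.5 * accum + 0.5 * px (an exponential moving average).
    """
    accum = start
    for px in values:
        accum = (accum + px) / 2.0
    return accum


def position_weights(n: int) -> List[float]:
    """Weight each of n pixels (in scan order) has in the final value.

    The update is linear, so a one-hot input sequence isolates one weight.
    """
    return [
        running_halving_average([1.0 if i == k else 0.0 for i in range(n)])
        for k in range(n)
    ]


if __name__ == "__main__":
    # Assumption: an 8x8 block scanned row by row, accumulator starting at 0.
    w = position_weights(64)
    print(w[-1], w[-2], w[55])  # 0.5 0.25 0.001953125 (last pixel of the second-to-last row)
    print(sum(w[-8:]))          # 0.99609375: share of the final row
    print(1 / 64)               # 0.015625: weight of every pixel under a simple mean
```

Under these assumptions the last row of the block carries 255/256 of the total weight, whereas a simple mean would give each of the 64 pixels a weight of 1/64.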

I have also found numerous other issues within that project. I believe it was not intended to actually be used, or even found, by anyone.

JohannesBuchner commented 2 years ago

I see. Thanks for looking into it.