Test alternate hashing mechanisms

commonsmachinery / blockhash

MIT License

88 stars 28 forks source link

According to https://twitter.com/jonaso/status/549967116485808128

Instead of taking a mean over horizontal bands and comparing the reduced pixel (ie the 16x16 image) against this mean, make a local mean over a 3x3 pixel cell and compare the pixel against this mean. Set the bit to 1 or 0 according to whether the pixel value is above or below the mean. This will generate a hash with more vivid bit changes and thus less hash collisions.

It is also possible to calculate the gradient for each cell (pixel), and comparing that to a mean for neighboring cells like the previous described method. This will generate a bit that is independent of local intensity variations, only the gradient will remain.

If you use a gradient it could be wise to use two orthogonal vectors to describe the plane, and it could also be wise to only use the absolute value.

One interesting variation is to hash the gradient into bins and form fingerprints from that. Such a fingerprint can be somewhat resistant to cropping if done right.

The local mean is implemented by @artfwo in this branch: https://github.com/commonsmachinery/blockhash/tree/alternate-hashing

In our test cases, it produces much better hash values in terms of their uniqueness and vividness (if we call it that), but we take a significant hit on false negatives, ie. we get a lot more images that should be matched as the same which aren't.

That said, I've now tried several other window "sizes" for the mean, upwards of 11x11 and in some different shapes. The best contestant so far is a 9x9 window, which generate just a few more false negatives, but cuts the false positives almost in half. A large window of 11x11 generate less false negatives but generate a few more false positives compared to a 9x9 window.

More testing will follow! :)

commonsmachinery / blockhash

Test alternate hashing mechanisms #6