KilianB / JImageHash

Perceptual image hashing library used to match similar images
MIT License
401 stars 81 forks source link

Sign bit and hash legnth #9

Closed anatolyra closed 5 years ago

anatolyra commented 5 years ago

Hi,

I'm trying to understand the significance of this particular line. It seems that the final result, represented by BigInteger, is initialized by BigInteger.ONE just for the sign of the final value. We're trying to generate a 64-bit hash value and then store it as a long. Unfortunately, the final bit size of this -> 'new PerceptiveHash(64)', is 65 bits. Ignoring the sign bit, by calling 'getHashValue().longValue()' produces different accuracy results for the same two image.

Am I missing something? Or was this the intended behavior?

Thanks!

https://github.com/KilianB/JImageHash/blob/008aafbb54221a682a0517a97ba0b81930ac36da/src/main/java/com/github/kilianB/hashAlgorithms/PerceptiveHash.java#L90

KilianB commented 5 years ago

If you are having a BigInteger starting with just a 0 and shift it to the left you end up with a hash of 00 .

As 0's and 1's both hold information and to comply with consistent hash lengths I did not want to drop any data. "00010" and "0010" ( -> 10 ) are not the same hashes but would be considered equal if there is no leading 1.

Consider this in a normal base 10 number space. 1856 can also be represented as 0001856. In fact you can add arbitrarily many 0's to the beginning and it does not change value of the number but in our cases 0's are important!

You may also want to wait until later today. I am pushing release 2.0.0 with a rotational invariant hash, database examples, a few tweaks to speed up hash generation by a great bit and a little utility to compare the performance and accuracy of different hash algorithms. Sadly this will be a major version bump. Breaking change are introduced and hashes won't be compatible after version 2.0.0

getHashValue().longValue()'

Can you simply create a hash with a 63 bit hash? You can also drop the last bit if necessary you will loose a tiny bit of accuracy but get your desired length key. This is also what the new hash is doing. Let me think about it a little bit.

KilianB commented 5 years ago

@anatolyra

I have taken a more in depth look at it and it seems like the sign bit is an artifact resulting from earlier versions in which I did not xor the hash but did it manually bit by bit.

The difference in the hash can be explained by the bitlength which is used to compute the normalized distance. If we extract this information and put it into an external field we should get away without the sign/padding bit.

https://github.com/KilianB/JImageHash/blob/2c28d9d5574c158a1dbfbc8c6b2d000b257388e2/src/main/java/com/github/kilianB/matcher/Hash.java#L114

    return hammingDistance(h)  / ((double)this.hashValue.bitLength()-1);

I'll need to run all unit tests to see if everything still works after the changes but it looks promising. Good catch

anatolyra commented 5 years ago

Sounds good! I'm waiting for version 2.0.0 then. Do you have an eta?

Thanks!

KilianB commented 5 years ago

Sorry again that it took longer than expected. I added multiple new features which got a bit more tricky and real life happened. I'll hope to get everything done and published by tomorrow.

KilianB commented 5 years ago

Merged with #13

anatolyra commented 5 years ago

Thank you!