commonsmachinery / blockhash-js

Image perceptual hash calculation in javascript
http://blockhash.io/
MIT License
172 stars 49 forks source link

one_bits variable #20

Open september28 opened 5 years ago

september28 commented 5 years ago

I was wondering if anyone could shed light on the one_bits array since I do not have access to the original paper upon which this library is based.

I can see that the hammingDistance function essentially calculates the bit difference between the relevant char in the two hashes to be compared. this XOR operation n1^n2 will, in theory, be in the range 0 to 15, since parseInt("0",16)=0 and parseInt("f",16)=15. So my question is that the one_bits array seems to essentially "decelerate" the hamming distance by increasing the distance by either 0,1,2,3, or 4 instead of the bit difference (0-15). Why does the algorithm do this? var one_bits = [0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4]; Interestingly, the one_bits array is non-linear, so a bit difference of 3 will increase distance by 2 whereas a bit difference of 4 will increase distance by 1...??

Thanks!

artfwo commented 5 years ago

This array is just a helper for quickly comparing hexadecimal hashes. Index is the value of a xor operation on 4 consecutive bits in a pair of hashes. Value is the number of bits that differ.

For example take a look at the following hashes:

1a2b8c3d4
1a2bfc3d4

The numbers at index 4 are different: 0x8 and 0xf respectively. Their binary values are:

0x8 == 0b1000
0xf == 0b1111

The result of the xor operation will be 0x8 ^ 0xf == 0b0111 which is equal to 7. one_bits[7] is equal to 3, which is the number of different bits in 0x8 and 0xf. Hope this helps!