Open christopherwingert opened 1 year ago
I'm having this problem too, and I've spent all day today debugging why it happens.
So far I've discovered this:
the "is_similar" function in videohash.py does this check:
if self - other <= ceil((self.similar_percentage / 100) * self.bits_in_hash)
But videohash.py also defines these two values: self.bits_in_hash = 64 and self.similar_percentage = 15
so the previous check always boils down to: if self - other <= ceil((15 / 100) * 64), and ceil(9.6) is always 10.
In other words, changing the check in "is_similar" from if self - other <= ceil((self.similar_percentage / 100) * self.bits_in_hash) to if self - other <= 10 returns the same results. I've tested this with a sample of 1000 videos, and the results are identical with the default check and with "if self - other <= 10".
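For what it's worth, here is a minimal sketch of that arithmetic. It assumes "self - other" is the number of differing bits (the Hamming distance) between the two 64-bit hashes, which I haven't confirmed against the library source. The function names are made up for illustration and are not part of videohash.

```python
from math import ceil


def similarity_threshold(similar_percentage=15, bits_in_hash=64):
    # Same formula as the quoted check: ceil(0.15 * 64) = ceil(9.6) = 10
    return ceil((similar_percentage / 100) * bits_in_hash)


def hamming_distance(hash_a, hash_b):
    # ASSUMPTION: stands in for "self - other"; counts the bits that differ
    return bin(hash_a ^ hash_b).count("1")


def is_similar(hash_a, hash_b):
    # Hypothetical standalone version of the check, using plain integers
    return hamming_distance(hash_a, hash_b) <= similarity_threshold()


print(similarity_threshold())              # 10 with the default values
print(hamming_distance(0b1010, 0b0110))    # 2 bits differ
```

Under that assumption, two videos count as similar when at most 10 of their 64 hash bits differ.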
Correct me if I'm wrong, I'm quite noob-ish here and just making some observations... in fact I'm not even sure, mathematically speaking, what this check is doing exactly.
Also, I think this could be somewhat related to issue #94 "Hash Collision", if that helps...
Would modifying similar_percentage help? If so, which direction should I go?
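On the direction: if the formula quoted earlier in this thread really is what is_similar uses, then a lower similar_percentage allows fewer differing bits (stricter, fewer matches) and a higher one allows more (looser, more matches). A rough sketch of how the cutoff moves, using a hypothetical helper name and assuming 64-bit hashes:

```python
from math import ceil


def max_differing_bits(similar_percentage, bits_in_hash=64):
    # Hypothetical helper: the largest distance the quoted check would accept
    return ceil((similar_percentage / 100) * bits_in_hash)


# Cutoff for a few candidate percentages (15 is the default mentioned above)
thresholds = {p: max_differing_bits(p) for p in (5, 10, 15, 20, 25)}
print(thresholds)  # {5: 4, 10: 7, 15: 10, 20: 13, 25: 16}
```

So if unrelated videos are being reported as similar, lowering the percentage would be the direction to try. Whether that fixes the underlying problem is a separate question.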