akamhy / videohash

Near Duplicate Video Detection (Perceptual Video Hashing) - Get a 64-bit comparable hash-value for any video.
https://pypi.org/project/videohash
MIT License
264 stars 41 forks source link

BUG REPORT #88

Closed akamhy closed 2 years ago

akamhy commented 2 years ago

Describe the bug Hash collision for some videos of same length

To Reproduce

v1 = VideoHash(url="https://canvaz.scdn.co/upload/artist/3PhoLpVuITZKcymswpck5b/video/5e966e9c01f147cdae93a02c61a4bf7c.cnvs.mp4")
v2 = VideoHash(url="https://canvaz.scdn.co/upload/licensor/7JGwF0zhX9oItt9901OvB5/video/dc047df48f774d1590b61fd38bc082e4.cnvs.mp4")
print(v1 == v2)

Expected behavior The hash should be different but is same.

Screenshots NA

Please complete the following information:

Additional context The issue can probably be resolved by extracting more features such as brightness levels or maybe the most dominant colors of frames extracted at a specific FPS. Increase the number of hash bits to accommodate more data.

OR

why not use colorhash + whash and change the bit site to 128( twice of 64, the current size)? They generate hash is very different ways and collisions should be highly unlikely.

akamhy commented 2 years ago

Use Resampling.LANCZOS

akamhy commented 2 years ago

https://user-images.githubusercontent.com/64683866/168869109-1f77c839-6912-4e24-8738-42cb15f3ab47.mp4

akamhy commented 2 years ago

https://user-images.githubusercontent.com/64683866/168872267-7c6682f8-7294-4d9a-8a68-8c6f44c06df6.mp4