akamhy / videohash

Near Duplicate Video Detection (Perceptual Video Hashing) - Get a 64-bit comparable hash-value for any video.
https://pypi.org/project/videohash
MIT License
289 stars 46 forks

[WARNING] False Positive Issues #105

Open ziczhu opened 1 year ago

ziczhu commented 1 year ago

Currently, we are experiencing a high number of false positives when utilizing this library. In our scenario, approximately 70% of the results are false positives, which significantly impacts the accuracy of our application.

To address this issue, I suggest applying the following prechecks before using the library:

  1. Preprocessing based on video length: Consider adding a preprocessing step that filters out videos shorter than 1 minute. This criterion helps eliminate short videos, which often contribute to false positive matches.

  2. Similarity threshold adjustment: Make the similarity threshold used by the library more stringent. With a stricter threshold, the library only matches videos with a higher degree of similarity, which reduces false positives and improves the precision of the matching process.

  3. Comparison of video durations: Introduce a comparison mechanism that checks the proximity of video durations when assessing similarity. This step would ensure that two videos are not considered similar if their durations differ significantly. By including this additional criterion, we can reduce the occurrence of false positives caused by videos with vastly different lengths.
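
The three prechecks above could be sketched roughly as follows. This is only an illustration, not the library's API: `VideoInfo`, `is_probable_duplicate`, and all three threshold constants are hypothetical names and values, and the 64-bit hash is assumed to be available as a plain integer. Note that when similarity is measured as a Hamming distance, a "more stringent" threshold means allowing fewer differing bits.

```python
from dataclasses import dataclass


@dataclass
class VideoInfo:
    """Hypothetical container; not part of videohash."""
    duration_s: float  # video duration in seconds
    hash64: int        # 64-bit perceptual hash as an integer


# Illustrative values only; tune them on your own data.
MIN_DURATION_S = 60.0      # precheck 1: skip videos shorter than 1 minute
MAX_DURATION_DIFF = 0.10   # precheck 3: allow at most 10% duration difference
MAX_HAMMING_BITS = 4       # precheck 2: at most 4 of 64 bits may differ


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def is_probable_duplicate(a: VideoInfo, b: VideoInfo) -> bool:
    # 1. Reject pairs where either video is too short to hash reliably.
    if min(a.duration_s, b.duration_s) < MIN_DURATION_S:
        return False
    # 3. Reject pairs whose durations differ by too large a fraction.
    longer = max(a.duration_s, b.duration_s)
    if abs(a.duration_s - b.duration_s) / longer > MAX_DURATION_DIFF:
        return False
    # 2. Only then compare hashes, with a strict bit-difference limit.
    return hamming_distance(a.hash64, b.hash64) <= MAX_HAMMING_BITS
```

The duration checks run first because they are cheap and discard most unrelated pairs before any hash comparison happens.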

That said, thanks to the author for providing this library for low-cost comparison. If you're using it in a serious scenario, I would suggest using it like a Bloom filter: treat a positive result only as a candidate, and run a more intensive algorithm to confirm it.
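
A minimal sketch of that two-stage idea, assuming you supply both comparison functions yourself. `confirm_duplicates`, `cheap_match`, and `expensive_match` are hypothetical names; the cheap stage would be the hash comparison and the expensive stage whatever heavier check you trust. The Bloom filter analogy is loose: it only covers treating positives as candidates, since a perceptual hash can also miss true duplicates.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def confirm_duplicates(
    pairs: Iterable[Tuple[T, T]],
    cheap_match: Callable[[T, T], bool],
    expensive_match: Callable[[T, T], bool],
) -> List[Tuple[T, T]]:
    """Return only the pairs that pass both the cheap and the expensive check."""
    confirmed = []
    for a, b in pairs:
        # Stage 1: cheap filter (e.g. hash comparison); may yield false positives.
        if not cheap_match(a, b):
            continue
        # Stage 2: expensive check runs only on the surviving candidates.
        if expensive_match(a, b):
            confirmed.append((a, b))
    return confirmed
```

With this structure the expensive algorithm is never called on pairs the hash already rejects, so its cost scales with the number of candidates rather than the number of pairs.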

Qinmayyear commented 1 month ago

Wish I saw this earlier. This library cannot be used on videos shorter than 1 minute; there were many false positive cases :(