akamhy / videohash

Near Duplicate Video Detection (Perceptual Video Hashing) - Get a 64-bit comparable hash-value for any video.
https://pypi.org/project/videohash
MIT License
264 stars 41 forks

[Feature Request] Hash based on limited number of frames #107

Open c22 opened 11 months ago

c22 commented 11 months ago

My use case only requires comparing hashes of the first few seconds of video across hundreds of files of varying lengths. This is part of a classification task, i.e. I have a lot of files and want to classify them based on the contents of their first few seconds.

I could write a script that trims every video to 2-3 seconds, runs videohash on those clips, and then relates the results back to the original files, but it would be great if videohash could handle all of this for me.

What I imagine is something like a max_frames parameter added to the VideoHash constructor.

e.g. videohash.VideoHash(..., frame_interval=0.2, max_frames=10) would give me a hash based on 10 frames from the first ~2 seconds of video.

I could also see a time range being handy instead, e.g. start_time: '2:00', end_time: '2:30' would hash only that 30-second clip from the video. This would solve my use case and also be a more general solution for other use cases, though I think it may be a little more nuanced to implement than the first proposal.
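
Until something like this exists in the library, the time-range idea can be approximated outside videohash. The following is only a rough sketch: parse_timestamp and build_trim_command are hypothetical helper names (not part of videohash), and the ffmpeg flags (-ss, -t, -an, -y) are assumed to be used in their usual way to cut a clip that can then be hashed.

```python
from typing import List


def parse_timestamp(ts: str) -> float:
    """Convert 'SS', 'M:SS' or 'H:MM:SS' into a number of seconds."""
    seconds = 0.0
    for part in ts.split(":"):
        # Each additional field to the left is worth 60x the previous one.
        seconds = seconds * 60 + float(part)
    return seconds


def build_trim_command(src: str, dst: str, start_time: str, end_time: str) -> List[str]:
    """Build an ffmpeg command that writes [start_time, end_time) of src to dst.

    Hypothetical helper: start_time/end_time mirror the parameters proposed
    in this issue; they are not existing videohash arguments.
    """
    start = parse_timestamp(start_time)
    end = parse_timestamp(end_time)
    if end <= start:
        raise ValueError("end_time must be later than start_time")
    return [
        "ffmpeg", "-y",          # overwrite dst if it already exists
        "-ss", str(start),       # seek to the start of the range
        "-i", src,
        "-t", str(end - start),  # keep only this many seconds
        "-an",                   # drop audio; only frames matter for hashing
        dst,
    ]


if __name__ == "__main__":
    print(build_trim_command("input.mp4", "clip.mp4", "2:00", "2:30"))
```

The resulting list can be passed to subprocess.run(..., check=True), and the clip it produces can be hashed with VideoHash(path=...) as usual.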

I'm interested to hear the maintainers' thoughts on this, as I might be able to tackle a solution if there's interest.

c22 commented 11 months ago

I can also see this being sold as a kind of "performance" trick for users who would be happy to, say, conclude that two videos match based only on comparing the first few minutes of each.

shashi-netra commented 8 months ago

Wouldn't it be easy enough to split your video using ffmpeg and then videohash the video clips?
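
For reference, the ffmpeg-then-videohash workaround might look roughly like the sketch below. It assumes ffmpeg is on the PATH and that the videohash package is installed; hash_intro and group_by_intro are hypothetical names made up for this example, and the 3-second default is just the figure mentioned earlier in the thread.

```python
import subprocess
import tempfile
from collections import defaultdict
from pathlib import Path


def hash_intro(video_path, seconds=3.0):
    """Trim the first `seconds` of a video with ffmpeg and hash that clip."""
    from videohash import VideoHash  # third-party: pip install videohash

    with tempfile.TemporaryDirectory() as tmp:
        suffix = Path(video_path).suffix or ".mp4"
        clip = Path(tmp) / ("intro" + suffix)
        subprocess.run(
            ["ffmpeg", "-y", "-i", str(video_path),
             "-t", str(seconds),  # keep only the first `seconds` seconds
             "-an",               # audio is irrelevant for the hash
             str(clip)],
            check=True,
            capture_output=True,
        )
        # Compute the hash while the temporary clip still exists.
        return VideoHash(path=str(clip)).hash_hex


def group_by_intro(video_paths, hasher=hash_intro):
    """Map each intro hash back to the original files that produced it."""
    groups = defaultdict(list)
    for path in video_paths:
        groups[hasher(path)].append(path)
    return dict(groups)
```

Note that grouping on the exact hash string only catches clips whose hashes are identical. For near-duplicate intros, the VideoHash objects would need to be kept and compared by distance (for example with is_similar) instead of being used as dictionary keys.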