JohannesBuchner / imagehash

A Python Perceptual Image Hashing Module
BSD 2-Clause "Simplified" License

Added function phash_faster using opencv #130

Closed Sadiqush closed 2 years ago

Sadiqush commented 3 years ago

Hi, I recently wrote a program that compares videos frame by frame, and I used ImageHash to compute the pHash values. The default phash function was very slow; more specifically, PIL's convert and resize functions and numpy.asarray() were the bottleneck. I rewrote those parts with OpenCV, and in my case the result was 10-15 times faster. I thought about replacing the phash function entirely, but that would be inconsistent with the rest of the project, so I added a separate function (phash_faster) instead. It imports cv2 inside the function, so the import statement runs on every call, which is not ideal, but it also means cv2 does not become a required dependency.
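On the import question, one common alternative is to resolve the optional dependency once at module load and fail with a clear message only when the fast path is actually used. The sketch below is an illustration, not code from this pull request; `optional_import` and `require_cv2` are hypothetical helper names. Note that Python caches imported modules in `sys.modules`, so a repeated `import cv2` inside a function is a cache lookup after the first call rather than a full reload.

```python
import importlib


def optional_import(name):
    """Return the named module, or None when it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# Resolved once, when this module is first imported; None if OpenCV is absent.
cv2 = optional_import("cv2")


def require_cv2():
    """Return cv2, or raise a clear error if OpenCV is missing (hypothetical helper)."""
    if cv2 is None:
        raise ImportError("phash_faster needs OpenCV (pip install opencv-python)")
    return cv2
```

A function such as phash_faster could then call `require_cv2()` at its top, so users who never call it are not forced to install OpenCV.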

It is possible to take the same PIL instance as input, convert it to an array with numpy.asarray, and replace only the PIL functions with OpenCV, but that would still be too slow and would defeat the purpose.

If keeping PIL is not necessary, I suggest converting all the other PIL dependencies to OpenCV as well.

coveralls commented 3 years ago

Coverage decreased (-3.7%) to 86.348% when pulling 92792a05852b5fc0e2582196195de337402fe058 on Sadiqush:master into 2e6eb38f06741286282733470c173a057e186c0a on JohannesBuchner:master.

JohannesBuchner commented 3 years ago

Is the output identical, or do the functions slightly differ in behaviour?

I suppose you can get even faster results if you set up your video decoder to deliver images in the desired size and grayscale. You probably also only want to handle keyframes. ffmpeg can do both.
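A rough sketch of that idea, assuming the `ffmpeg` command-line tool is on the PATH: the decoder is asked for keyframes only, and each one is delivered as a small 8-bit grayscale raw frame on stdout. The 32x32 size is an assumption (hash_size 8 times highfreq_factor 4), the helper names are hypothetical, and the exact flags may need adjusting for a given ffmpeg build.

```python
import subprocess

FRAME_SIDE = 32  # assumed target size: hash_size 8 * highfreq_factor 4


def keyframe_command(path, side=FRAME_SIDE):
    """Build an ffmpeg argv that emits keyframes as raw grayscale frames."""
    return [
        "ffmpeg",
        "-skip_frame", "nokey",         # decoder option: decode keyframes only
        "-i", path,
        "-vf", f"scale={side}:{side}",  # resize inside ffmpeg
        "-vsync", "vfr",                # do not duplicate frames (newer builds: -fps_mode vfr)
        "-pix_fmt", "gray",             # one byte per pixel
        "-f", "rawvideo",
        "-",                            # write to stdout
    ]


def split_frames(raw, side=FRAME_SIDE):
    """Split a raw byte stream into complete side*side frames."""
    size = side * side
    return [raw[i:i + size] for i in range(0, len(raw) - size + 1, size)]


def read_keyframes(path):
    """Run ffmpeg and return the list of raw keyframe buffers."""
    result = subprocess.run(keyframe_command(path), capture_output=True, check=True)
    return split_frames(result.stdout)
```

Each returned buffer can be turned into an array with `numpy.frombuffer(frame, dtype=numpy.uint8).reshape(32, 32)`, which skips the convert and resize steps on the Python side.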

Sadiqush commented 3 years ago

There are differences, but the output is almost identical. Here is an example for one image:

output of phash: d3226899a772cb87
output of phash_faster: d3227899a772ca87

In my case the trade-off is worth it for the speed it brings. With this OpenCV function I personally didn't see any change in the results of the final program.
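To put a number on "almost identical", the two hex strings above can be compared bit by bit. This is a small standalone sketch with a hypothetical helper name, `hamming_hex`; with imagehash itself, the same comparison can be done by loading the strings with `imagehash.hex_to_hash` and subtracting the two hashes.

```python
def hamming_hex(a, b):
    """Count the differing bits between two equal-length hex hash strings."""
    if len(a) != len(b):
        raise ValueError("hashes must have the same length")
    # XOR leaves a 1 wherever the two hashes disagree.
    return bin(int(a, 16) ^ int(b, 16)).count("1")


distance = hamming_hex("d3226899a772cb87", "d3227899a772ca87")
print(distance)  # 2, out of 64 bits
```

A distance of 2 out of 64 bits is small, but whether it matters depends on the match threshold the application uses.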

Thanks for your advice! The problem with ffmpeg was that it only works with keyframes if the video is raw, and our input was mp4. It doesn't matter anymore, though: that program has shipped and is currently working fine. I just thought I could help other people benefit from the speed of that function.

JohannesBuchner commented 2 years ago

Thank you, I think this is helpful for people who want to go down a similar path. For now, I will close this issue to avoid adding too many dependencies and too much code complexity. I think one appeal of imagehash is its simple code, which is easy to modify.

PathToLife commented 1 year ago

The code here helped me look into the differences between cv2 and PIL.

Results can be found here for anyone looking to try cv2 for performance improvements.

https://github.com/JohannesBuchner/imagehash/issues/181