KilianB / JImageHash

Perceptual image hashing library used to match similar images
MIT License

GIFs getting the same hash when the first frame is identical #33

Open anatolyra opened 5 years ago

anatolyra commented 5 years ago

Hi,

When generating hash values for two GIF files, I'm getting the same hash value for both if the first frame of each is identical. (Attached images: two_dogs_1, two_dogs_2)

Is that the expected behavior?

Thanks!

KilianB commented 5 years ago

It currently is, but maybe we can change it to something you deem appropriate.

What behavior would you like?

Create a single hash for the entire gif?

KilianB commented 5 years ago

My suggestion is to create a "gif hash collection" allowing for different similarity distances.

- intersect: find image matches contained in both gif collections
- distinct: 1 - intersect
- totalDistance: summed distance, frame by frame
- minDistance: summed distance from each frame to its closest frame
- distanceShifted: total distance, but with the frames shifted to produce the lowest value
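A rough sketch of the three distance measures, assuming every frame has already been hashed to a 64-bit `long` (JImageHash's own `Hash` type is not used here) and that the per-frame distance is the normalized Hamming distance. The penalty of 1.0 per unmatched frame when the GIFs differ in length, and treating "shifted" as a cyclic rotation of the frame order, are my own assumptions; the class name is hypothetical.

```java
/**
 * Hypothetical sketch of frame-by-frame GIF distances.
 * Assumption: each frame is pre-hashed to a 64-bit long.
 */
public class GifDistanceSketch {

    // Normalized Hamming distance between two 64-bit frame hashes, in [0, 1].
    static double frameDistance(long a, long b) {
        return Long.bitCount(a ^ b) / 64.0;
    }

    // totalDistance: frame i of a vs frame i of b. Assumption: every frame
    // without a counterpart (unequal lengths) adds the maximum distance 1.0.
    static double totalDistance(long[] a, long[] b) {
        int common = Math.min(a.length, b.length);
        double sum = 0;
        for (int i = 0; i < common; i++) {
            sum += frameDistance(a[i], b[i]);
        }
        return sum + Math.abs(a.length - b.length);
    }

    // minDistance: for every frame of a, add the distance to its closest frame in b.
    static double minDistance(long[] a, long[] b) {
        double sum = 0;
        for (long frameA : a) {
            double best = 1.0;
            for (long frameB : b) {
                best = Math.min(best, frameDistance(frameA, frameB));
            }
            sum += best;
        }
        return sum;
    }

    // distanceShifted: totalDistance after cyclically rotating b,
    // keeping the rotation that yields the lowest value.
    static double distanceShifted(long[] a, long[] b) {
        if (b.length == 0) {
            return totalDistance(a, b);
        }
        double best = Double.MAX_VALUE;
        long[] rotated = new long[b.length];
        for (int shift = 0; shift < b.length; shift++) {
            for (int i = 0; i < b.length; i++) {
                rotated[i] = b[(i + shift) % b.length];
            }
            best = Math.min(best, totalDistance(a, rotated));
        }
        return best;
    }

    public static void main(String[] args) {
        long[] gifA = {0x0L, 0xFFL, 0xFF00L};
        long[] gifB = {0xFF00L, 0x0L, 0xFFL}; // same frames, rotated by one
        System.out.println(totalDistance(gifA, gifB));   // 0.5
        System.out.println(minDistance(gifA, gifB));     // 0.0
        System.out.println(distanceShifted(gifA, gifB)); // 0.0
    }
}
```

Two GIFs that share only their first frame would still get a non-zero totalDistance here, which is the behavior the original report asks for.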

anatolyra commented 5 years ago

I like what you suggest. A couple of things:

  1. intersect: to find image matches in both collections, you'll need to allow specifying a minimum similarity value.
  2. What do you mean by distinct?
  3. Maybe also return the average distance and its variance?
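Points 1 and 3 could look roughly like this. It is a sketch assuming frames are pre-hashed 64-bit `long`s compared by normalized Hamming distance; the minimum similarity is expressed as a maximum distance (`1 - similarity`), and the mean/variance are taken over aligned frame pairs using the population variance. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: thresholded intersect plus mean/variance of frame distances. */
public class GifIntersectSketch {

    // Normalized Hamming distance between two 64-bit frame hashes.
    static double frameDistance(long a, long b) {
        return Long.bitCount(a ^ b) / 64.0;
    }

    // Frames of a with at least one frame in b within maxDistance
    // (maxDistance = 1 - minimum similarity).
    static List<Long> intersect(long[] a, long[] b, double maxDistance) {
        List<Long> matches = new ArrayList<>();
        for (long frameA : a) {
            for (long frameB : b) {
                if (frameDistance(frameA, frameB) <= maxDistance) {
                    matches.add(frameA);
                    break;
                }
            }
        }
        return matches;
    }

    // Mean and population variance of the distances between aligned frames.
    static double[] meanAndVariance(long[] a, long[] b) {
        int n = Math.min(a.length, b.length);
        if (n == 0) {
            return new double[] {Double.NaN, Double.NaN};
        }
        double[] d = new double[n];
        double sum = 0;
        for (int i = 0; i < n; i++) {
            d[i] = frameDistance(a[i], b[i]);
            sum += d[i];
        }
        double mean = sum / n;
        double squares = 0;
        for (double x : d) {
            squares += (x - mean) * (x - mean);
        }
        return new double[] {mean, squares / n};
    }

    public static void main(String[] args) {
        long[] gifA = {0x0L, 0x1L, 0xFFFFL};
        long[] gifB = {0x0L, 0x3L, 0xFFFF0000L};
        System.out.println(intersect(gifA, gifB, 0.05)); // [0, 1]
        double[] stats = meanAndVariance(gifA, gifB);
        System.out.println("mean=" + stats[0] + " variance=" + stats[1]);
    }
}
```

A high variance next to a low mean would flag GIFs that match almost everywhere but differ strongly in a few frames.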

Thanks!

KilianB commented 5 years ago

I define distinct as the complement of the intersection: it returns all images that are unique to one of the two collections.
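Read that way, distinct is the symmetric difference of the two frame collections. A minimal sketch, again assuming pre-hashed 64-bit frames and a normalized Hamming threshold for deciding whether a frame "has a match"; the names are hypothetical. As a fraction of all frames, this is what the "1 - intersect" shorthand in the list refers to.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: frames that appear in only one of two GIF hash collections. */
public class GifDistinctSketch {

    // Normalized Hamming distance between two 64-bit frame hashes.
    static double frameDistance(long a, long b) {
        return Long.bitCount(a ^ b) / 64.0;
    }

    // True if some frame in others lies within maxDistance of frame.
    static boolean hasMatch(long frame, long[] others, double maxDistance) {
        for (long other : others) {
            if (frameDistance(frame, other) <= maxDistance) {
                return true;
            }
        }
        return false;
    }

    // Symmetric difference: unmatched frames of a, followed by unmatched frames of b.
    static List<Long> distinct(long[] a, long[] b, double maxDistance) {
        List<Long> unique = new ArrayList<>();
        for (long frame : a) {
            if (!hasMatch(frame, b, maxDistance)) {
                unique.add(frame);
            }
        }
        for (long frame : b) {
            if (!hasMatch(frame, a, maxDistance)) {
                unique.add(frame);
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        long[] gifA = {0x0L, 0xFFL};
        long[] gifB = {0x0L, 0xFF00L};
        System.out.println(distinct(gifA, gifB, 0.0)); // [255, 65280]
    }
}
```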

I use the issue tracker for notes and comments, so don't worry if it gets a bit messy; I'm just writing down random thoughts.

Coding all of this is straightforward and can be done quickly; the difficulty is a design question:

Note: This link explains how frames can be extracted from GIF images: https://stackoverflow.com/questions/8933893/convert-each-animated-gif-frame-to-a-separate-bufferedimage . That method requires a file as input. We should also provide a utility loader for GIF images so that users don't have to repeat the same file I/O when they want to hash the same GIF with multiple algorithms. Are there any GIF containers available, or should we create our own BufferedImage collection?
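A loader along those lines could read the GIF once with the standard `javax.imageio` reader and hand back a plain `List<BufferedImage>`, which every hashing algorithm can then reuse. This is only a sketch: `GifFrameLoader` and `loadFrames` are hypothetical names, the input may be a `File` or an `InputStream` (anything `ImageIO.createImageInputStream` accepts), and the frames are returned raw. GIF frames can be partial sub-images with offsets and disposal methods, and compositing them into full frames (as the linked answer does) is not handled here.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

/** Hypothetical loader: read all GIF frames once, then hash them as often as needed. */
public class GifFrameLoader {

    // source: a File or InputStream. Frames are raw (not composited).
    static List<BufferedImage> loadFrames(Object source) throws IOException {
        Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName("gif");
        if (!readers.hasNext()) {
            throw new IOException("No GIF ImageReader available");
        }
        ImageReader reader = readers.next();
        try (ImageInputStream in = ImageIO.createImageInputStream(source)) {
            if (in == null) {
                throw new IOException("Unsupported input: " + source);
            }
            // seekForwardOnly = false so frames can be counted and then read.
            reader.setInput(in, false);
            int count = reader.getNumImages(true);
            List<BufferedImage> frames = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                frames.add(reader.read(i));
            }
            return frames;
        } finally {
            reader.dispose();
        }
    }

    public static void main(String[] args) throws IOException {
        List<BufferedImage> frames = loadFrames(new File(args[0]));
        System.out.println(frames.size() + " frame(s) loaded");
        // The same list can now be passed to several hashing algorithms.
    }
}
```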

Do we want to overload the hash method of the hashing algorithms so that it checks whether the supplied image is a GIF and creates the appropriate GifHashCollection, or create an entirely new hashGif method?
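To make the two options concrete, here is a hypothetical sketch. For the overload route, detection has to happen on the raw input, for example by checking the 6-byte GIF signature (`GIF87a` or `GIF89a`), since a decoded `BufferedImage` no longer carries its source format. For the dedicated route, `hashGif` just applies one per-frame hash function to every frame. The standard `ToLongFunction<BufferedImage>` stands in for a JImageHash hashing algorithm, which is an assumption, not the library's API.

```java
import java.awt.image.BufferedImage;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.function.ToLongFunction;

/** Hypothetical sketch of the two API options for hashing GIFs. */
public class GifHashingOptions {

    // Overload route: detect a GIF from its raw header bytes.
    static boolean isGif(byte[] header) {
        if (header.length < 6) {
            return false;
        }
        String signature = new String(header, 0, 6, StandardCharsets.US_ASCII);
        return signature.equals("GIF87a") || signature.equals("GIF89a");
    }

    // Dedicated route: hash every frame with the same per-frame algorithm.
    static long[] hashGif(List<BufferedImage> frames, ToLongFunction<BufferedImage> hasher) {
        long[] hashes = new long[frames.size()];
        for (int i = 0; i < hashes.length; i++) {
            hashes[i] = hasher.applyAsLong(frames.get(i));
        }
        return hashes;
    }

    public static void main(String[] args) {
        System.out.println(isGif("GIF89a".getBytes(StandardCharsets.US_ASCII))); // true

        BufferedImage red = new BufferedImage(2, 2, BufferedImage.TYPE_INT_RGB);
        red.setRGB(0, 0, 0xFF0000);
        BufferedImage blank = new BufferedImage(2, 2, BufferedImage.TYPE_INT_RGB);
        // Stand-in "hash" for illustration only: RGB value of the top-left pixel.
        long[] hashes = hashGif(List.of(red, blank), img -> img.getRGB(0, 0) & 0xFFFFFF);
        System.out.println(Arrays.toString(hashes)); // [16711680, 0]
    }
}
```

The overload keeps a single entry point but only works when the caller passes bytes or a file rather than an already decoded image; a separate `hashGif` is more explicit about what gets returned.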