JohannesBuchner / imagehash

A Python Perceptual Image Hashing Module
BSD 2-Clause "Simplified" License
3.14k stars 328 forks source link

ImageMultiHash `__sub__` is not commutative #152

Open nh2 opened 2 years ago

nh2 commented 2 years ago

`__sub__ on ImageMultiHash on 2 different images is in most cases not commutative.

For example,

a - b = 4.25
b - a = 10.25

hash_diff() is communative. It is very confusing to me that from a commutative result (the (matches, sum_distance) tuple), a non-commutative distance metric is derived.

This creates nonsensical results when trying to find the most similar match for every image in a set of images (e.g. using best_match()), because the sorting by metric now depends on the order in which the images are given.

It seems to me that best_match() would be better implemented by simply sorting the (matches, sum_distance) results descending by matches, ascending by sum_distance, instead of going via this non-commutative __sub__.

Do you agree?

(Also, __sub__ on ImageHash is commutative.)

JohannesBuchner commented 2 years ago

pinging @joshcoales who contributed this code and perhaps has an opinion.

SpangleLabs commented 2 years ago

Apologies, yeah, I remember this being an absolute pain.

I agree that sorting matches descending, then sum_distance ascending would be best. I was trying to merge that down into a single int for comparison, but it's a bit of a nightmare, yeah.

JohannesBuchner commented 2 years ago

Perhaps you could prepare a PR (plus test of commutativity, perhaps for all hash types), and add comments with some demonstrations of the behaviour before and after.