AntiCompositeNumber / iNaturalistReviewer

Automatic iNaturalist reviewer for Commons
GNU General Public License v3.0

Investigate ImageHash crop-resistant hashing #182

Closed AntiCompositeNumber closed 1 year ago

AntiCompositeNumber commented 1 year ago

Currently, iNR uses the perceptual hash function from https://github.com/JohannesBuchner/imagehash to fuzzy match images. This works well for resized images and minor crops, but not for more major crops. ImageHash recently added support for crop-resistant hashing based on https://ieeexplore.ieee.org/document/6980335. This algorithm would likely reduce false negatives, but as always could introduce false positives. (For the record, I have never been made aware of a false positive for the perceptual hash with a maximum Hamming distance of 4 in iNR operation.)
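For readers unfamiliar with the thresholded comparison described above, here is a minimal sketch using only the standard library. It is not iNR's actual code: the function names and the hex hash values are hypothetical, and the hashes are assumed to be 64-bit values written as hex strings. Only the maximum distance of 4 comes from the comment above.

```python
MAX_HAMMING_DISTANCE = 4  # maximum distance quoted in the comment above


def hamming_distance(hash_a: str, hash_b: str) -> int:
    """Count the bits that differ between two equal-length hex-encoded hashes."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


def is_match(hash_a: str, hash_b: str, max_distance: int = MAX_HAMMING_DISTANCE) -> bool:
    """Treat two images as the same when their hashes differ in few enough bits."""
    return hamming_distance(hash_a, hash_b) <= max_distance


# Hypothetical hash values, for illustration only.
commons_hash = "c3d2b1a0f0e1d2c3"
inat_hash = "c3d2b1a0f0e1d2c0"       # differs from commons_hash in the last hex digit
unrelated_hash = "3c2d4e5f0f1e2d3c"  # differs in many bits

print(is_match(commons_hash, inat_hash))
print(is_match(commons_hash, unrelated_hash))
```

Because the threshold is absolute, each candidate photo can be accepted or rejected on its own, without reference to the other photos in the observation.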

AntiCompositeNumber commented 1 year ago

I did some brief experimentation with crop_resistant_hash. In some cases, the Hamming distance between the two values is the same for every image in the set. In other cases, one image will have a lower Hamming distance. However, it does not seem possible to perform an absolute comparison with any reasonable error margin.
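One possible reason an absolute threshold is hard to choose is the way the multi-segment distance appears to be scored. The sketch below is a standalone model of that scoring based on a reading of ImageHash's source, so treat every detail as an assumption rather than the library's documented behaviour: each image yields several segment hashes, a segment counts as matched when its closest counterpart is within a bit-error cutoff (assumed here to be 25% of 64 bits), and the result is the number of unmatched segments plus a fractional tie-breaker. The function names are hypothetical.

```python
from typing import Sequence


def segment_distance(a: int, b: int) -> int:
    """Hamming distance between two segment hashes stored as integers."""
    return bin(a ^ b).count("1")


def multi_hash_distance(
    segments_a: Sequence[int],
    segments_b: Sequence[int],
    bits: int = 64,                # assumed size of each segment hash
    bit_error_rate: float = 0.25,  # assumed default cutoff ratio
) -> float:
    """Model of a crop-resistant distance: unmatched segments plus a tie-breaker."""
    cutoff = bits * bit_error_rate
    matched = []
    for seg in segments_a:
        # Closest segment in the other image, regardless of position.
        best = min(segment_distance(seg, other) for other in segments_b)
        if best <= cutoff:
            matched.append(best)
    total_segments = len(segments_a)
    if not matched:
        return float(total_segments)
    # Fraction of differing bits among the matched segments.
    tie_breaker = sum(matched) / (len(matched) * bits)
    return total_segments - len(matched) + tie_breaker


# Hypothetical segment hashes: one segment matches with 10 differing bits,
# the other has no counterpart within the cutoff.
image_a = [0x0, 0xFFFFFFFFFFFFFFFF]
image_b = [0x3FF]
print(multi_hash_distance(image_a, image_b))
```

Under this model the integer part of the score depends on how many segments each image happened to produce, which would make scores from different image sets hard to compare against one fixed threshold.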

For example, File:Hipposideros khaokhouayensis.jpg:

2023-03-28 23:06:52 inrbot INFO: Comparing photos using crop_resistant_hash
2023-03-28 23:06:52 inrbot DEBUG: Comparing https://www.inaturalist.org/photos/33536162
2023-03-28 23:06:53 inrbot DEBUG: crop_resistant_hash Hamming distance: 1.15625
2023-03-28 23:06:53 inrbot DEBUG: Comparing https://www.inaturalist.org/photos/33536166
2023-03-28 23:06:53 inrbot DEBUG: crop_resistant_hash Hamming distance: 2
2023-03-28 23:06:53 inrbot DEBUG: Comparing https://www.inaturalist.org/photos/33536169
2023-03-28 23:06:54 inrbot DEBUG: crop_resistant_hash Hamming distance: 1.21875
2023-03-28 23:06:54 inrbot INFO: Comparing photos using phash
2023-03-28 23:06:54 inrbot DEBUG: Comparing https://www.inaturalist.org/photos/33536162
2023-03-28 23:06:54 inrbot DEBUG: PHash Hamming distance: 32
2023-03-28 23:06:54 inrbot DEBUG: Comparing https://www.inaturalist.org/photos/33536166
2023-03-28 23:06:54 inrbot DEBUG: PHash Hamming distance: 30
2023-03-28 23:06:54 inrbot DEBUG: Comparing https://www.inaturalist.org/photos/33536169
2023-03-28 23:06:54 inrbot DEBUG: PHash Hamming distance: 14

The difference between the matching value and non-matching values is small. All three of the distances for this set are smaller than the matching value for other sets.
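To make the size of that difference concrete, the following sketch ranks the distances copied from the log above and reports the gap between the lowest and second-lowest value for each method. The helper name is hypothetical, and the code only rearranges the logged numbers; it does not say which photo is the true match.

```python
# Distances copied from the log above, keyed by iNaturalist photo ID.
crop_resistant = {"33536162": 1.15625, "33536166": 2, "33536169": 1.21875}
phash = {"33536162": 32, "33536166": 30, "33536169": 14}


def best_and_margin(distances):
    """Return (photo_id, lowest distance, gap to the second-lowest distance)."""
    ranked = sorted(distances.items(), key=lambda item: item[1])
    (best_id, best), (_, runner_up) = ranked[0], ranked[1]
    return best_id, best, runner_up - best


for name, distances in (("crop_resistant_hash", crop_resistant), ("phash", phash)):
    photo_id, best, margin = best_and_margin(distances)
    print(f"{name}: lowest={best} (photo {photo_id}), margin={margin}")
```

For this set the crop-resistant margin is 0.0625, against 16 for phash, and the two methods rank a different photo first. The lowest phash distance here is 14, well above the maximum of 4 used in normal operation.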