arsenetar / dupeguru

Find duplicate files
https://dupeguru.voltaicideas.net
GNU General Public License v3.0
5.24k stars, 412 forks

Photo contents mode does not actually work #689

Open · tfreedman opened this issue 4 years ago

tfreedman commented 4 years ago

I recently tracked down a bug in a camera's firmware that caused EXIF tags to be written incorrectly to certain images. The image data itself was fine, but the EXIF data was mangled. I duplicated the folder, repaired the EXIF data using exiftool, and then wanted to confirm my work with dupeGuru.

Obviously, a contents scan wouldn't work, because the files' hashes differ once the EXIF data has changed. However, a picture blocks scan with the filter hardness set to 100 should work, because the image data is identical. Unfortunately, it does not: matches are rejected for no clear reason, even though the images are visually identical.
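To illustrate why a hash-based contents comparison cannot match these files even though the image data is unchanged, here is a minimal Python sketch. It assumes a made-up file layout (a length-prefixed metadata block followed by image data); `make_file`, `file_digest` and `image_digest` are hypothetical helpers, not dupeGuru's code, and real JPEG files store EXIF differently.

```python
import hashlib
import struct


# Hypothetical, simplified file layout used only for illustration:
# a 4-byte big-endian metadata length, the metadata bytes, then the image data.
# Real JPEG files keep EXIF in an APP1 segment alongside other markers.
def make_file(metadata: bytes, image_data: bytes) -> bytes:
    return struct.pack(">I", len(metadata)) + metadata + image_data


def file_digest(blob: bytes) -> str:
    """Hash every byte of the file, as a byte-for-byte contents comparison would."""
    return hashlib.sha256(blob).hexdigest()


def image_digest(blob: bytes) -> str:
    """Hash only the image data, skipping the (hypothetical) metadata section."""
    (meta_len,) = struct.unpack(">I", blob[:4])
    return hashlib.sha256(blob[4 + meta_len:]).hexdigest()


pixels = bytes(range(256)) * 4  # stand-in for identical image data
mangled = make_file(b"tags: mangled by firmware", pixels)
repaired = make_file(b"tags: repaired", pixels)

print(file_digest(mangled) == file_digest(repaired))    # False: metadata bytes differ
print(image_digest(mangled) == image_digest(repaired))  # True: image data is identical
```

Any comparison that needs to ignore metadata has to look only at the decoded image, which is what the picture scan is meant to do.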

stuckj commented 1 month ago

I just ran into this earlier this week while using dupeGuru. It's explained in the docs:

A threshold of 100 adds an additional constraint that pictures have to be exactly the same (it’s possible, due to averaging, that the tile comparison yields 0 for pictures that aren’t exactly the same, but since “100%” suggests “exactly the same”, we discard those occurrences). If you want to get pictures that are very, very similar but still allow a bit of fuzzy differences, go for 99%.

My guess is that this doesn't just apply to the block comparison but means the files have to be literally identical. Your EXIF changes would rule that out.
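The averaging caveat quoted from the docs can be illustrated with a small Python sketch. It is not dupeGuru's implementation: the grayscale pixel grid, the 2×2 tile grid, the score formula (100 minus the mean tile difference) and the helpers `block_averages`, `block_diff` and `is_match` are all assumptions made for illustration. It also assumes that "exactly the same" at a threshold of 100 means identical pixel data; the quoted docs do not say whether dupeGuru checks pixels or whole files.

```python
from typing import List

Image = List[List[int]]  # grayscale pixels, rows of 0-255 values (a simplification)


def block_averages(img: Image, blocks: int) -> List[float]:
    """Split img into blocks x blocks tiles and return each tile's mean value."""
    h, w = len(img), len(img[0])
    bh, bw = h // blocks, w // blocks
    avgs = []
    for by in range(blocks):
        for bx in range(blocks):
            tile = [img[y][x]
                    for y in range(by * bh, (by + 1) * bh)
                    for x in range(bx * bw, (bx + 1) * bw)]
            avgs.append(sum(tile) / len(tile))
    return avgs


def block_diff(a: Image, b: Image, blocks: int) -> float:
    """Mean absolute difference between corresponding tile averages."""
    pa, pb = block_averages(a, blocks), block_averages(b, blocks)
    return sum(abs(x - y) for x, y in zip(pa, pb)) / len(pa)


def is_match(a: Image, b: Image, threshold: float, blocks: int = 2) -> bool:
    # Hypothetical mapping from tile difference to a 0-100 similarity score.
    score = max(0.0, 100 - block_diff(a, b, blocks))
    if score < threshold:
        return False
    if threshold >= 100:
        # Extra constraint at 100: equal tile averages do not imply equal
        # pictures, so require the pixel data itself to be identical.
        return a == b
    return True


img_a = [
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [90, 100, 110, 120],
    [130, 140, 150, 160],
]
# Swap two pixels inside the top-left tile: every tile average stays the same.
img_b = [row[:] for row in img_a]
img_b[0][0], img_b[1][1] = img_b[1][1], img_b[0][0]
# Brighten every pixel: tile averages shift by 50.
img_c = [[min(255, v + 50) for v in row] for row in img_a]

print(is_match(img_a, img_b, 99))   # True
print(is_match(img_a, img_b, 100))  # False
print(is_match(img_a, img_c, 99))   # False
```

Swapping two pixels within one tile leaves the tile comparison at a difference of 0, so the pair passes at 99 but is discarded at 100 by the extra equality check, which matches the behavior the docs describe. Where that check draws the line (pixels versus whole files) is exactly the open question in this thread.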