Problems / use cases:

1. The exact same file exists twice (example: copying files over from a camera, renaming them with the script, and importing them again later)
2. The image data is the same, but the metadata differs, e.g. a comment was added
Ideas:

- Do a coarse detection with a quick hashing algorithm, then a fine one with a byte-by-byte comparison (solves 1 but not 2)
- Different matchers for comparing:
  - file hashes
  - files byte by byte
  - image data byte by byte (EXIF tags stripped)
  - file sizes
  - image dimensions
  - image pixels
    - tolerance parameter? (match if |image_a[x, y] - image_b[x, y]| < tolerance for all x, y)
  - image pixels with scaling support?
- A parameter for which file to prefer when deleting: the larger or the smaller one?
- Make the comparison modular, in the same way the timestamp reading is done right now, so that the user can specify in the command line or config file which matchers to use
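A minimal sketch of the "image data byte by byte (EXIF tags stripped)" matcher, assuming JPEG input. It walks the JPEG marker segments, drops APP1 (where EXIF and XMP metadata live) and COM (free-text comment) segments, and compares what is left, which targets problem 2. The function names are hypothetical, and other formats (PNG, HEIC, camera RAW) would need their own parsing. Many image editors re-encode the image when saving, so this only catches edits that rewrite the metadata alone.

```python
# Hypothetical helpers for an "image-bytes" matcher; JPEG only.

SOI = b"\xff\xd8"  # start of image
APP1 = 0xE1        # EXIF / XMP metadata segment
COM = 0xFE         # free-text comment segment
SOS = 0xDA         # start of scan: entropy-coded image data follows


def jpeg_image_data(data: bytes) -> bytes:
    """Return the JPEG bytes with APP1 (EXIF/XMP) and COM segments removed."""
    if not data.startswith(SOI):
        raise ValueError("not a JPEG file")
    out = bytearray(SOI)
    pos = len(SOI)
    while pos + 1 < len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before a marker, skip it
            pos += 1
            continue
        if marker == SOS:
            # Everything from SOS to the end is image data; keep it verbatim.
            out += data[pos:]
            break
        # The 2-byte big-endian length counts itself but not the marker.
        length = int.from_bytes(data[pos + 2:pos + 4], "big")
        if marker not in (APP1, COM):
            out += data[pos:pos + 2 + length]
        pos += 2 + length
    return bytes(out)


def same_image_data(path_a: str, path_b: str) -> bool:
    """Match two JPEG files whose bytes are identical apart from EXIF/XMP and comments."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        return jpeg_image_data(fa.read()) == jpeg_image_data(fb.read())
```

Other APPn segments (for example APP0/JFIF or an ICC profile in APP2) are kept, so files that differ there are still reported as different.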
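A minimal sketch of the pixel matcher with a tolerance parameter, combined with the image-dimensions check. It works on plain pixel sequences so it runs without extra dependencies; the function name and the per-channel reading of the tolerance are assumptions. One design choice differs from the note in the list: it uses `<=`, so `tolerance=0` means an exact match (with `<`, a tolerance of 0 would never match anything).

```python
from typing import Iterable, Sequence, Tuple

Size = Tuple[int, int]  # (width, height)
Pixel = Sequence[int]   # one value per channel, e.g. (r, g, b)


def pixels_match(size_a: Size, pixels_a: Iterable[Pixel],
                 size_b: Size, pixels_b: Iterable[Pixel],
                 tolerance: int = 0) -> bool:
    """Return True if both images have the same dimensions and every channel
    of every pixel differs by at most `tolerance`."""
    # Image-dimensions matcher: different sizes never match here.
    if size_a != size_b:
        return False
    # Equal sizes imply equal pixel counts, so zip() does not drop pixels.
    for pa, pb in zip(pixels_a, pixels_b):
        if len(pa) != len(pb):  # different modes, e.g. RGB vs RGBA
            return False
        if any(abs(ca - cb) > tolerance for ca, cb in zip(pa, pb)):
            return False
    return True
```

With Pillow, the inputs could come from `Image.open(path).convert("RGB")`, passing each image's `.size` and `.getdata()`; converting both images to the same mode keeps every pixel a tuple of the same length. Scaling support could resize the larger image to the smaller one's size (e.g. with `Image.resize`) before comparing, although resampling changes pixel values, so a non-zero tolerance would then be needed.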
Example of how this could look on the command line:

    exif_rename --detect-duplicates --set-matcher file-size,md5,file-bytes --remove-duplicates
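A sketch of the coarse-to-fine, modular matcher idea, assuming the script is in Python. Each matcher splits a group of candidate files into smaller groups, and the matchers run in the order given, so cheap checks (file size, then an MD5 hash) prune candidates before the expensive byte-by-byte comparison. The registry, `find_duplicates` and `choose_keeper` are hypothetical names; only the matcher names come from the example command. `choose_keeper` illustrates the "larger or smaller" preference; after exact matchers every file in a group has the same size, so it only matters for fuzzy matchers such as pixel tolerance or scaling.

```python
import filecmp
import hashlib
import os
from collections import defaultdict
from typing import Callable, Dict, List

Group = List[str]
Matcher = Callable[[Group], List[Group]]


def by_key(key: Callable[[str], object]) -> Matcher:
    """Build a coarse matcher that splits a group by a cheap per-file key."""
    def split(group: Group) -> List[Group]:
        buckets: Dict[object, Group] = defaultdict(list)
        for path in group:
            buckets[key(path)].append(path)
        return list(buckets.values())
    return split


def md5_of(path: str) -> str:
    """Hash a file in chunks so large images are not read into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def split_by_bytes(group: Group) -> List[Group]:
    """Fine matcher: compare contents byte by byte against each subgroup's first file."""
    subgroups: List[Group] = []
    for path in group:
        for sub in subgroups:
            if filecmp.cmp(sub[0], path, shallow=False):
                sub.append(path)
                break
        else:
            subgroups.append([path])
    return subgroups


# Hypothetical registry; the names mirror the example command line.
MATCHERS: Dict[str, Matcher] = {
    "file-size": by_key(os.path.getsize),
    "md5": by_key(md5_of),
    "file-bytes": split_by_bytes,
}


def find_duplicates(paths: Group, matcher_spec: str) -> List[Group]:
    """Apply comma-separated matchers in order; unknown names raise KeyError."""
    groups = [list(paths)]
    for name in matcher_spec.split(","):
        matcher = MATCHERS[name.strip()]
        # A file left alone in its group cannot be a duplicate, so drop it.
        groups = [sub for g in groups for sub in matcher(g) if len(sub) > 1]
    return groups


def choose_keeper(group: Group, prefer: str = "larger") -> str:
    """Pick the file to keep; the others in the group would be removed."""
    if prefer not in ("larger", "smaller"):
        raise ValueError("prefer must be 'larger' or 'smaller'")
    pick = max if prefer == "larger" else min
    return pick(group, key=os.path.getsize)
```

For the example command, `find_duplicates(paths, "file-size,md5,file-bytes")` would return only groups whose files are byte-for-byte identical, having hashed only the files that share a size with another file.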