erdogant / undouble

undouble is a Python package for detecting (near-)identical images.
BSD 3-Clause "New" or "Revised" License

Not able to detect all the duplicate images #2

Closed: royinblr closed this issue 2 years ago

royinblr commented 2 years ago

Attached images: NOVA-NA-Dry-Iron-Grey-SDL881739255-1-3a886, nova-plus-1100-w-amaze-ni-10-original-imaf3qxpabhhdwss, philips-hi113-hi113-original-imafkmkfuwuafnpf, philips-hl114-1000-w-dry-iron-500x500, deson, deson-copy, deson-steam-iron-250x250, deson-steam-iron-500x500, ki735, ki7352

Not all the duplicates are detected.

erdogant commented 2 years ago

With model.compute_hash(method='phash', hash_size=16) I was able to detect some of the duplicates, but certain images seem to need pre-processing first, because they have a black or white border or are stretched. Either of these changes the hash considerably.
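
The border effect described above can be illustrated with a minimal, dependency-free sketch. It is not undouble's implementation: all function names below (trim_border, resize_nearest, average_hash, hamming) are hypothetical, a simple average hash stands in for phash, and images are assumed to be plain 2D lists of grayscale values in 0..255. The idea is to crop uniformly black or white edge rows and columns before hashing, so that a bordered copy hashes like the original.

```python
def trim_border(pixels, tol=10):
    """Crop edge rows/columns that are uniformly near-black or near-white.

    `pixels` is a 2D list of grayscale values (0..255); `tol` is an assumed
    tolerance for what counts as black or white.
    """
    def is_border(line):
        return all(v <= tol for v in line) or all(v >= 255 - tol for v in line)

    # Shrink from the top and bottom, always keeping at least one row.
    top, bottom = 0, len(pixels)
    while top < bottom - 1 and is_border(pixels[top]):
        top += 1
    while bottom - 1 > top and is_border(pixels[bottom - 1]):
        bottom -= 1
    rows = pixels[top:bottom]

    # Same for the columns of the remaining rows.
    cols = list(zip(*rows))
    left, right = 0, len(cols)
    while left < right - 1 and is_border(cols[left]):
        left += 1
    while right - 1 > left and is_border(cols[right - 1]):
        right -= 1
    return [list(row[left:right]) for row in rows]


def resize_nearest(pixels, size):
    """Nearest-neighbour resize to a size x size grid."""
    h, w = len(pixels), len(pixels[0])
    return [[pixels[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]


def average_hash(pixels, hash_size=8):
    """Simple average hash (a stand-in for phash): 1 where a cell exceeds the mean."""
    small = resize_nearest(pixels, hash_size)
    flat = [v for row in small for v in row]
    mean = sum(flat) / len(flat)
    return [1 if v > mean else 0 for v in flat]


def hamming(hash_a, hash_b):
    """Number of differing bits between two equal-length hashes."""
    return sum(a != b for a, b in zip(hash_a, hash_b))


# Synthetic 16x16 checkerboard and a copy with a 4-pixel white border.
image = [[200 if (r // 4 + c // 4) % 2 else 60 for c in range(16)]
         for r in range(16)]
bordered = ([[255] * 24 for _ in range(4)]
            + [[255] * 4 + row + [255] * 4 for row in image]
            + [[255] * 24 for _ in range(4)])

dist_raw = hamming(average_hash(image), average_hash(bordered))
dist_trimmed = hamming(average_hash(image), average_hash(trim_border(bordered)))
print("distance without trimming:", dist_raw)
print("distance after trimming:", dist_trimmed)
```

In this synthetic case the trimmed copy is pixel-identical to the original, so the two hashes match, while the untrimmed copy does not. Real photos with noisy or compressed borders would likely need a larger tolerance, and stretched images would need separate handling.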

Example: image

erdogant commented 2 years ago

Send me a PM if you want this functionality added, and we can discuss what's possible.