elisemercury / Duplicate-Image-Finder

difPy - Python package for finding duplicate or similar images within folders
https://difpy.readthedocs.io
MIT License

Bug: Incorrect MSE values for certain folder input parameters #88

Closed: MarcG2 closed this issue 4 months ago

MarcG2 commented 6 months ago

I have a test case of 3 images: 2 are identical and 1 is a variation (see attached zip).

The two runs below are identical except that the "Variation" folder is located on a different drive, so match detection should be the same in both runs. However, the first run gives an incorrect result: it reports "Original.jpg" as identical to "Variation.jpg" (mse 0.0). Since "Original.jpg" and "Original - Copy.jpg" are identical, the MSE between "Original.jpg" and "Variation.jpg" should equal the MSE between "Original - Copy.jpg" and "Variation.jpg" (0.10824074074074073).

The second run, where both folders have the same drive letter, gives the correct result.

Folders on different drives:
dif.py -D  "F:\Original" "D:\Variation" -Z "F:\_Cache\JSON" -s 2 -px 60 -le False -r False

{"67272710370400496506653709460945713615": {"location": "D:\\Variation\\Variation.jpg", "matches": {"59460932096355419373186387331143593406": {"location": "F:\\Original\\Original - Copy.jpg", "mse": 0.10824074074074073}, "103261221038948797102763262668914620127": {"location": "F:\\Original\\Original.jpg", "mse": 0.0}}}}

Folders on the same drive:
dif.py -D  "F:\Original" "F:\Variation" -Z "F:\_Cache\JSON" -s 2 -px 60 -le False -r False

{"228688795827367684159452445372329974654": {"location": "F:\\Original\\Original - Copy.jpg", "matches": {"12444440202896276154621357450824726824": {"location": "F:\\Original\\Original.jpg", "mse": 0.0}, "281773066927532109367185111816122180082": {"location": "F:\\Variation\\Variation.jpg", "mse": 0.10824074074074073}}}}
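The expectation above, that two identical originals must get the same MSE against "Variation.jpg", can be checked mechanically. Below is a minimal Python sketch (not part of difPy; `pair_mse`, `zero_mse_violations` and the two constant names are hypothetical) that merges the two JSON outputs above by file name and flags any pair reported with MSE 0 whose MSE against a third image differs. It assumes the JSON layout shown in this issue and that file names are unique across the searched folders.

```python
import json
import ntpath
from itertools import combinations

# JSON outputs copied verbatim from the two runs in this issue.
RUN_DIFFERENT_DRIVES = r'{"67272710370400496506653709460945713615": {"location": "D:\\Variation\\Variation.jpg", "matches": {"59460932096355419373186387331143593406": {"location": "F:\\Original\\Original - Copy.jpg", "mse": 0.10824074074074073}, "103261221038948797102763262668914620127": {"location": "F:\\Original\\Original.jpg", "mse": 0.0}}}}'
RUN_SAME_DRIVE = r'{"228688795827367684159452445372329974654": {"location": "F:\\Original\\Original - Copy.jpg", "matches": {"12444440202896276154621357450824726824": {"location": "F:\\Original\\Original.jpg", "mse": 0.0}, "281773066927532109367185111816122180082": {"location": "F:\\Variation\\Variation.jpg", "mse": 0.10824074074074073}}}}'


def pair_mse(result_json):
    """Flatten a difPy-style result into {frozenset({name_a, name_b}): mse}.

    Assumes the layout printed above: {id: {"location": ..., "matches":
    {id: {"location": ..., "mse": ...}}}}. Images are keyed by file name
    only (ntpath splits Windows backslash paths on any OS), so runs that
    use different drive letters can be merged.
    """
    pairs = {}
    for group in json.loads(result_json).values():
        primary = ntpath.basename(group["location"])
        for match in group["matches"].values():
            other = ntpath.basename(match["location"])
            pairs[frozenset((primary, other))] = match["mse"]
    return pairs


def zero_mse_violations(pairs, tol=1e-9):
    """Return (a, b, c, mse_ac, mse_bc) for every pair a, b reported with
    MSE 0 whose MSE against a third image c differs.

    If a and b compare as identical arrays, MSE(a, c) and MSE(b, c) must
    be equal for any c, so any hit points to an inconsistent result.
    """
    names = sorted({name for pair in pairs for name in pair})
    violations = []
    for a, b in combinations(names, 2):
        mse_ab = pairs.get(frozenset((a, b)))
        if mse_ab is None or mse_ab > tol:
            continue  # a and b were not reported as identical
        for c in names:
            if c in (a, b):
                continue
            mse_ac = pairs.get(frozenset((a, c)))
            mse_bc = pairs.get(frozenset((b, c)))
            if mse_ac is not None and mse_bc is not None and abs(mse_ac - mse_bc) > tol:
                violations.append((a, b, c, mse_ac, mse_bc))
    return violations


if __name__ == "__main__":
    # Each run passes the check on its own; merging them exposes the conflict.
    merged = pair_mse(RUN_DIFFERENT_DRIVES)
    merged.update(pair_mse(RUN_SAME_DRIVE))
    for violation in zero_mse_violations(merged):
        print(violation)
```

Neither run fails the check on its own, because each reports only two pairs. Only the merged results show "Original.jpg" scoring 0.0 against "Variation.jpg" while its identical copy scores 0.10824074074074073.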

This might be related to the bug referenced in #79, but I'm not sure: https://github.com/elisemercury/Duplicate-Image-Finder/issues/79

Test Img.zip

elisemercury commented 4 months ago

Hi @MarcG2,

Thanks for your question. difPy v4.1.0 has been released and I would recommend testing it on your dataset to see if you can see some improvements. The new version comes with an improved comparison algorithm.

Let me know if the issue still persists after you've updated.

Thanks, Elise