elisemercury / Duplicate-Image-Finder

difPy - Python package for finding duplicate or similar images within folders
https://difpy.readthedocs.io
MIT License

Which files are deleted if comparing and deleting within two folders? (Folder A and Folder B) #55

Closed by Kaschi14 1 year ago

Kaschi14 commented 1 year ago

I thought the deleted images always came from the folder passed as the 2nd argument, but the behavior is actually not deterministic. In rare cases a file in the folder passed as the first argument gets deleted. It happened only once in 69 deletions.

See example log for dif("F:\outtakes\outtakes_pos\bboxes\", "F:\1.0.0\experiment_pos\bbox\", recursive=False, delete=True):

...
Deleted file: F:\experiment_pos\bbox\89738260-3d58-4045-ba7d-b534ddff2b82_2.png
Deleted file: F:\experiment_pos\bbox\7edc9a3e-a9f6-4229-bbfc-62cf97495f4e_2.png
Deleted file: F:\outtakes\outtakes_pos\bboxes\77c4d676-050d-43fe-ab9f-13740c77763f_2.png
Deleted file: F:\experiment_pos\bbox\f8e88fc0-f5c3-4a1e-a2e8-18a252bad860_2.png
Deleted file: F:\experiment_pos\bbox\7f9b605c-159a-4a8d-8999-b0223d7ab7d1_2.png
...
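To check how often each folder is affected, the log can be tallied by parent directory. The following is a minimal sketch that assumes one "Deleted file: <path>" entry per line, as in the excerpt above; the function name `count_deletions_by_folder` is hypothetical and not part of difPy.

```python
from collections import Counter
from pathlib import PureWindowsPath

MARKER = "Deleted file:"

def count_deletions_by_folder(log_text: str) -> Counter:
    """Count 'Deleted file:' log entries per parent folder.

    Assumes one entry per line and Windows-style paths.
    Lines that do not start with the marker are ignored.
    """
    counts = Counter()
    for line in log_text.splitlines():
        line = line.strip()
        if line.startswith(MARKER):
            path = PureWindowsPath(line[len(MARKER):].strip())
            counts[str(path.parent)] += 1
    return counts

# The five entries from the log excerpt above.
sample_log = r"""
...
Deleted file: F:\experiment_pos\bbox\89738260-3d58-4045-ba7d-b534ddff2b82_2.png
Deleted file: F:\experiment_pos\bbox\7edc9a3e-a9f6-4229-bbfc-62cf97495f4e_2.png
Deleted file: F:\outtakes\outtakes_pos\bboxes\77c4d676-050d-43fe-ab9f-13740c77763f_2.png
Deleted file: F:\experiment_pos\bbox\f8e88fc0-f5c3-4a1e-a2e8-18a252bad860_2.png
Deleted file: F:\experiment_pos\bbox\7f9b605c-159a-4a8d-8999-b0223d7ab7d1_2.png
...
"""

for folder, n in count_deletions_by_folder(sample_log).items():
    print(f"{n:3d}  {folder}")
```

Run against the full log, this would show directly whether the one deletion in the first folder is an isolated case.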

elisemercury commented 1 year ago

Hi @Kaschi14,

Thanks for your question and for opening the issue! There are two cases to consider when comparing two or more folders:

Case 1 - Duplicate images have different image qualities: in this case, the image with the lowest quality (smallest file size) will be deleted, regardless of which folder it is located in. Therefore, what you are observing might be due to different image qualities.

Case 2 - Duplicate images have the same image quality, i.e. they are exact duplicates: in this case, as of the difPy v3.0.0 update, only the duplicates in the first directory argument should be deleted (based on my testing).
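The two cases can be sketched as a small selection rule. This is only an illustration of the behavior described above, not difPy's actual implementation; the function names and the convention that `path_a` comes from the first directory argument are assumptions.

```python
import os

def pick_by_size(path_a: str, size_a: int, path_b: str, size_b: int) -> str:
    """Return the path that would be deleted from a duplicate pair.

    Assumption: path_a belongs to the first directory argument,
    path_b to the second.
    """
    if size_a < size_b:
        return path_a  # Case 1: the smaller (lower quality) file is deleted
    if size_b < size_a:
        return path_b  # Case 1, the other way around
    return path_a      # Case 2: same size, the first directory's file is deleted

def pick_file_to_delete(path_a: str, path_b: str) -> str:
    """Same rule, reading the file sizes from disk."""
    return pick_by_size(
        path_a, os.path.getsize(path_a),
        path_b, os.path.getsize(path_b),
    )
```

Under this rule, a file in either folder can be deleted whenever it is the smaller one of a pair, which would match an occasional deletion in the other folder.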

I hope this clarifies! Don't hesitate to let me know if you have further questions.

All the best, Elise