elisemercury / Duplicate-Image-Finder

difPy - Python package for finding duplicate and similar images
https://difpy.readthedocs.io
MIT License

Search results' keys are just names, but sometimes in sub-folders #33

Closed: TheLastGimbus closed this issue 2 years ago

TheLastGimbus commented 2 years ago

Hi there! I have a folder like this:

folder/
| - IMG_202201.jpg
| - IMG_202202.jpg
| - subfolder/
|  | - IMG_202203.jpg

and I pass it as the first argument.

I noticed that the difPy.dif() search results give me just the file name, with no indication of the subfolder :neutral_face:

This broke my script with FileNotFoundError: [Errno 2] No such file or directory

elisemercury commented 2 years ago

Hi @TheLastGimbus,

Thanks a lot for your feedback! The keys of the results dictionary are only the filenames of the images and do not reflect their path/location. To get a file's location, I would suggest reading it from the location key in the nested dictionary.

As an example:

for file in search.result.keys():
    print(f'{file} - {search.result[file]["location"]}')

I hope this is helpful! Let me know if you need any further help. All the best, Elise
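For anyone hitting the same FileNotFoundError, here is a minimal sketch of the suggestion above. It does not call difPy; `mock_result` is a hypothetical stand-in for `search.result`, and it assumes that `location` holds a path usable for opening the file and that each entry also carries a `duplicates` list of paths. The duplicate pairing in the mock data is made up for illustration.

```python
import os

# Hypothetical stand-in for search.result (assumed shape, not produced by difPy):
# filename -> {"location": path to the file, "duplicates": [paths]}
mock_result = {
    "IMG_202201.jpg": {
        "location": os.path.join("folder", "IMG_202201.jpg"),
        "duplicates": [os.path.join("folder", "subfolder", "IMG_202203.jpg")],
    },
}

def paths_from_result(result):
    """Collect paths from the nested dictionaries instead of using the keys."""
    paths = []
    for _filename, info in result.items():
        paths.append(info["location"])            # path of the matched image
        paths.extend(info.get("duplicates", []))  # paths of its duplicates
    return paths

all_paths = paths_from_result(mock_result)
for path in all_paths:
    print(path)
```

The point is only that a script should open the paths stored in the nested dictionary and never the bare key, since the key alone loses the subfolder.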

TheLastGimbus commented 2 years ago

Then what if I had

folder/
| - same_file1.jpg
| - same_file2.jpg
| - subfolder/
|  | - same_file1.jpg

Will the keys be the same...? This would be bad...

elisemercury commented 2 years ago

Hi again @TheLastGimbus,

Indeed, you are right, this would definitely be a scenario we want to avoid.

If I am not mistaken, as of today, if both copies of same_file1.jpg have duplicates within the image data, they would be grouped under the same key in the results dictionary, and the file path of the same_file1.jpg in the subfolder would be listed among the duplicates of the first same_file1.jpg.

I agree, this is neither appropriate nor expected behavior, since the two same_file1.jpg files are not necessarily duplicates of each other.
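To make the collision concrete, here is a small self-contained sketch. It is not difPy's code; the helper names (`index_by_filename`, `index_by_path`, `make_demo_tree`) are made up for illustration. It recreates the folder layout from the question in a temporary directory and compares a dictionary keyed by bare filename with one keyed by full path.

```python
import os
import tempfile

def index_by_filename(root):
    """Map bare filename -> full path; a repeated name overwrites the earlier entry."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            index[name] = os.path.join(dirpath, name)
    return index

def index_by_path(root):
    """Map full path -> bare filename; every file keeps its own entry."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            index[os.path.join(dirpath, name)] = name
    return index

def make_demo_tree(root):
    """Create the three empty files from the example layout."""
    os.makedirs(os.path.join(root, "subfolder"))
    for rel in ("same_file1.jpg", "same_file2.jpg",
                os.path.join("subfolder", "same_file1.jpg")):
        with open(os.path.join(root, rel), "wb"):
            pass

with tempfile.TemporaryDirectory() as tmp:
    make_demo_tree(tmp)
    name_count = len(index_by_filename(tmp))
    path_count = len(index_by_path(tmp))

print(name_count, path_count)  # 2 3
```

Three files go in, but the filename-keyed dictionary only holds two entries, because both same_file1.jpg files compete for one key.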

Thanks a lot for bringing up this issue! I will do some more testing around it, and if the scenario above is indeed the case, I will ship a fix as soon as possible with the next release of difPy. I'll keep you posted.

Again, thanks a lot for your input! The more users use difPy, the more issues can be detected and the better the algorithm gets over time.

All the best, Elise

elisemercury commented 2 years ago

Hi @TheLastGimbus,

This issue has now been resolved in release v2.4.2. The .result dictionary now contains keys with unique identifiers, and the image filename can now be extracted from the nested dictionary. You can refer to the difPy Usage Documentation for a detailed guide.

Again, thanks a lot for your input and all the best, Elise
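As a rough illustration of what ID-keyed results allow, here is a sketch on mock data. The exact layout of the v2.4.2 result is not shown in this thread, so the integer IDs and the `filename` key below are assumptions for illustration only; the difPy Usage Documentation has the real structure.

```python
# Hypothetical ID-keyed result (assumed shape, not produced by difPy):
# unique id -> {"filename": ..., "location": ..., "duplicates": [...]}
mock_result = {
    1001: {
        "filename": "same_file1.jpg",
        "location": "folder/same_file1.jpg",
        "duplicates": [],
    },
    1002: {
        "filename": "same_file1.jpg",
        "location": "folder/subfolder/same_file1.jpg",
        "duplicates": [],
    },
}

def filenames_by_id(result):
    """Read the filename from each nested dictionary, keyed by its unique id."""
    return {image_id: info["filename"] for image_id, info in result.items()}

names = filenames_by_id(mock_result)
for image_id, name in names.items():
    print(f'{image_id} - {name} - {mock_result[image_id]["location"]}')
```

With unique keys, two files that share a name keep separate entries, and the filename becomes ordinary data inside each entry.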