elisemercury / Duplicate-Image-Finder

difPy - Python package for finding duplicate and similar images
https://difpy.readthedocs.io
MIT License

Match Single Image with Read-Only Directory #6

Closed ashish128 closed 2 years ago

ashish128 commented 2 years ago

Dear Developer,

I am a noob but still love programming (I have just started), so please excuse me if anything below is "obvious" or "incorrectly stated".

As I understand it, difPy compares all files in the given directory with each other for similarity.

First point: Is it possible to match a single image (file path passed as a parameter) against a directory (folder path passed as a parameter)? That is, instead of matching all images against all images, we would match just one image against all images in a folder.

Second point: Does the function write anything into the search folder (like tensor data)? I am asking to understand whether this can work on a read-only directory. (I tried reading the code but could not figure it out.)

Third point: If we have to call it multiple times on a large folder, will it take a long time analyzing all files each time, or is it possible to pass a path to a file / folder where it can save the analysis to save time?

Example: (None of the text below is meant to be crossed out, so please do not ignore any text that appears struck through. I could not figure out why this formatting is being applied.)

Input_file_path = "~/Downloads/image.jpg" # Any valid image file
Target_Folder_path = "~/A_Readonly_Folder_of_Images" # A read-only folder with, say, 56,000 (big number?) files to search
Working_File_or_Folder_path = "~/A_File_or_Folder_with_Read_Write_Access" # A writable file / folder to save analysis data to and load it from

E.g. if the passed file / folder does not exist, create it and save the analysis data there. If it does exist, read it and use it instead of analyzing the target folder again.

Calling:

dif.compare_image(Input_file_path, Target_Folder_path, Working_File_or_Folder_path)

Please excuse me if I am overstepping any limits here. I just became curious about this wonderful concept, but I know nothing about GitHub and how it works.

Best regards, Ashish

snoozesecurity commented 2 years ago

Agree with the first point. A use case for me is:

Directory1: Thousands of non-duplicate images
Directory2: Single, or few images that may be a duplicate of the contents in Directory1

Right now I have to move the single or few possible duplicates from Directory2 into Directory1 and run difPy. The problem is, of course, that difPy compares all of the known-unique images in Directory1 with the other known unique images, thus wasting computation cycles. Would love to see this functionality added!
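To illustrate the saving, here is a rough sketch of comparing only the new images against the known-unique ones. It is not difPy's code: images are assumed to be already decoded into flat lists of pixel values of equal length, mean squared error is used as the similarity measure, and the names `mse` and `match_against` as well as the threshold of 10.0 are made up for this example.

```python
from itertools import product


def mse(a, b):
    """Mean squared error between two equally sized flat pixel lists."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def match_against(candidates, reference, threshold=10.0):
    """Compare each candidate only with the reference images.

    Both arguments map a file name to a flat list of pixel values.
    Returns (candidate_name, reference_name) pairs whose MSE is at or
    below the threshold. Reference images are never compared with
    each other.
    """
    return [
        (c_name, r_name)
        for (c_name, c_px), (r_name, r_px) in product(
            candidates.items(), reference.items()
        )
        if mse(c_px, r_px) <= threshold
    ]


# Hypothetical data: two known-unique images and one possible duplicate.
reference = {"a.png": [0, 0, 0, 0], "b.png": [255, 255, 255, 255]}
candidates = {"new.png": [1, 0, 2, 0]}
matches = match_against(candidates, reference)
```

With one candidate and 1,000 reference images this needs 1,000 comparisons, whereas putting all 1,001 images in one folder and comparing every pair needs 1,001 × 1,000 / 2 = 500,500.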

elisemercury commented 2 years ago

Dear Ashish,

Thank you for your input! Please find below a few comments from my side:

First Point: unfortunately, as of v2.0 it is not possible to pass the location of one specific file as a parameter to the function. For the moment, only folder paths are supported, meaning your single image must be located in a folder itself. This is a feature that will be considered for future updates, thank you!
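As a possible workaround until then, the single image can be copied into its own temporary folder, so that a folder path is available to pass on. The sketch below uses only the Python standard library; the helper name `stage_single_image` is made up for this example and is not part of difPy.

```python
import shutil
import tempfile
from pathlib import Path


def stage_single_image(image_path):
    """Copy one image into a fresh temporary folder and return that folder."""
    source = Path(image_path).expanduser()
    if not source.is_file():
        raise FileNotFoundError(source)
    # mkdtemp creates a new, empty, writable directory.
    folder = Path(tempfile.mkdtemp(prefix="single_image_"))
    shutil.copy2(source, folder / source.name)
    return folder
```

The returned folder contains only the copied image, and it can be removed with `shutil.rmtree(folder)` once the comparison is done. The original image and the directory it came from are left untouched.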

Second Point: the function does not write or save any data into the folders, therefore it will also run on read-only directories. I also tested this myself and can confirm.

Third point: unfortunately, as of v2.0 difPy does not provide any option to store the computed data and reuse it later. This is a feature that might be considered for future updates, thank you!
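For anyone who wants to experiment with this outside of difPy, the idea could look roughly like the sketch below. It keeps the index in a separate writable file, so the image folder itself can stay read-only, and it only re-analyzes files whose size or modification time has changed. Everything here is an assumption made for illustration: the function names are made up, and `analyze` is a stand-in that hashes the file bytes (which only detects exact copies), not the image comparison difPy performs.

```python
import hashlib
import json
from pathlib import Path


def analyze(path):
    """Stand-in for the expensive per-image analysis: SHA-256 of the bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def load_or_build_index(target_folder, cache_file):
    """Return {file path: analysis}, reusing cached entries for unchanged files.

    Only cache_file is written to; target_folder is only read.
    """
    cache_file = Path(cache_file).expanduser()
    cached = json.loads(cache_file.read_text()) if cache_file.exists() else {}
    index = {}
    for path in sorted(Path(target_folder).expanduser().iterdir()):
        if not path.is_file():
            continue
        stat = path.stat()
        stamp = [stat.st_mtime_ns, stat.st_size]
        entry = cached.get(str(path))
        # Re-analyze only new files or files whose stamp has changed.
        if entry is None or entry["stamp"] != stamp:
            entry = {"stamp": stamp, "analysis": analyze(path)}
        index[str(path)] = entry
    cache_file.write_text(json.dumps(index))
    return {name: entry["analysis"] for name, entry in index.items()}
```

On the first call the whole folder is analyzed and the index is written to `cache_file`. Later calls read that file and skip every image that has not changed since.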

All the best, Elise

elisemercury commented 2 years ago

Dear snoozesecurity, This issue has been addressed in the new version v2.0 of difPy. Thanks a lot for your input!