elisemercury / Duplicate-Image-Finder

difPy - Python package for finding duplicate or similar images within folders
https://difpy.readthedocs.io
MIT License

Clustering Possible? #51

Closed AhmedThahir closed 1 year ago

AhmedThahir commented 1 year ago

Is it possible to group similar images by extending the code in this library?

elisemercury commented 1 year ago

Hi @AhmedThahir, In theory, everything is possible 😄 It depends on what exactly you are looking for. If you can provide me with a concrete example of the output you wish to receive from difPy, I will be happy to look into it.

What would you consider similar images? How similar should they be? There are various ways images can be similar. You can try using the similarity parameter and setting it to a custom MSE (mean squared error) value, so that not only duplicate images are found, but also similar ones. But again, you will need to define the similarity threshold manually. Also, as of the latest difPy version, it searches for images that have an MSE equal to or lower than similarity, so the results will also include images with a lower MSE, meaning they might include exact duplicates.
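To make the thresholding idea concrete, here is a minimal sketch of how images could be grouped once an MSE threshold is chosen. It is not difPy's implementation and does not call difPy. The helper names `mse` and `cluster_by_mse` and the sample file names are hypothetical, and the "images" are tiny flattened grayscale pixel lists. Real images would first need to be loaded and resized to a common shape, which is not shown here. Grouping is transitive: if A is within the threshold of B, and B of C, all three land in one cluster.

```python
from itertools import combinations


def mse(a, b):
    """Mean squared error between two equally sized flat pixel sequences."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def cluster_by_mse(images, threshold):
    """Group image names whose pairwise MSE is <= threshold (transitively)."""
    # Union-find: every image starts as the root of its own cluster.
    parent = {name: name for name in images}

    def find(name):
        # Follow parent links to the root, shortening the path as we go.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(images, 2):
        if mse(images[a], images[b]) <= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    groups = {}
    for name in images:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical 2x2 grayscale "images", flattened to four pixel values each.
images = {
    "a.png": [10, 10, 10, 10],
    "b.png": [12, 10, 10, 10],        # MSE to a.png is 4 / 4 = 1.0
    "c.png": [200, 200, 200, 200],    # far from everything else
    "d.png": [10, 10, 10, 10],        # exact duplicate of a.png (MSE 0)
}
clusters = cluster_by_mse(images, threshold=5)
print(clusters)
```

With a threshold of 5, the near-duplicate and the exact duplicate fall into the same cluster as `a.png`, while `c.png` stays on its own. With a threshold of 0, only exact duplicates are grouped, which mirrors the "equal to or lower than" behaviour described above.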

If this is not what you are looking for, feel free to elaborate on what you need and I can look into it. Otherwise, you can of course always fork the project and work on an extension yourself.

Again thanks and all the best, Elise