elisemercury / Duplicate-Image-Finder

difPy - Python package for finding duplicate or similar images within folders
https://difpy.readthedocs.io
MIT License
421 stars 65 forks

Multi-processing #41

Closed by thecodingchicken 10 months ago

thecodingchicken commented 1 year ago

I am currently working on making this project multithreaded, since I have many folders with tens of thousands of images (perhaps 100k+) and would like a faster option.

I'm opening this issue as a means of communication. If you have a Discord account or email address, that would work better, as I will likely see a message there before a GitHub issue comment.
My Discord account is thecodingchicken#4835 if you would prefer to reach out there.

elisemercury commented 1 year ago

Hi @thecodingchicken, thank you very much for reaching out and for sharing your feedback! We have already received a few requests similar to yours and have therefore decided to implement batch processing in the next release of difPy, which will also include an option to multi-thread the process. This release is still in development, and there is no confirmed release date yet. Please be patient: processing with difPy will soon become a lot more flexible and scalable. :-) Again, thanks and all the best, Elise

thecodingchicken commented 1 year ago

Just an update: the code is almost working. I just have to sort out a few minor issues that break everything. That seems to be the norm for me, as I haven't really done any coding in years.
Glad to hear that you are working on it as well. Is it multithreaded in another language? As far as I know, the GIL prevents CPU-bound threads from running Python code in parallel, which is why multiprocessing seems to be the only option for me, assuming you stick with CPython rather than IronPython or Jython.
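
For illustration, here is a minimal sketch of the multiprocessing approach mentioned above. It is not difPy's implementation: the function names, the use of plain lists of pixel values as "images", and the mean-squared-error comparison are all assumptions made to keep the example self-contained. Each pairwise comparison is CPU-bound, so the pairs are handed to worker processes through the standard-library `multiprocessing.Pool` instead of threads.

```python
from itertools import combinations
from multiprocessing import Pool


def mse(a, b):
    """Mean squared error between two equally sized flat pixel sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def _compare(task):
    # Must be a top-level function so it can be pickled and sent to workers.
    (name_a, pixels_a), (name_b, pixels_b) = task
    return name_a, name_b, mse(pixels_a, pixels_b)


def find_similar_pairs(images, threshold=0.0, workers=1):
    """Return sorted (name_a, name_b) pairs whose MSE is <= threshold.

    `images` maps a name to a flat sequence of pixel values
    (a hypothetical stand-in for decoded image data).
    """
    tasks = list(combinations(sorted(images.items()), 2))
    if workers <= 1:
        # Serial fallback: same logic, no extra processes.
        results = map(_compare, tasks)
    else:
        # Each worker is a separate interpreter with its own GIL.
        with Pool(processes=workers) as pool:
            results = pool.map(_compare, tasks, chunksize=64)
    return sorted((a, b) for a, b, score in results if score <= threshold)
```

When calling `find_similar_pairs` with `workers` greater than 1 from a script, wrap the call in an `if __name__ == "__main__":` guard so that worker processes can import the module safely on platforms that spawn rather than fork. Note that the number of pairs grows quadratically with the number of images, so for very large folders the task list itself becomes a cost to manage.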

elisemercury commented 1 year ago

Hi @thecodingchicken, great to hear that you are working on it as well and that you're almost done! Congrats. Feel free to open a pull request when you're done - I would love to see what approach you took. Yes, I am working on it as well, but as I currently have quite a few other things on my plate, it will still take a while to finalize. I am planning to use the Python multiprocessing library, to add the ability to feed images to difPy in batches, and to save the process state so that it can be resumed if interrupted. All the best and happy coding! Elise
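
The batching and resumable-state idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not how difPy does it: `process_in_batches`, `handle_batch`, and the JSON checkpoint layout are hypothetical names and choices. After each batch, the number of completed batches and the results so far are written to a state file, so a later run skips the work that was already done.

```python
import json
import os


def batched(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def process_in_batches(items, handle_batch, state_path, batch_size=100):
    """Run `handle_batch` on each batch, checkpointing after every batch.

    `handle_batch` must return a JSON-serializable list. If `state_path`
    already exists, batches recorded there as done are skipped.
    """
    state = {"done": 0, "results": []}
    if os.path.exists(state_path):
        with open(state_path) as fh:
            state = json.load(fh)

    for index, batch in enumerate(batched(items, batch_size)):
        if index < state["done"]:
            continue  # already processed in an earlier run
        state["results"].extend(handle_batch(batch))
        state["done"] = index + 1
        # Write to a temporary file first, then swap it in, so an
        # interruption mid-write cannot corrupt the checkpoint.
        tmp_path = state_path + ".tmp"
        with open(tmp_path, "w") as fh:
            json.dump(state, fh)
        os.replace(tmp_path, state_path)

    return state["results"]
```

Resuming only works if the same `items` and `batch_size` are passed on the next run, since the checkpoint stores a batch count rather than the items themselves.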

elisemercury commented 10 months ago

Hi @thecodingchicken,

I'm happy to let you know that difPy v4 now comes with multiprocessing. Thanks for your suggestion!

All the best, Elise