Closed qtc-de closed 1 year ago
Hi Tobias!
Thank you so much for your feedback and your effort in making difPy better and more efficient! I really appreciate the effort you invested in this. I had a quick look at your changes and they seem very clean at first sight. I also appreciate a lot that you adjusted the README to your changes. Let me go through the code more in detail and test the changes in the next few days. After that, I'll decide whether we can merge and deploy them.
Again, thank you very much for this!
All the best, Elise
Hi Elise :wave:
First of all, cool idea! I recently needed to compare a large number of images, and your approach for comparing them worked pretty well :+1:
That being said, the current implementation is rather slow. Comparing larger sets of images (15000+) takes a while. Moreover, you use a lot of different dependencies, some of which are quite large (e.g. `opencv`). This makes it difficult to install the tool in specific environments, such as within a Docker container.
Since I will probably need to compare images again in the future, I thought about improving these issues. This pull request provides the results. Before talking about the changes, let me apologize for the huge pull request. I actually do not like larger pull requests for my own repos and avoid sending them to others as well. However, the dependency changes and especially the multiprocessing required a larger restructuring of your tool. Therefore, I totally understand if you do not want to merge the changes. In that case, I'm fine with maintaining a fork of your repository that provides an alternative implementation. Just decide as you like :)
Here is a brief summary of the changes I made:
- The dependencies are reduced to `numpy` and `Pillow`. This makes it possible to create a Docker container running difPy that is only 161MB in size. Before, with `opencv`, we were at around 1.2GB.
- If image `A` is similar to image `B`, one probably does not want to compare `B` to other images, but is fine with only comparing `A` with the others from there on. Sure, this may miss some edge-case duplicates, but in most situations it should be fine, and it provides a huge speedup for the operation.
As I said, many changes. Just think about whether you want to merge or whether we keep these changes in a separate fork. I'm fine with both approaches :wink:
Best Tobias