JohannesBuchner / imagehash

A Python Perceptual Image Hashing Module
BSD 2-Clause "Simplified" License
3.28k stars 331 forks source link

find_similar_images: Release memory as soon as we're done with an image. #153

Closed nh2 closed 3 years ago

nh2 commented 3 years ago

Fixes memory usage blowup.

coveralls commented 3 years ago

Coverage Status

Coverage increased (+0.04%) to 89.535% when pulling 40d75333374e7e022fcc922abbe8dc4fe5c5f027 on nh2:similar-images-prompt-image-deallocation into 0abd4878bdb3c2b7bd0a5ec58d1ffca530e70cec on JohannesBuchner:master.

JohannesBuchner commented 3 years ago

Please describe in detail the problem you had, and what the error looked like.

The first line does not seem to be needed?

nh2 commented 3 years ago

Please describe in detail the problem you had, and what the error looked like.

Using the with open(...) pattern is fundamental to correct resource usage in Python. Otherwise, the resources in question (such as file descriptors, or RAM given that images can be very large) are only freed upon the next garbage collection, as opposed to promptly when they are no longer needed. Wherever scoped file access is done (which is almost always), with should be used.

For example: https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files

It is good practice to use the with keyword when dealing with file objects. The advantage is that the file is properly closed after its suite finishes

Same for Pillow: https://pillow.readthedocs.io/en/stable/reference/open_files.html

The first line does not seem to be needed?

That's true. I often scope variables explicitly because it is (arguably) confusing that in Python, blocks like with, for or try don't create a scope, so it isn't obvious whether a variable is in scope at all. But you are right that it is not needed. I removed it now.

JohannesBuchner commented 3 years ago

If I understand correctly, no memory error was produced, and resource overuse is prevented by garbage collection.

The goal of the script is to demonstrate usage of the library to users who want to build a complex framework using it. For that purpose, the script is intended to be as short and easy to read as possible. These incentives are different from programs one should write for production.

nh2 commented 3 years ago

While I agree examples should be small, they should also be correct. People will copy-paste such examples, and then keep this suboptimal usage of Pillow in that will make their programs use more memory than necessary.

That's why using the recommended patterns from the Python tutorial and the PIL tutorial which prevent this makes sense to me.