numpy vs pil simple memory usage

qiminchen commented 4 years ago

This is a simple memory usage comparison of Numpy and PIL mentioned at https://github.com/beijbom/pyspacer/pull/17#discussion_r420909318

qiminchen commented 4 years ago

@beijbom hey Oscar, as you mentioned at https://github.com/beijbom/pyspacer/pull/17#discussion_r420909318, one way to save more memory is to crop the image first using Image.crop() from PIL and convert the patches to NumPy afterwards. So I did a very simple experiment to trace memory usage. The method for monitoring memory usage is adopted from "Option 2: tracemalloc"; refer to the tracemalloc documentation for more details.

Basically, crop_patches_pil() in extract_features_utils.py returns a list of cropped PIL Images, and the if...else... in extract_features.py just makes the comparison convenient. TestMemoryUsage in test_extract_features.py simply tests the memory usage of the extraction block. These auxiliary functions will be removed if we decide to merge this PR.

tracemalloc tracks the individual memory blocks allocated by the Python interpreter. I read the current and peak values once __call__(self, im, rowcols) in EfficientNetExtractor() has completed, so I assume this is equivalent to a memory usage measurement for the extraction step.
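For reference, the measurement pattern described above can be sketched roughly as follows. This is a minimal illustration, not the test code from this PR: measure_peak and make_buffer are hypothetical names, make_buffer stands in for the extractor call, and dividing by 10**6 to get MB is an assumption about how the figures were reported. tracemalloc only counts blocks allocated through Python's allocators, so memory that a C extension allocates on its own is not included.

```python
import tracemalloc


def measure_peak(func, *args, **kwargs):
    """Run func under tracemalloc and return (result, current_mb, peak_mb)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        # Sizes are in bytes: memory still held now, and the peak since start().
        current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, current / 10**6, peak / 10**6


def make_buffer(n_bytes):
    # Hypothetical workload standing in for the extractor's __call__.
    return bytearray(n_bytes)


buf, current_mb, peak_mb = measure_peak(make_buffer, 5_000_000)
print(f"Current memory usage is {current_mb}MB; Peak was {peak_mb}MB")
```

Starting and stopping the tracer around a single call keeps earlier allocations (for example model initialization) out of the peak figure.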

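Result on 08bfc10v7t.png with 5 annotations: it looks like cropping the image first using Image.crop() from PIL and converting the patches to NumPy afterwards saves memory compared with converting to a NumPy array at the beginning. Peak traced memory drops from about 4.16 MB to about 2.94 MB (roughly 29% lower), while the usage remaining after the call is nearly identical.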

-> Initializing EfficientNetExtractor
Numpy: Current memory usage is 0.464665MB; Peak was 4.160963MB
-> Initializing EfficientNetExtractor
PIL: Current memory usage is 0.46915MB; Peak was 2.939177MB
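The two strategies being compared can be sketched like this. It is only an illustration under assumptions: patches_array_first and patches_crop_first are hypothetical names (not the functions in this PR), the image is synthetic, and the reflect padding mode is chosen for the example. Note that Image.crop() fills regions outside the image with zeros, so the two variants only agree for patches that lie fully inside the image.

```python
import numpy as np
from PIL import Image


def patches_array_first(im, rowcols, crop_size):
    """Convert the whole image to an array, pad it, then slice the patches."""
    pad = crop_size // 2
    arr = np.pad(np.asarray(im), ((pad, pad), (pad, pad), (0, 0)), mode="reflect")
    # After padding, original pixel (r, c) sits at (r + pad, c + pad).
    return [arr[r:r + crop_size, c:c + crop_size, :].copy() for r, c in rowcols]


def patches_crop_first(im, rowcols, crop_size):
    """Crop each patch with PIL and convert only the small patch to an array."""
    pad = crop_size // 2
    patches = []
    for r, c in rowcols:
        # PIL boxes are (left, upper, right, lower), i.e. column comes first.
        box = (c - pad, r - pad, c - pad + crop_size, r - pad + crop_size)
        patches.append(np.asarray(im.crop(box)))
    return patches


# Synthetic 64x48 RGB image and two interior annotation points (row, col).
rng = np.random.default_rng(0)
im = Image.fromarray(rng.integers(0, 256, size=(48, 64, 3), dtype=np.uint8))
rowcols = [(10, 12), (20, 30)]

a = patches_array_first(im, rowcols, crop_size=8)
b = patches_crop_first(im, rowcols, crop_size=8)
print(all(np.array_equal(x, y) for x, y in zip(a, b)))
```

The array-first variant holds the full array plus a padded copy at its peak, while the crop-first variant only ever creates patch-sized arrays on the NumPy side.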

beijbom commented 4 years ago

Hey @qiminchen. Thanks for looking into this. It's interesting that it saves so much memory, since you are still converting it to an array in order to do the np.pad call. How much does this save on a 100-megapixel image?
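As a rough back-of-envelope for the 100-megapixel question (not a measurement), the raw array sizes can be computed directly. The 10,000 x 10,000 image shape and the 224 x 224 patch size are assumptions for illustration, and rgb_array_megabytes is a hypothetical helper. This counts only NumPy array sizes for 8-bit RGB data; it says nothing about the memory PIL itself needs to decode the image.

```python
def rgb_array_megabytes(width, height, channels=3, bytes_per_sample=1):
    """Size in MB (10**6 bytes) of a dense uint8 image array."""
    return width * height * channels * bytes_per_sample / 10**6


# Assumed shapes: a 100-megapixel image and 224x224 patches.
full_mb = rgb_array_megabytes(10_000, 10_000)
patch_mb = rgb_array_megabytes(224, 224)

print(f"Full RGB array: {full_mb} MB")
print(f"Five patches:   {5 * patch_mb:.3f} MB")
```

Since np.pad returns a new array, the array-first path briefly holds roughly two full-size arrays, which is where most of the potential saving on large images would come from.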

I'm inclined to stay with the old code since we know that it works (don't fix what ain't broken...). Also, I have had some bad experiences with the PIL.convert('RGB') step -- it just seems to cause trouble for some of the more unusual image formats.

But let's keep an eye on this fix if we run into memory issues down the road.

qiminchen commented 4 years ago

@beijbom

> How much does this save on a 100-megapixel image?

I will take a look at this later.

> I'm inclined to stay with the old code since we know that it works (don't fix what ain't broken...). Also, I have had some bad experiences with the PIL.convert('RGB') step -- it just seems to cause trouble for some of the more unusual image formats.

I agree.

beijbom commented 4 years ago

@qiminchen: I'm closing this PR for now. We can revisit later.

coralnet / pyspacer

numpy vs pil simple memory usage #24