Closed qiminchen closed 4 years ago
@beijbom hey Oscar, as you mentioned at https://github.com/beijbom/pyspacer/pull/17#discussion_r420909318, one way to save more memory is to crop the image first using Image.crop() from PIL and convert the patches to NumPy afterwards. So I did a very simple experiment to trace memory usage. The method for monitoring memory usage is adopted from "Option 2: tracemalloc"; refer to the tracemalloc documentation for more details.
Basically, crop_patches_pil() in extract_features_utils.py returns a list of cropped PIL Images, the if...else... in extract_features.py just makes the comparison convenient, and TestMemoryUsage in test_extract_features.py simply tests the memory usage of the extraction block. These auxiliary functions will be removed if we decide to merge this PR.
tracemalloc basically tracks the individual memory blocks allocated by the Python interpreter up to the point where __call__(self, im, rowcols) in EfficientNetExtractor() has completed, so I assume this is equivalent to measuring the memory usage of the extraction.
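For reference, the measurement pattern can be sketched roughly as below. This is only a minimal illustration and not the code in TestMemoryUsage: measure_peak and allocate_and_drop are hypothetical names, and the bytearray workload is a stand-in for the extractor call. One caveat to keep in mind is that tracemalloc only sees allocations made through Python's memory allocators, so buffers that a C extension allocates on its own may not be counted.

```python
import tracemalloc


def measure_peak(func, *args, **kwargs):
    """Run func and return (result, current_bytes, peak_bytes) as seen by tracemalloc."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        # Both values are in bytes and cover everything traced since start().
        current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, current, peak


def allocate_and_drop(n_bytes):
    # Hypothetical stand-in workload: a large temporary buffer freed on return.
    buf = bytearray(n_bytes)
    return len(buf)


size, current, peak = measure_peak(allocate_and_drop, 4_000_000)
print(f"Current memory usage is {current / 10**6}MB; Peak was {peak / 10**6}MB")
```

The "current" value reflects what is still allocated when the call returns, while "peak" captures the largest footprint reached during the call, which is the number that matters for the comparison here.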
Result on 08bfc10v7t.png with 5 annotations: it looks like cropping the image first using Image.crop() from PIL and converting the patches to NumPy afterwards saves more memory than converting to a NumPy array at the beginning.
-> Initializing EfficientNetExtractor
Numpy: Current memory usage is 0.464665MB; Peak was 4.160963MB
-> Initializing EfficientNetExtractor
PIL: Current memory usage is 0.46915MB; Peak was 2.939177MB
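To make the two strategies concrete, here is a rough sketch of the difference. It is not the code from this PR: patches_via_numpy and patches_via_pil are hypothetical names, the image is synthetic, and padding is left out by only using interior points (the real code also has to handle patches that run past the image border).

```python
import numpy as np
from PIL import Image


def patches_via_numpy(im, rowcols, crop_size):
    """Convert the whole image to an array first, then slice out each patch."""
    arr = np.asarray(im)  # the full-size array is held in memory
    half = crop_size // 2
    return [arr[r - half:r - half + crop_size, c - half:c - half + crop_size]
            for r, c in rowcols]


def patches_via_pil(im, rowcols, crop_size):
    """Crop each patch with Image.crop() first, then convert only the patch."""
    half = crop_size // 2
    patches = []
    for r, c in rowcols:
        # PIL boxes are (left, upper, right, lower), i.e. column before row.
        box = (c - half, r - half, c - half + crop_size, r - half + crop_size)
        patches.append(np.asarray(im.crop(box)))
    return patches


# Synthetic 64x80 RGB image standing in for a real photo.
pixels = (np.arange(64 * 80 * 3) % 256).astype(np.uint8).reshape(64, 80, 3)
im = Image.fromarray(pixels)
rowcols = [(20, 20), (32, 40), (40, 60)]  # interior points only, so no padding is needed

np_patches = patches_via_numpy(im, rowcols, crop_size=16)
pil_patches = patches_via_pil(im, rowcols, crop_size=16)
print(all(np.array_equal(a, b) for a, b in zip(np_patches, pil_patches)))
```

Both functions should yield the same pixels for interior points; the difference is only in whether the full image or just each patch is materialized as an array.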
hey @qiminchen . Thanks for looking into this. It's interesting it saves so much memory since you are still converting it to an array in order to do the np.pad call. How much does this save on a 100 mega pixel image?
I'm inclined to stay with the old code since we know that it works (don't fix what ain't broken...). Also, I have had some bad experience with the PIL.convert('RGB') step -- just seem to cause trouble for some of the more unusual image formats.
But let's keep an eye on this fix if we run into memory issues down the road.
@beijbom
How much does this save on a 100 mega pixel image?
I will take a look at this after.
I'm inclined to stay with the old code since we know that it works (don't fix what ain't broken...). Also, I have had some bad experience with the PIL.convert('RGB') step -- just seem to cause trouble for some of the more unusual image formats.
I agree.
@qiminchen : I'm closing this PR for now. We can revisit later.
This is a simple memory usage comparison of Numpy and PIL mentioned at https://github.com/beijbom/pyspacer/pull/17#discussion_r420909318