AIM-Harvard / pyradiomics

Open-source python package for the extraction of Radiomics features from 2D and 3D images and binary masks. Support: https://discourse.slicer.org/c/community/radiomics
http://pyradiomics.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Memory Error during Feature Extraction (for many images or large mask) #303

Closed michaelschwier closed 2 years ago

michaelschwier commented 7 years ago

When I tried to extract features on many images in a loop using PyRadiomics I ran into a memory error. I was able to reproduce two types of memory errors with a simple script and just one image and mask, repeatedly calling the feature extraction on the same image.

I put the code to reproduce the error here on gist. You can find the test image data to run the script (as well as the script file itself again) here.

There are two different error cases:

  1. Trying to extract features from a rather large mask (-> "MemoryError" in numpy)
  2. Trying to extract features from small masks for many images in a loop (-> "Failed to allocate memory for image" in SimpleITK)

You can reproduce them by (un)commenting one of the lines defining which mask to use (see also the top explanatory comment in the script).

Case 1 error details:

Traceback (most recent call last):
  File "d:/Test/PyRadiomicsMemoryExceptionTest.py", line 26, in <module>
    featureVector = extractor.execute(testImage, testMask, label = 1)
  File "C:\Python36\lib\site-packages\pyradiomics-1.2.0.post25.dev0+g13274ff-py3.6-win32.egg\radiomics\featureextractor.py", line 354, in execute
    shapeClass = self.featureClasses['shape'](croppedImage, croppedMask, **self.settings)
  File "C:\Python36\lib\site-packages\pyradiomics-1.2.0.post25.dev0+g13274ff-py3.6-win32.egg\radiomics\shape.py", line 62, in __init__
    physicalCoordinates -= numpy.mean(physicalCoordinates, axis=0)  # Centered at 0
  File "C:\Python36\lib\site-packages\numpy\core\fromnumeric.py", line 2909, in mean
    out=out, **kwargs)
  File "C:\Python36\lib\site-packages\numpy\core\_methods.py", line 54, in _mean
    arr = asanyarray(a)
  File "C:\Python36\lib\site-packages\numpy\core\numeric.py", line 583, in asanyarray
    return array(a, dtype, copy=False, order=order, subok=True)
MemoryError

Case 2 error details:

Traceback (most recent call last):
  File "d:/Test/PyRadiomicsMemoryExceptionTest.py", line 27, in <module>
    featureVector = extractor.execute(testImage, testMask, label = 1)
  File "C:\Python36\lib\site-packages\pyradiomics-1.2.0.post25.dev0+g13274ff-py3.6-win32.egg\radiomics\featureextractor.py", line 346, in execute
    featureVector.update(self.getProvenance(imageFilepath, maskFilepath, mask))
  File "C:\Python36\lib\site-packages\pyradiomics-1.2.0.post25.dev0+g13274ff-py3.6-win32.egg\radiomics\featureextractor.py", line 440, in getProvenance
    for k, v in six.iteritems(generalinfoClass.execute()):
  File "C:\Python36\lib\site-packages\pyradiomics-1.2.0.post25.dev0+g13274ff-py3.6-win32.egg\radiomics\generalinfo.py", line 56, in execute
    generalInfo[el] = getattr(self, 'get%sValue' % el)()
  File "C:\Python36\lib\site-packages\pyradiomics-1.2.0.post25.dev0+g13274ff-py3.6-win32.egg\radiomics\generalinfo.py", line 139, in getVolumeNumValue
    ccif.Execute(labelMap)
  File "C:\Python36\lib\site-packages\SimpleITK\SimpleITK.py", line 20584, in Execute
    return _SimpleITK.ConnectedComponentImageFilter_Execute(self, *args)
RuntimeError: Exception thrown in SimpleITK ConnectedComponentImageFilter_Execute: c:\d\vs14-win32-pkg\simpleitk-build\itk-prefix\include\itk-4.11\itkImportImageContainer.hxx:199:
Failed to allocate memory for image.

fedorov commented 7 years ago

@JoostJM let us know if you have any idea what is going on, or if we should investigate further.

JoostJM commented 7 years ago

@michaelschwier, what kind of hardware are you using? Specifically, how much RAM did you have available when running the script? I tested your script on my computer (Intel Xeon E3-1241, 16 GB RAM) and had no issues with your large mask (ran 11 iterations; memory usage fluctuates and peaks at about 2 GB). I'm also running the small mask script (~600 iterations so far, still requiring only about 200 MB of RAM), and this also appears to be running fine.

It is possible there isn't enough RAM available to run pyradiomics. We already incorporate enhancements to reduce the memory footprint when extracting features, such as cropping to the bounding box of the segmentation prior to feature extraction. However, generating texture matrices, especially when using large masks, simply requires a lot of memory. To check further, I will profile the memory usage of your script over time.

Your small mask poses an interesting case though. It is possible that this crashes later because the RAM is enough for the first few iterations, but runs out when the results vector grows too large (even though this vector is relatively small compared to the overall memory usage of pyradiomics).

As to solutions: if it already fails on the first connected component image filter (as is the case in your large mask case), I'm not sure what to do. You can remove that part of the code by disabling the additional info (parameter additionalInfo set to False), but I think it will then fail in some other part of the code (the most memory-intensive functions are those generating the texture matrices). For your small mask case, check out the batch script contained in the examples. This script writes out the results of each case (by appending to a file), thereby preventing a build-up of memory usage when extracting a large batch. In theory, the batch script should only fail due to a memory shortage if any one case is too large to extract (regardless of how many cases were extracted before).
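
The append-per-case idea can be sketched roughly as follows. This is not the batch script from the examples folder, just a minimal illustration under stated assumptions: `run_batch`, `extract_case` and the CSV layout are hypothetical names, every case is assumed to return the same feature keys, and in real use `extract_case` would wrap a call such as `extractor.execute(image, mask)`.

```python
import csv
import os


def run_batch(cases, extract_case, out_path):
    """Extract features case by case, appending each result to a CSV file.

    `cases` is an iterable of (case_id, image_path, mask_path) tuples and
    `extract_case` is a hypothetical callable returning a dict of features.
    No results are kept in memory between iterations.
    """
    # Only write the header if the output file is new or empty.
    header_written = os.path.exists(out_path) and os.path.getsize(out_path) > 0
    n_done = 0
    for case_id, image_path, mask_path in cases:
        try:
            features = extract_case(image_path, mask_path)
        except (MemoryError, RuntimeError) as exc:
            # A single oversized case should not abort the whole batch.
            print("case %s failed: %s" % (case_id, exc))
            continue
        row = {"case_id": case_id}
        row.update(features)
        # Open, append and close per case so the row is on disk immediately.
        with open(out_path, "a", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=list(row))
            if not header_written:
                writer.writeheader()
                header_written = True
            writer.writerow(row)
        n_done += 1
    return n_done
```

Because each row is written and released before the next case starts, memory use in this sketch depends on the largest single case rather than on the number of cases.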

JoostJM commented 7 years ago

Here are the graphs of memory usage over time for the large (~20 iterations) and small (~200 iterations) masks. I got no memory errors and halted the process.

Large mask: [figure: large mask memory usage]

Small mask: [figure: small mask memory usage]

If you still get your memory errors, could you make a similar graph? I used a simple Python package called memory_profiler (pip install memory_profiler), which provides the mprof command, and then ran your script using python C:\Python27\scripts\mprof run PyRadiomicsMemoryExceptionTest.py. After this has finished, you can see the graph by running python C:\Python27\scripts\mprof plot.

michaelschwier commented 7 years ago

@JoostJM Thank you for your answer and for checking on your machine. My system is Windows 10 with 16 GB RAM. During all my tests there were always at least 7 GB of RAM still available. Unfortunately the mprof tool doesn't work for me (Windows): it throws an exception that it cannot access the source code!?

So I did "manual" memory observation by looking at the memory consumption of the process in the Task Manager. For the case with the large mask the process crashes when using around 1.4 GB of memory. For the case with the small mask the process never consumes more than 350 MB of memory.
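
When mprof is not usable, a rough in-process alternative to watching the Task Manager is the standard library's `tracemalloc`. The sketch below is only an approximation: `memory_per_iteration` and `work` are hypothetical names, `reset_peak()` requires Python 3.9 or later, and memory allocated inside native libraries may not be visible to `tracemalloc` at all, so the numbers can be lower than what the operating system reports.

```python
import tracemalloc


def memory_per_iteration(work, iterations):
    """Call `work()` repeatedly and record (current, peak) traced bytes.

    Only allocations visible to tracemalloc are counted; memory allocated
    by native code may not show up here.
    """
    samples = []
    tracemalloc.start()
    try:
        for _ in range(iterations):
            tracemalloc.reset_peak()  # Python 3.9+: peak is measured per call
            work()
            current, peak = tracemalloc.get_traced_memory()
            samples.append((current, peak))
    finally:
        tracemalloc.stop()
    return samples
```

A `current` value that keeps growing from one iteration to the next points to accumulation across iterations, whereas a high `peak` with a flat `current` points to a single expensive step.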

However: I was using a 32 bit Python. So in the large-mask case I can understand that it runs out of memory (though it should still have some headroom at 1.4 GB). For the small-mask case it shouldn't be an issue, though :/

I have now also installed a 64 bit Python in parallel and I cannot reproduce the errors so far (> 1200 iterations on the small mask). So for me that could be the solution. Nevertheless, the crash of the small mask on 32 bit still puzzles me ...

pieper commented 7 years ago

We should think about explicitly not supporting 32bit python. We explicitly don't support 32 bit platforms in Slicer because we had so many memory related errors.

fedorov commented 7 years ago

I would suggest not spending any time on debugging this issue in 32 bit, and adding a note to the user guide that 64 bit Python should be used.

michaelschwier commented 7 years ago

Could maybe even add a warning during installation of pyradiomics when detecting 32bit python!?

fedorov commented 7 years ago

> Could maybe even add a warning during installation of pyradiomics when detecting 32bit python!?

Definitely, or even failure.
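
For illustration, detecting a 32 bit interpreter needs only the standard library. This is a minimal sketch, not a check that pyradiomics is known to ship; `python_bitness` and `warn_if_32bit` are hypothetical names.

```python
import struct
import warnings


def python_bitness():
    """Return the pointer width of the running interpreter in bits."""
    return struct.calcsize("P") * 8


def warn_if_32bit():
    """Warn (and return True) when running under a 32 bit interpreter."""
    if python_bitness() < 64:
        warnings.warn(
            "32 bit Python detected; large extractions may hit memory "
            "errors. A 64 bit Python is recommended."
        )
        return True
    return False
```

Replacing `warnings.warn` with a raised exception would give the "failure" variant instead of a warning.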

CristianIzquierdoLitii commented 6 years ago

Hi,

I've been trying to run a test.py to check that radiomics is working, but it keeps saying that it cannot import featureextractor from radiomics. Is anyone else having the same issue?

Thanks in advance

Yukti-09 commented 5 years ago

I have a similar issue. I am facing a memory error when I try to run my model. I am sending 6500 images to train with 7 captions each. I am using Ubuntu.

JoostJM commented 5 years ago

@Yukti-09, which specific version of PyRadiomics are you using? What parameters? How much RAM does your system have?

Yukti-09 commented 5 years ago

Not using pyradiomics

ReemaParekh commented 4 years ago

Hi! I am using pyradiomics for feature extraction from 2D ultrasound images. I am using an open-access database of thyroid ultrasound images (available at http://cimalab.intec.co/applications/thyroid/ ). When I extract features from a mask, pyradiomics requires more than 16 GB of RAM. Are there any settings that limit the memory usage? The mask is small compared to the whole image. If feature extraction from a mask takes this much memory, what will happen if I do the same for the whole image? Kindly guide.

JoostJM commented 4 years ago

@ReemaParekh what kind of settings are you using when performing the extraction?

ReemaParekh commented 4 years ago

I have a 360x560 USG image and am using voxel-based feature extraction, applied to the whole image with all feature classes enabled. In this case the memory usage reached 20 GB in some cases. I am using the settings below:

featureVector = extractor.execute(image3D, mask, label=1, voxelBased=True)
settings = {}
settings['binWidth'] = 25
settings['force2D'] = True
settings['force2Ddimension'] = 0
settings['maskedKernel'] = True
settings['initvalue'] = 5
settings['kernelRadius'] = 10
settings['resampledPixelSpacing'] = None  # [3,3,3] is an example for defining resampling (voxels with size 3x3x3mm)
settings['interpolator'] = sitk.sitkBSpline
settings['verbose'] = True

Further to this, if I set force2D to True, my code does not work. Otherwise it works but requires a huge amount of memory. My image is 2D, but I was unable to use a 2D image directly in pyradiomics, so I converted it to 1x360x560 for processing:

image3D = sitk.JoinSeries(image)

JoostJM commented 4 years ago

The memory requirement sounds valid. Be aware that voxel-based radiomics can be quite memory intensive, especially when extracting the entire image and enabling all features. Output is float64 maps for each feature, which in your case means 360x560x8 bytes per feature map. Furthermore, there is some additional memory requirement for intermediate feature maps, mask etc.

As to force2D, what do you mean by "not working"? PyRadiomics should be able to deal both with truly 2D input and with 3D input when force2D is enabled. The only thing I can imagine going wrong in your code is that, if your image has sitk size 1x360x560, x is your force2D dimension and you should set force2Ddimension to 2 (reason from the matrix, which is ordered as z, y, x).
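
The axis reasoning above can be written down as a small helper. This is a sketch under the assumption stated in the comment: the size reported by SimpleITK is ordered (x, y, z), while the array is ordered (z, y, x) and force2Ddimension indexes the array order. `force2d_dimension_for` is a hypothetical name, not a PyRadiomics function.

```python
def force2d_dimension_for(sitk_size):
    """Return the array-order axis index of the single-slice dimension.

    `sitk_size` is an (x, y, z) tuple as reported by SimpleITK;
    the corresponding array has shape (z, y, x).
    """
    array_shape = tuple(reversed(sitk_size))
    singleton_axes = [axis for axis, n in enumerate(array_shape) if n == 1]
    if len(singleton_axes) != 1:
        raise ValueError(
            "expected exactly one singleton axis, got shape %r" % (array_shape,)
        )
    return singleton_axes[0]


print(force2d_dimension_for((1, 360, 560)))  # 2: x is the single-slice axis
print(force2d_dimension_for((560, 360, 1)))  # 0: z is the single-slice axis
```

Printing the size of the joined image before extraction shows which of the two cases applies.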

Mojzaar commented 3 years ago

I also have a memory issue with my data. I have a 217x217x217 float32 image which has [0.1,0.1,0.1] voxel size (0.1 pixel size and 0.1 slice thickness). So, I only change the "resampledPixelSpacing" to be [0.1,0.1,0.1] and keep the rest as default values. When I pass this file to calculate the morphological shape radiomics, it returns the following error:

`self.generalInfo[self.generalInfo_prefix + 'Image-' + prefix + '_Spacing'] = image.GetSpacing()
    self.generalInfo[self.generalInfo_prefix + 'Image-' + prefix + '_Size'] = image.GetSize()
---> im_arr = sitk.GetArrayFromImage(image).astype('float')
     self.generalInfo[self.generalInfo_prefix + 'Image-' + prefix + '_Mean'] = numpy.mean(im_arr)
     self.generalInfo[self.generalInfo_prefix + 'Image-' + prefix + '_Minimum'] = numpy.min(im_arr)

MemoryError: Unable to allocate 74.2 GiB for an array with shape (2151, 2151, 2151) and data type float64`

I assumed that a lesion with a 21.7 mm diameter is not a big image. Would you please advise me if I am doing something wrong? I am using Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz with 64 GB memory.

JoostJM commented 3 years ago

From what I can read in your stack trace, it looks like the input image was 217mm x 217mm x 217mm, which results in an image of 2151x2151x2151 voxels. This means it has 9,952,248,951 float64 data points, which does require about 74 GiB. This is not a bug in PyRadiomics, but simply an image that is too big to fit into your RAM.
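
The arithmetic behind that figure can be checked with a few lines. This is only a back-of-the-envelope sketch; `array_gib` is a hypothetical helper and the 64 GiB budget is taken from the memory size quoted in the question.

```python
def array_gib(shape, itemsize=8):
    """Return (element count, size in GiB) for a dense array.

    `itemsize` is the number of bytes per element (8 for float64).
    """
    n_elements = 1
    for dim in shape:
        n_elements *= dim
    return n_elements, n_elements * itemsize / 2**30


n_voxels, gib = array_gib((2151, 2151, 2151))
print(n_voxels)       # 9952248951
print(round(gib, 1))  # 74.2
print(gib > 64)       # True: does not fit in 64 GiB of RAM
```

The same check on the expected voxel count before loading an image gives an early indication of whether the float64 copy can fit.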

JoostJM commented 3 years ago

This error happens before any resampling or cropping. In the future, I may be able to rewrite the code to prevent such an error.

In the meantime, you can disable the part of the code that produces this error by passing additionalInfo=False in your configuration. This disables the computation of the diagnostic features (where the error happens), but does not affect the features extracted. This allows the code to proceed to image resampling, where the image is cropped onto your lesion, requiring much less memory.

kritsini21 commented 11 months ago

Hi! I have an issue related to feature extraction as well. When I try to extract radiomics features from labelmaps, I unfortunately don't get them for all of my labels. I have checked for permission issues and that the type of the labelmaps etc. is consistent. Any idea what the issue could be? I use pyradiomics