AIM-Harvard / pyradiomics

Open-source python package for the extraction of Radiomics features from 2D and 3D images and binary masks. Support: https://discourse.slicer.org/c/community/radiomics
http://pyradiomics.readthedocs.io/
BSD 3-Clause "New" or "Revised" License
1.16k stars 500 forks source link

[BUG] Process terminated while extracting features from filtered image #579

Closed TimZaragori closed 3 years ago

TimZaragori commented 4 years ago

Describe the bug I am trying to extract features using different filters from 3D PET images with 3D ROIs. With the wavelet and LBP3D filters the process terminated with the following code : Process finished with exit code -1073740940 (0xC0000374). Usually I see this kind of errors when I haven't enough memory but I tracked my memory and 15GB were still free when the process terminated.

I also have the following error with LoG filter (sigma = [0.5, 1, 2, 3, 5]) but it might more be a wrongful use than a bug : IndexError: boolean index did not match indexed array along dimension 1; dimension is 1 but corresponding boolean dimension is 2 and Process finished with exit code -1073741819 (0xC0000005)

PyRadiomics configuration

radiomics.imageoperations.getBinEdges = IBSI_binning # from issue 498
im_spacing = np.array([ROI.GetSpacing()[0]] * 3)
image, ROI = IBSI_resampling(image, ROI, resampledPixelSpacing=im_spacing, 
                                                 interpolator=sitk.sitkBSpline) # from issue 498
label_stats.Execute(image, ROI)
settings = {'resegmentRange': [0., label_stats.GetMaximum(label)], 'resegmentMode': 'absolute',
                  'binWidth': binSize, 'weightingNorm': 'no_weighting'}
extractor = featureextractor.RadiomicsFeatureExtractor(**settings)
imagetype_dic = {'Original': None,
                             'Wavelet': None,
                             'LoG': {'sigma': [0.5, 1, 2, 3, 5]},
                             'Square': None,
                             'SquareRoot': None,
                             'Logarithm': None,
                             'Exponential': None,
                             'Gradient': None,
                             'LBP3D': None}
for imagetype in imagetype_dic:
    extractor.enableImageTypeByName(imagetype, customArgs=imagetype_dic[imagetype])
results_dic = extractor.execute(image, ROI)

PyRadiomics log file testLog_wavelet.txt testLog_Log.txt testLog_LBP3D.txt

Version (please complete the following information):

Thanks for you help and feel free to ask for more information if needed

JoostJM commented 4 years ago

@TimZaragori, thanks for reporting this, I will investigate. The return code and error you are describing suggests it is failing somewhere in the C-extensions of PyRadiomics (causing a process termination instead of gracefull raising a RuntimeError.

What happens when you pass the image and ROI to the regular PyRadiomics (i.e. without using the IBSI drop-ins?)

JoostJM commented 4 years ago

And what happens if you use the IBSI-resampling drop in as a drop in, without applying it outside of the feature extractor? (i.e. setting radiomics.imageoperations.resampleImage = IBSI_resampling, following #498)

TimZaragori commented 4 years ago

@JoostJM By commenting the first line, so without using IBSI_binning, I don't get error anymore. When I apply the resampling as a setting of the extractor it doesn't seem to change the behavior (if IBSI_binning is on I got errors, else it's ok)

JoostJM commented 4 years ago

@TimZaragori, thanks for the details! I'll get on it. Can you share the image/mask that gives you the error for me to play with?

TimZaragori commented 4 years ago

Can you send me an email (timothee.zaragori@univ-lorraine.fr) ? Since we are dealing with patient data, prefer to share it in this way

JoostJM commented 4 years ago

@TimZaragori, I found the cause of the problem. In IBSI binning a fixed lower bound for the bin edges is used if you define an absolute resegmentation range. For non-filtered images, this poses no problem, as voxels outside that range are excluded anyway. However, applying a filter results in different gray values, and resegmentation is not re-applied to these filtered images. Therefore, applying some filters can lead to voxels falling outside the defined range of bins, causing them to be discretized to "0". This is an invalid value for gray level and causes an index error in the calculation of texture matrices (e.g. for GLSZM, correct index in matrix is calculated as (i, j) = (grayvalue - 1, region size), which in case of gray value 0 results in index -1, which is invalid).

In my case, this resulted in a graceful error where an IndexError is raised, but this was most likely because the datatype for the index is size_t, which does not allow for negative indices and therefore results in an overflow, which on the subsequent check is then indeed larger than the maximum allowed index (which is checked). In your case, I assume size_t datatype does allow for negative values and therefore passes the maximum index check and subsequently fails with a Process terminated.

I will add a bugfix to ensure grayvalue 0 fails gracefully in matrix calculation. For your specific case, I added a little check in the IBSI binning function to prevent this error (it logs a warning when this fix kicks in). Below is a python script which mimics the PyRadiomics command line, but uses IBSI binning and resampling instead of the corresponding regular PyRadiomics functions. You can use this script just like you would use the PyRadiomics command line, i.e. python <script.py> <image.nrrd> <mask.nrrd> -p <params.yaml> instead of pyradiomics <image.nrrd> <mask.nrrd> -p <params.yaml>:

#!/usr/bin/env python
import logging

import numpy as np
import SimpleITK as sitk
import six

import radiomics
from radiomics.scripts import parse_args

def IBSI_binning(parameterValues, **kwargs):
  ibsiLogger = logging.getLogger('radiomics.ibsi')

  binWidth = kwargs.get('binWidth', 25)
  binCount = kwargs.get('binCount')
  resegmentRange = kwargs.get('resegmentRange')
  resegmentMode = kwargs.get('resegmentMode', 'absolute')

  if binCount is not None:
    binEdges = np.histogram(parameterValues, binCount)[1]
    binEdges[-1] += 1  # Ensures that the maximum value is included in the topmost bin when using numpy.digitize
  else:
    minimum = min(parameterValues)
    maximum = max(parameterValues)

    # Start binning form the first value lesser than or equal to the minimum value and evenly dividable by binwidth
    lowBound = minimum - (minimum % binWidth)
    # Add + 2* binwidth to ensure the maximum value is included in the range generated by numpy.arange, and that values
    # equal to highbound are binned into a separate bin by numpy.histogram (This ensures ALL bins are half open, as
    # numpy.histogram treats the last bin as a closed interval. Moreover, this ensures consistency with numpy.digitize,
    # which will assign len(bins) + 1 to values equal to rightmost bin edge, treating all bins as half-open)
    highBound = maximum + 2 * binWidth

    # #####################################
    # IBSI difference
    # #####################################
    if resegmentRange is not None:
      if resegmentMode == 'absolute':
        resegment_min = min(resegmentRange)
        if resegment_min > lowBound:
            ibsiLogger.warning('lower bound of resegment range (%-.3g) lower than found minimum (%-.3g), using found minimum', resegment_min, lowBound)
        else:
            lowBound = min(resegmentRange)
      elif resegmentMode == 'sigma':
        lowBound = minimum
    # #####################################

    binEdges = np.arange(lowBound, highBound, binWidth)

    # if min(parameterValues) % binWidth = 0 and min(parameterValues) = max(parameterValues), binEdges will only contain
    # 1 value. If this is the case (flat region) ensure that numpy.histogram creates 1 bin (requires 2 edges). For
    # numpy.histogram, a binCount (1) would also suffice, however, this is not accepted by numpy.digitize, which also uses
    # binEdges calculated by this function.
    if len(binEdges) == 1:  # Flat region, ensure that there is 1 bin
      binEdges = [binEdges[0] - .5, binEdges[0] + .5]  # Simulates binEdges returned by numpy.histogram if bins = 1

    ibsiLogger.debug('Calculated %d bins for bin width %g with edges: %s)', len(binEdges) - 1, binWidth, binEdges)

  return binEdges  # numpy.histogram(parameterValues, bins=binedges)

def IBSI_resampling(image, mask, **kwargs):
  ibsiLogger = logging.getLogger('radiomics.ibsi')
  # resample image to new spacing, align centers of both resampling grids.
  spacing = kwargs.get('resampledPixelSpacing')
  grayValuePrecision = kwargs.get('grayValuePrecision')
  interpolator = kwargs.get('interpolator', sitk.sitkLinear)

  try:
    if isinstance(interpolator, six.string_types):
      interpolator = getattr(sitk, interpolator)
  except Exception:
    ibsiLogger.warning('interpolator "%s" not recognized, using sitkLinear', interpolator)
    interpolator = sitk.sitkLinear

  im_spacing = np.array(image.GetSpacing(), dtype='float')
  im_size = np.array(image.GetSize(), dtype='float')

  spacing = np.where(np.array(spacing) == 0, im_spacing, spacing)

  spacingRatio = im_spacing / spacing
  newSize = np.ceil(im_size * spacingRatio)

  # Calculate center in real-world coordinates
  im_center = image.TransformContinuousIndexToPhysicalPoint((im_size - 1) / 2)

  new_origin = tuple(np.array(image.GetOrigin()) + 0.5 * ((im_size - 1) * im_spacing - (newSize - 1) * spacing))

  ibsiLogger.info('Resampling from %s to %s (size %s to %s), aligning Centers', im_spacing, spacing, im_size, newSize)

  rif = sitk.ResampleImageFilter()
  rif.SetOutputOrigin(new_origin)
  rif.SetSize(np.array(newSize, dtype='int').tolist())
  rif.SetOutputDirection(image.GetDirection())
  rif.SetOutputSpacing(spacing)

  rif.SetOutputPixelType(sitk.sitkFloat32)
  rif.SetInterpolator(interpolator)
  res_im = rif.Execute(sitk.Cast(image, sitk.sitkFloat32))

  # Round to n decimals (0 = to nearest integer)
  if grayValuePrecision is not None:
    ibsiLogger.debug('Rounding Image Gray values to %d decimals', grayValuePrecision)
    im_arr = sitk.GetArrayFromImage(res_im)
    im_arr = np.round(im_arr, grayValuePrecision)
    round_im = sitk.GetImageFromArray(im_arr)
    round_im.CopyInformation(res_im)
    res_im = round_im

  # Sanity check: Compare Centers!
  new_center = res_im.TransformContinuousIndexToPhysicalPoint((newSize - 1) / 2)
  ibsiLogger.debug("diff centers: %s" % np.abs(np.array(im_center) - np.array(new_center)))

  rif.SetOutputPixelType(sitk.sitkFloat32)
  rif.SetInterpolator(sitk.sitkLinear)
  res_ma = rif.Execute(sitk.Cast(mask, sitk.sitkFloat32))
  res_ma = sitk.BinaryThreshold(res_ma, lowerThreshold=0.5)

  return res_im, res_ma

if __name__ == "__main__":
  radiomics.imageoperations.getBinEdges = IBSI_binning
  radiomics.imageoperations.resampleImage = IBSI_resampling
  parse_args()
TimZaragori commented 4 years ago

@JoostJM Thanks for investigating this and the detailed answer. Indeed, it seems quite logic we can't keep the same resegmentation range as the non-filtered images. I should have thought about it sorry. However, I don't see how we can keep the same behavior with filtered image, i.e. the same lower bound for every patients since (as far as I understood) it will allow similar values across different patients to go in the same bin. Or for filtered images, it is not the behavior we want and Fixed bin number is more interesting? If I set the lower bound to the minimum in the ROI, this minimum will be patient dependent. An option could be to locate a voxel with the minimum we want on non-filtered image (in my case where I have PET images it will be 0 and I can get it in the background) and then look at the same voxel in filtered image to get its value. Even though I am not used to all these filters and their mathematical definitions I am pretty sure it won't work for all filters, e.g. wavelet filters for example. It will work for Square, Square Root, Logarithm and Exponential. Do you think it is possible to use the mathematical definitions of these filters and prior on the images used (possible range of values in the voxels according to type of image) to find an appropriate minimum in each case ?

JoostJM commented 4 years ago

Hi @TimZaragori, that's a very good question. The implementation/solution is not very straightforward, as it is more on the interpretation of what constitutes good practice in radiomic feature extraction. While there is something to be said for have more or less comparable bin ranges between patients, stuff like filters tends to make it a more difficult problem.

Even looking at the minimum value will not help, as it is no guarantee that this will also be the minimum intensity in the filter (even in a simple such as "absolute" filter). This is also why PyRadiomics doesn't have this fixed range implemented in the master. What PyRadiomics does do is ensure that the bins are aligned with 0 (I.e. the modulus of the edges will be 0). This enforces at least some similarity in the bins.

What is also important to consider here is that the discretized images are used in features that are less concerned in magnitude of the intensity and more in contrast between (neighboring) voxels. In the latter case it doesn't matter if the same intensities are discretized to different bins between patients, so long as the difference between 2 bins retains the same meaning/contrast (hence why we advise fixed bin width). In fact, many GLCM features are even insensitive to the offset of the discretized gray values (the only change you'll effectively get with the fixed bin range).

Anyway, those are my current views on the subject, though I wholeheartedly agree that this is an interesting topic which should be discussed in length. It may be a good idea to also brach this subject on IBSI's forum on Google groups.