AIM-Harvard / pyradiomics

Open-source python package for the extraction of Radiomics features from 2D and 3D images and binary masks. Support: https://discourse.slicer.org/c/community/radiomics
http://pyradiomics.readthedocs.io/
BSD 3-Clause "New" or "Revised" License
1.15k stars 499 forks source link

Comparison with IBSI features on phantoms #498

Open samulic opened 5 years ago

samulic commented 5 years ago

Hello, I'm doing a project on radiomics and I've been trying to compare the results of the features calculated with pyradiomics with the ones reported in the IBSI guidelines on their CT phantom data so that I can be sure I am using this great tool in the proper way. However after a few weeks reading the documentation and various questions already asked here, I'm still having troubles on understanding why my settings cannot reproduce the same result as on the IBSI guidelines.

Here is the code and the configurations I am using for settings C, D and E only. Some settings are unnecessary since they take the same default value, but are there for better clarification. NotebookMarkdown NotebookPDF

The results are slightly different from the values reported by the IBSI, which can be found here, starting from the shape features (!?) all the way to the ngtdm features of course. What is driving me crazy is that with the digital phantoms that don't require preprocessing I can get almost every feature matching with the IBSI. Is there anyone who could kindly help me clarifying why these preprocessing configurations are not matching with the IBSI? Here is the spreadsheet with the results from the code above.

Thank you! Gabriele

JoostJM commented 5 years ago

Hello Gabrielle,

We've investigated this previously (PyRadiomics is also part of IBSI). There are differences between the IBSI benchmarks and PyRadiomics results, but these differences have been tracked down. Most notably are the differences in gray value discretization (just for the fixed bin size type) and resampling. These differences cannot be corrected by customization settings alone and require replacement by custom functions:

  1. Binning: When performing gray value discretization using a fixed bin width (AKA IBSI: FBS, Fixed Bins Size), and with Resegmentation set (most notable Case A and C), IBSI computes the binedges as equally spaced from the minimum of the resegmentation range if absolute-resegmentation, and min(intensities) when sigma-resegmentation. In PyRadiomics, gray value discretization using a fixed bin width always utilizes binedges which are equally spaced from 0 and where the lowest gray level is ensured to be discretized to the first bin. Regardless of any resegmentation, etc.
  2. Resampling:
    1. Alignment of the grid: In IBSI, resampling grid is aligned by aligning the center of the image, whereas in Pyradiomics, we align the corner of the origin voxel. This can result in slightly different interpolated results, and even slightly different sizes of the resampled image and ROI, which in turn causes differences in extracted feature values.
    2. Gray value rounding: In IBSI, they reason that if the original intensity values are from some lower precision datatype, resampled values (which are floating point numbers, usually 64-bit) should be resampled to a similar resolution. In the case of the IBSI phantoms, resampling to the nearest integer. PyRadiomics does not implement this, as differences are likely to be small and therefore serve more to add complexity than increase the meaning of the extracted values. Especially considering gray values are discretized anyway prior to calculation of most (all except firstorder) features. If some kind of normalization is performed, then meaning of the gray values changes as well. Differences arise here, because small rounding differences can cause a voxel to be assigned to a different bin, which can be a marked change in feature value results, especially in small ROIs.
    3. Mask resampling: In IBSI, different interpolators can also be selected for resampling of the mask, with an additional thresholding the retrieve the binary mask. This only works if the mask is limited to zero and non-zero (i.e. 1) values. PyRadiomics also supports masks with different value labels, allowing extraction from different ROIs in the same mask file by indicating different label values. To prevent any incorrect re-assignment, PyRadiomics forces the mask resampling to be nearest neighbor.

Next, there are also some differences which can be solved by custom settings, in this case as only applies to case E, where both absolute AND sigma resegmentation are performed. In PyRadiomics, both types are implemented, but only 1 can be selected at a time. To simulate applying both types, I calculated the absolute range after resegmentation and used that as the absolute resegment range: [ -718, 400 ]

Finally, there is a difference between PyRadiomics and IBSI in the calculation of firstorder: Kurtosis. IBSI calculates Excess Kurtosis, which is equal to Kurtosis - 3. PyRadiomics calculates Kurtosis, which is always +3 compared to IBSI. The difference of 3 stems from the fact that a gaussian distribution has a kurtosis of 3.

So finalizing, the cause of the difference between PyRadiomics results and IBSI benchmark, per case:

JoostJM commented 5 years ago

@samulic, on a side note, please review your system settings with regards to decimal point. In your results table, it appears that the decimal character has been lost (happens when importing a csv file an system decimal character = ,, causing the true . decimal character to be ignored).

PyRadiomics results files ALWAYS use a . as the decimal character, regardless of system settings.

JoostJM commented 5 years ago

In your notebook, you state;

note that in PyRadiomics it is not possible to set alpha neighbouring threshold (which defaults to zero), it is possible only for GLDM

What exactly do you mean here? IBSI NGLDM is implemented in PyRadiomics as GLDM, and the alpha parameter can be set using gldm_a. Both IBSI and PyRadiomics NGTDM do not define an alpha parameter.

TimZaragori commented 4 years ago

Hi everyone, I face the same problem. I wanted to use pyradiomics as it is often used in radiomics publications but wanted to have the same values as IBSI benchmark. I try to work around resampling to make it IBSI compliant and got results which are closer to benchmark eventhough there is still some features that are not in tolerance range (Benchmark_IBSI.xlsx).

Here is the piece of code I am using :

def IBSI_resampling(image, aimed_voxel_sizes, interpolator):
    np.set_printoptions(floatmode='unique')
    aimed_voxel_sizes = np.array(aimed_voxel_sizes)
    current_voxel_size = np.array(image.GetSpacing())
    current_grid_size = np.array(image.GetSize())

    # Calculate grid size with ceiling operation
    aimed_grid_size = np.ceil(current_grid_size * current_voxel_size / aimed_voxel_sizes)

    # Calculate origin of resampled image for a grid center alignement
    current_origin = np.array(image.GetOrigin())
    aimed_origin = current_origin + (current_voxel_size * (current_grid_size - 1) - aimed_voxel_sizes * (aimed_grid_size - 1)) / 2

    aimed_grid_size = [int(_) for _ in aimed_grid_size]

    resampled_image = sitk.Resample(image, aimed_grid_size, sitk.Transform(), interpolator, aimed_origin,
                                    aimed_voxel_sizes, image.GetDirection(), 0., sitk.sitkFloat64)

    np.set_printoptions(floatmode='maxprec_equal')

    return resampled_image

Nonetheless, when you look at IBSI benchmark results online (https://docs.google.com/spreadsheets/d/15Wzj8j6eFxXBahMmfy0QcoMourrT84_K6SNL_lp5l8c/edit#gid=1447488242) there is a test with pyradiomics that match way better than that we got using pyradiomics alone. How was it done ?

Edit 20/12/2019 : Achieve better results by correcting something on Grey value rouding problem (Benchmark_IBSI.xlsx). I had small differences on mean of interpolated intensity mask. For example for config B : Before : phantom_resampled = sitk.Cast(IBSI_resampling(phantom, [2., 2., 2.], sitk.sitkLinear), sitk.sitkInt32) After : phantom_resampled = sitk.Round(IBSI_resampling(phantom, [2., 2., 2.], sitk.sitkLinear)) I understand your point of view on this rounding but I think it's important in this case since we are dealing with CT images and there is no physical meaning to have UH floating value. In case of PET images, I would let float numbers.

JoostJM commented 4 years ago

@TimZaragori,

That is correct. As I mentioned above, all technical differences between current IBSI definitions and PyRadiomics have been investigated. To deal with the differences and see if PyRadiomics could match IBSI, I created 2 functions to drop in place of resampling and discretization, matching IBSI definitions:

import logging

import numpy as np
import SimpleITK as sitk
import six

import radiomics
import radiomics.commandline

def IBSI_binning(parameterValues, **kwargs):
  ibsiLogger = logging.getLogger('radiomics.ibsi')

  binWidth = kwargs.get('binWidth', 25)
  binCount = kwargs.get('binCount')
  resegmentRange = kwargs.get('resegmentRange')
  resegmentMode = kwargs.get('resegmentMode', 'absolute')

  if binCount is not None:
    binEdges = np.histogram(parameterValues, binCount)[1]
    binEdges[-1] += 1  # Ensures that the maximum value is included in the topmost bin when using numpy.digitize
  else:
    minimum = min(parameterValues)
    maximum = max(parameterValues)

    # Start binning form the first value lesser than or equal to the minimum value and evenly dividable by binwidth
    lowBound = minimum - (minimum % binWidth)
    # Add + 2* binwidth to ensure the maximum value is included in the range generated by numpy.arange, and that values
    # equal to highbound are binned into a separate bin by numpy.histogram (This ensures ALL bins are half open, as
    # numpy.histogram treats the last bin as a closed interval. Moreover, this ensures consistency with numpy.digitize,
    # which will assign len(bins) + 1 to values equal to rightmost bin edge, treating all bins as half-open)
    highBound = maximum + 2 * binWidth

    # #####################################
    # IBSI difference
    # #####################################
    if resegmentRange is not None:
      if resegmentMode == 'absolute':
        lowBound = min(resegmentRange)
      elif resegmentMode == 'sigma':
        lowBound = minimum
    # #####################################

    binEdges = np.arange(lowBound, highBound, binWidth)

    # if min(parameterValues) % binWidth = 0 and min(parameterValues) = max(parameterValues), binEdges will only contain
    # 1 value. If this is the case (flat region) ensure that numpy.histogram creates 1 bin (requires 2 edges). For
    # numpy.histogram, a binCount (1) would also suffice, however, this is not accepted by numpy.digitize, which also uses
    # binEdges calculated by this function.
    if len(binEdges) == 1:  # Flat region, ensure that there is 1 bin
      binEdges = [binEdges[0] - .5, binEdges[0] + .5]  # Simulates binEdges returned by numpy.histogram if bins = 1

    ibsiLogger.debug('Calculated %d bins for bin width %g with edges: %s)', len(binEdges) - 1, binWidth, binEdges)

  return binEdges  # numpy.histogram(parameterValues, bins=binedges)

def IBSI_resampling(image, mask, **kwargs):
  ibsiLogger = logging.getLogger('radiomics.ibsi')
  # resample image to new spacing, align centers of both resampling grids.
  spacing = kwargs.get('resampledPixelSpacing')
  grayValuePrecision = kwargs.get('grayValuePrecision')
  interpolator = kwargs.get('interpolator', sitk.sitkLinear)

  try:
    if isinstance(interpolator, six.string_types):
      interpolator = getattr(sitk, interpolator)
  except Exception:
    ibsiLogger.warning('interpolator "%s" not recognized, using sitkLinear', interpolator)
    interpolator = sitk.sitkLinear

  im_spacing = np.array(image.GetSpacing(), dtype='float')
  im_size = np.array(image.GetSize(), dtype='float')

  spacing = np.where(np.array(spacing) == 0, im_spacing, spacing)

  spacingRatio = im_spacing / spacing
  newSize = np.ceil(im_size * spacingRatio)

  # Calculate center in real-world coordinates
  im_center = image.TransformContinuousIndexToPhysicalPoint((im_size - 1) / 2)

  new_origin = tuple(np.array(image.GetOrigin()) + 0.5 * ((im_size - 1) * im_spacing - (newSize - 1) * spacing))

  ibsiLogger.info('Resampling from %s to %s (size %s to %s), aligning Centers', im_spacing, spacing, im_size, newSize)

  rif = sitk.ResampleImageFilter()
  rif.SetOutputOrigin(new_origin)
  rif.SetSize(np.array(newSize, dtype='int').tolist())
  rif.SetOutputDirection(image.GetDirection())
  rif.SetOutputSpacing(spacing)

  rif.SetOutputPixelType(sitk.sitkFloat32)
  rif.SetInterpolator(interpolator)
  res_im = rif.Execute(sitk.Cast(image, sitk.sitkFloat32))

  # Round to n decimals (0 = to nearest integer)
  if grayValuePrecision is not None:
    ibsiLogger.debug('Rounding Image Gray values to %d decimals', grayValuePrecision)
    im_arr = sitk.GetArrayFromImage(res_im)
    im_arr = np.round(im_arr, grayValuePrecision)
    round_im = sitk.GetImageFromArray(im_arr)
    round_im.CopyInformation(res_im)
    res_im = round_im

  # Sanity check: Compare Centers!
  new_center = res_im.TransformContinuousIndexToPhysicalPoint((newSize - 1) / 2)
  ibsiLogger.debug("diff centers: %s" % np.abs(np.array(im_center) - np.array(new_center)))

  rif.SetOutputPixelType(sitk.sitkFloat32)
  rif.SetInterpolator(sitk.sitkLinear)
  res_ma = rif.Execute(sitk.Cast(mask, sitk.sitkFloat32))
  res_ma = sitk.BinaryThreshold(res_ma, lowerThreshold=0.5)

  return res_im, res_ma

if __name__ == '__main__':
  radiomics.imageoperations.getBinEdges = IBSI_binning
  radiomics.imageoperations.resampleImage = IBSI_resampling

  from radiomics.scripts import parse_args
  parse_args()

This code can be run, and will function as the PyRadiomics command line interface (function parse_args). The only difference here is that the code will now use the customized functions, following IBSI definitions.

TimZaragori commented 4 years ago

@JoostJM Are you interested in implementing other features from IBSI such as Intensity Volume Histogram features and local intensity features? I worked on implementing them in python with SimpleITK/numpy

hanxiaozhen2017 commented 3 years ago

parse_args

Hi JoostJM,

I am very new to pyradiomics and could you provide an example code to use your function? Many thanks!

JoostJM commented 3 years ago

@hanxiaozhen2017 see the online documentation. The parse_args function processes the functionality when using PyRadiomics from the commandline.

JoostJM commented 3 years ago

@TimZaragori, If you're still interested, yes please. Do you have your code available somewhere? Are you interested in creating a PR for PyRadiomics implementing your code? Even if it's just a start, it would be great.

TimZaragori commented 3 years ago

@JoostJM I would be glad to share my code with PyRadiomics community. I have it in a private repository on GitHub but it's quite messy. I can create a PR for PyRadiomics but where should I put the code ?

JoostJM commented 3 years ago

@TimZaragori, please do! This also allows us to review your code and suggest improvements if needed, and start a thread for the community and future reference.

JoostJM commented 3 years ago

Best would be to copy the firstorder feature class and rename it to, say, intensityhistogram or something similar (preference for a clear, but concise name).

JoostJM commented 3 years ago

See the developer section in the online documentation for the signatures we use.

ivanzhovannik commented 3 years ago

Hello @JoostJM ! I have a similar problem. I have run

  1. plastimatch convert on DICOM files (stored in a SQLite DB),
  2. pyradiomics
    • the original version
    • the version with getBinEdges and resampleImage replaced with your functions, proposed above.

The results for IBSI-based functions give less compliant results. Could you please propose any hints, where I can look? See Parameter files for configurations AB, the results, and the imageoperations.py attached pyradiomics_IBSI_benchmark_test.zip

ivanzhovannik commented 3 years ago

UPD: I thought that problem might lie in DICOM-RT->NRRD being different from IBSI nii files. However, it does not seem to be the case. I did run niftis through raw pyradiomics (original and corrected for IBSI in imageoperations.py) on my computer. I attach all the A-E configuration parameter files, and my pyradiomics output matched with IBSI for every configuration before and after correction. I still see quite some textural values being out of IBSI range even after the correction. Benchmark_CT_phantom_nifti_IBSI-pyradiomics-with-correction.zip

Could you please help? I am probably wrong with the parameter definition

TimZaragori commented 3 years ago

@JoostJM Being at the end of my thesis, I realize that I will be hard to code this myself with respect to the pyradiomics guidelines. However, I am still interested in participating to this implementation and answering the question on what I've done. Please find attached the folder with the code I used to validate my features against IBSI benchmark (and those of pyRadiomics) IBSI_data_test.zip

TimZaragori commented 2 years ago

Hi @JoostJM, I don't know if you are still working on the implementation of local intensity features by I modified a bit the code I sent you which was not very clean. Please find below a new version that gives the same results. I also found that IBSI was a bit too restrictive on these features, there are some cases where the aggressiveness of the tumor is represented by low values and so we would be interested in extracting the minimum and not the maximum values. What do you think about this and how can we handle this ?


def local_intensity_features2(image, mask):
    label_stats = sitk.LabelStatisticsImageFilter()

    # Convolution kernel creation
    dist = (3/(4*np.pi))**(1/3)*10
    radius_voxel = max(np.floor(dist / np.array(image.GetSpacing())))
    kernel = sitk.Image([int(radius_voxel) * 2 + 1] * 3, sitk.sitkUInt32)
    kernel.SetSpacing(image.GetSpacing())
    kernel_center = kernel.TransformPhysicalPointToIndex(np.array(kernel.GetSpacing()) * (np.array(kernel.GetSize()) - 1) / 2)
    kernel[kernel_center] = 1
    kernel = sitk.SignedMaurerDistanceMap(kernel, squaredDistance=False, useImageSpacing=True) <= dist

    # Convolution with image
    convolved_array = convolve(sitk.GetArrayFromImage(image).astype(np.float64), weights=sitk.GetArrayFromImage(kernel),
                               mode='constant', cval=0)
    normalization_array = convolve(np.ones_like(sitk.GetArrayFromImage(image), dtype=float),
                                   weights=sitk.GetArrayFromImage(kernel), mode='constant', cval=0)
    convolved_array /= normalization_array
    convolved_image = sitk.GetImageFromArray(convolved_array)
    convolved_image.CopyInformation(image)

    # === Local Intensity Peak ===
    # Select voxel with maximum intenisty in image
    label_stats.Execute(image, mask)
    maximum_image = (image == label_stats.GetMaximum(1)) * mask
    # Among these voxels select the maximum value in convolved_image
    label_stats.Execute(convolved_image, maximum_image)
    local_intensity_peak = label_stats.GetMaximum(1)

    # === Global Intensity Peak ===
    label_stats.Execute(convolved_image, mask)
    global_intensity_peak = label_stats.GetMaximum(1)

    return {'Local Intensity Peak': local_intensity_peak, 'Global Intensity Peak': global_intensity_peak}