Comparison with IBSI features on phantoms

samulic commented 5 years ago

Hello, I'm doing a project on radiomics and I've been trying to compare the results of the features calculated with pyradiomics with the ones reported in the IBSI guidelines on their CT phantom data so that I can be sure I am using this great tool in the proper way. However after a few weeks reading the documentation and various questions already asked here, I'm still having troubles on understanding why my settings cannot reproduce the same result as on the IBSI guidelines.

Here is the code and the configurations I am using for settings C, D and E only. Some settings are unnecessary since they take the same default value, but are there for better clarification. NotebookMarkdown NotebookPDF

The results are slightly different from the values reported by the IBSI, which can be found here, starting from the shape features (!?) all the way to the ngtdm features of course. What is driving me crazy is that with the digital phantoms that don't require preprocessing I can get almost every feature matching with the IBSI. Is there anyone who could kindly help me clarifying why these preprocessing configurations are not matching with the IBSI? Here is the spreadsheet with the results from the code above.

Thank you! Gabriele

JoostJM commented 5 years ago

Hello Gabrielle,

We've investigated this previously (PyRadiomics is also part of IBSI). There are differences between the IBSI benchmarks and PyRadiomics results, but these differences have been tracked down. Most notably are the differences in gray value discretization (just for the fixed bin size type) and resampling. These differences cannot be corrected by customization settings alone and require replacement by custom functions:

Binning: When performing gray value discretization using a fixed bin width (AKA IBSI: FBS, Fixed Bins Size), and with Resegmentation set (most notable Case A and C), IBSI computes the binedges as equally spaced from the minimum of the resegmentation range if absolute-resegmentation, and min(intensities) when sigma-resegmentation. In PyRadiomics, gray value discretization using a fixed bin width always utilizes binedges which are equally spaced from 0 and where the lowest gray level is ensured to be discretized to the first bin. Regardless of any resegmentation, etc.
Resampling:
1. Alignment of the grid: In IBSI, resampling grid is aligned by aligning the center of the image, whereas in Pyradiomics, we align the corner of the origin voxel. This can result in slightly different interpolated results, and even slightly different sizes of the resampled image and ROI, which in turn causes differences in extracted feature values.
2. Gray value rounding: In IBSI, they reason that if the original intensity values are from some lower precision datatype, resampled values (which are floating point numbers, usually 64-bit) should be resampled to a similar resolution. In the case of the IBSI phantoms, resampling to the nearest integer. PyRadiomics does not implement this, as differences are likely to be small and therefore serve more to add complexity than increase the meaning of the extracted values. Especially considering gray values are discretized anyway prior to calculation of most (all except firstorder) features. If some kind of normalization is performed, then meaning of the gray values changes as well. Differences arise here, because small rounding differences can cause a voxel to be assigned to a different bin, which can be a marked change in feature value results, especially in small ROIs.
3. Mask resampling: In IBSI, different interpolators can also be selected for resampling of the mask, with an additional thresholding the retrieve the binary mask. This only works if the mask is limited to zero and non-zero (i.e. 1) values. PyRadiomics also supports masks with different value labels, allowing extraction from different ROIs in the same mask file by indicating different label values. To prevent any incorrect re-assignment, PyRadiomics forces the mask resampling to be nearest neighbor.

Next, there are also some differences which can be solved by custom settings, in this case as only applies to case E, where both absolute AND sigma resegmentation are performed. In PyRadiomics, both types are implemented, but only 1 can be selected at a time. To simulate applying both types, I calculated the absolute range after resegmentation and used that as the absolute resegment range: [ -718, 400 ]

Finally, there is a difference between PyRadiomics and IBSI in the calculation of firstorder: Kurtosis. IBSI calculates Excess Kurtosis, which is equal to Kurtosis - 3. PyRadiomics calculates Kurtosis, which is always +3 compared to IBSI. The difference of 3 stems from the fact that a gaussian distribution has a kurtosis of 3.

So finalizing, the cause of the difference between PyRadiomics results and IBSI benchmark, per case:

Case C: Due to differences in gray value discretization and resampling
Case D: Due to differences in resampling
Case E: Due to differences in resampling and resegmentation

JoostJM commented 5 years ago

@samulic, on a side note, please review your system settings with regards to decimal point. In your results table, it appears that the decimal character has been lost (happens when importing a csv file an system decimal character = ,, causing the true . decimal character to be ignored).

PyRadiomics results files ALWAYS use a . as the decimal character, regardless of system settings.

JoostJM commented 5 years ago

In your notebook, you state;

note that in PyRadiomics it is not possible to set alpha neighbouring threshold (which defaults to zero), it is possible only for GLDM

What exactly do you mean here? IBSI NGLDM is implemented in PyRadiomics as GLDM, and the alpha parameter can be set using gldm_a. Both IBSI and PyRadiomics NGTDM do not define an alpha parameter.

TimZaragori commented 4 years ago

Hi everyone, I face the same problem. I wanted to use pyradiomics as it is often used in radiomics publications but wanted to have the same values as IBSI benchmark. I try to work around resampling to make it IBSI compliant and got results which are closer to benchmark eventhough there is still some features that are not in tolerance range (Benchmark_IBSI.xlsx).

Here is the piece of code I am using :

def IBSI_resampling(image, aimed_voxel_sizes, interpolator):
    np.set_printoptions(floatmode='unique')
    aimed_voxel_sizes = np.array(aimed_voxel_sizes)
    current_voxel_size = np.array(image.GetSpacing())
    current_grid_size = np.array(image.GetSize())

    # Calculate grid size with ceiling operation
    aimed_grid_size = np.ceil(current_grid_size * current_voxel_size / aimed_voxel_sizes)

    # Calculate origin of resampled image for a grid center alignement
    current_origin = np.array(image.GetOrigin())
    aimed_origin = current_origin + (current_voxel_size * (current_grid_size - 1) - aimed_voxel_sizes * (aimed_grid_size - 1)) / 2

    aimed_grid_size = [int(_) for _ in aimed_grid_size]

    resampled_image = sitk.Resample(image, aimed_grid_size, sitk.Transform(), interpolator, aimed_origin,
                                    aimed_voxel_sizes, image.GetDirection(), 0., sitk.sitkFloat64)

    np.set_printoptions(floatmode='maxprec_equal')

    return resampled_image

Nonetheless, when you look at IBSI benchmark results online (https://docs.google.com/spreadsheets/d/15Wzj8j6eFxXBahMmfy0QcoMourrT84_K6SNL_lp5l8c/edit#gid=1447488242) there is a test with pyradiomics that match way better than that we got using pyradiomics alone. How was it done ?

Edit 20/12/2019 : Achieve better results by correcting something on Grey value rouding problem (Benchmark_IBSI.xlsx). I had small differences on mean of interpolated intensity mask. For example for config B : Before : phantom_resampled = sitk.Cast(IBSI_resampling(phantom, [2., 2., 2.], sitk.sitkLinear), sitk.sitkInt32) After : phantom_resampled = sitk.Round(IBSI_resampling(phantom, [2., 2., 2.], sitk.sitkLinear)) I understand your point of view on this rounding but I think it's important in this case since we are dealing with CT images and there is no physical meaning to have UH floating value. In case of PET images, I would let float numbers.

JoostJM commented 4 years ago

@TimZaragori,

That is correct. As I mentioned above, all technical differences between current IBSI definitions and PyRadiomics have been investigated. To deal with the differences and see if PyRadiomics could match IBSI, I created 2 functions to drop in place of resampling and discretization, matching IBSI definitions:

import logging

import numpy as np
import SimpleITK as sitk
import six

import radiomics
import radiomics.commandline

def IBSI_binning(parameterValues, **kwargs):
  ibsiLogger = logging.getLogger('radiomics.ibsi')

  binWidth = kwargs.get('binWidth', 25)
  binCount = kwargs.get('binCount')
  resegmentRange = kwargs.get('resegmentRange')
  resegmentMode = kwargs.get('resegmentMode', 'absolute')

  if binCount is not None:
    binEdges = np.histogram(parameterValues, binCount)[1]
    binEdges[-1] += 1  # Ensures that the maximum value is included in the topmost bin when using numpy.digitize
  else:
    minimum = min(parameterValues)
    maximum = max(parameterValues)

    # Start binning form the first value lesser than or equal to the minimum value and evenly dividable by binwidth
    lowBound = minimum - (minimum % binWidth)
    # Add + 2* binwidth to ensure the maximum value is included in the range generated by numpy.arange, and that values
    # equal to highbound are binned into a separate bin by numpy.histogram (This ensures ALL bins are half open, as
    # numpy.histogram treats the last bin as a closed interval. Moreover, this ensures consistency with numpy.digitize,
    # which will assign len(bins) + 1 to values equal to rightmost bin edge, treating all bins as half-open)
    highBound = maximum + 2 * binWidth

    # #####################################
    # IBSI difference
    # #####################################
    if resegmentRange is not None:
      if resegmentMode == 'absolute':
        lowBound = min(resegmentRange)
      elif resegmentMode == 'sigma':
        lowBound = minimum
    # #####################################

    binEdges = np.arange(lowBound, highBound, binWidth)

    # if min(parameterValues) % binWidth = 0 and min(parameterValues) = max(parameterValues), binEdges will only contain
    # 1 value. If this is the case (flat region) ensure that numpy.histogram creates 1 bin (requires 2 edges). For
    # numpy.histogram, a binCount (1) would also suffice, however, this is not accepted by numpy.digitize, which also uses
    # binEdges calculated by this function.
    if len(binEdges) == 1:  # Flat region, ensure that there is 1 bin
      binEdges = [binEdges[0] - .5, binEdges[0] + .5]  # Simulates binEdges returned by numpy.histogram if bins = 1

    ibsiLogger.debug('Calculated %d bins for bin width %g with edges: %s)', len(binEdges) - 1, binWidth, binEdges)

  return binEdges  # numpy.histogram(parameterValues, bins=binedges)

def IBSI_resampling(image, mask, **kwargs):
  ibsiLogger = logging.getLogger('radiomics.ibsi')
  # resample image to new spacing, align centers of both resampling grids.
  spacing = kwargs.get('resampledPixelSpacing')
  grayValuePrecision = kwargs.get('grayValuePrecision')
  interpolator = kwargs.get('interpolator', sitk.sitkLinear)

  try:
    if isinstance(interpolator, six.string_types):
      interpolator = getattr(sitk, interpolator)
  except Exception:
    ibsiLogger.warning('interpolator "%s" not recognized, using sitkLinear', interpolator)
    interpolator = sitk.sitkLinear

  im_spacing = np.array(image.GetSpacing(), dtype='float')
  im_size = np.array(image.GetSize(), dtype='float')

  spacing = np.where(np.array(spacing) == 0, im_spacing, spacing)

  spacingRatio = im_spacing / spacing
  newSize = np.ceil(im_size * spacingRatio)

  # Calculate center in real-world coordinates
  im_center = image.TransformContinuousIndexToPhysicalPoint((im_size - 1) / 2)

  new_origin = tuple(np.array(image.GetOrigin()) + 0.5 * ((im_size - 1) * im_spacing - (newSize - 1) * spacing))

  ibsiLogger.info('Resampling from %s to %s (size %s to %s), aligning Centers', im_spacing, spacing, im_size, newSize)

  rif = sitk.ResampleImageFilter()
  rif.SetOutputOrigin(new_origin)
  rif.SetSize(np.array(newSize, dtype='int').tolist())
  rif.SetOutputDirection(image.GetDirection())
  rif.SetOutputSpacing(spacing)

  rif.SetOutputPixelType(sitk.sitkFloat32)
  rif.SetInterpolator(interpolator)
  res_im = rif.Execute(sitk.Cast(image, sitk.sitkFloat32))

  # Round to n decimals (0 = to nearest integer)
  if grayValuePrecision is not None:
    ibsiLogger.debug('Rounding Image Gray values to %d decimals', grayValuePrecision)
    im_arr = sitk.GetArrayFromImage(res_im)
    im_arr = np.round(im_arr, grayValuePrecision)
    round_im = sitk.GetImageFromArray(im_arr)
    round_im.CopyInformation(res_im)
    res_im = round_im

  # Sanity check: Compare Centers!
  new_center = res_im.TransformContinuousIndexToPhysicalPoint((newSize - 1) / 2)
  ibsiLogger.debug("diff centers: %s" % np.abs(np.array(im_center) - np.array(new_center)))

  rif.SetOutputPixelType(sitk.sitkFloat32)
  rif.SetInterpolator(sitk.sitkLinear)
  res_ma = rif.Execute(sitk.Cast(mask, sitk.sitkFloat32))
  res_ma = sitk.BinaryThreshold(res_ma, lowerThreshold=0.5)

  return res_im, res_ma

if __name__ == '__main__':
  radiomics.imageoperations.getBinEdges = IBSI_binning
  radiomics.imageoperations.resampleImage = IBSI_resampling

  from radiomics.scripts import parse_args
  parse_args()

This code can be run, and will function as the PyRadiomics command line interface (function parse_args). The only difference here is that the code will now use the customized functions, following IBSI definitions.

TimZaragori commented 4 years ago

@JoostJM Are you interested in implementing other features from IBSI such as Intensity Volume Histogram features and local intensity features? I worked on implementing them in python with SimpleITK/numpy

hanxiaozhen2017 commented 3 years ago

parse_args

Hi JoostJM,

I am very new to pyradiomics and could you provide an example code to use your function? Many thanks!