AIM-Harvard / pyradiomics

Open-source python package for the extraction of Radiomics features from 2D and 3D images and binary masks. Support: https://discourse.slicer.org/c/community/radiomics
http://pyradiomics.readthedocs.io/
BSD 3-Clause "New" or "Revised" License
1.14k stars 493 forks source link

[FEAT EXTRACTION] Cannot reproduce some feature extraction metrics #692

Closed mattwarkentin closed 3 years ago

mattwarkentin commented 3 years ago

Describe the bug

Hi @JoostJM, I will try to keep this as simple as possible. I performed a feature extraction using a CT image and a mask/label map with only a single label (one tumour). I was looking over some of the reported diagnostic metrics and they looked odd to me, so I tried to reproduce them by using some of the lower-level functions that are used in the feature extracting process. In particular, the bounding boxes values confused me.

The numbers returned via RadiomicsFeatureExtractor() do not agree with either those returned by directly using sitk.LabelStatisticsImageFilter() or CheckMask(). Please see below.

PyRadiomics configuration

# config.yaml
setting:
  interpolator: 'sitkLinear'
  resampledPixelSpacing: [1, 1, 1]
  binWidth: 25
  voxelArrayShift: 1000
  label: 1

To Reproduce I can't share the specific image and mask, but I think this should be reproducible with any image-mask pair. Here is some code and results that present the issue:

import SimpleITK as sitk
import radiomics.featureextractor as rfe
import radiomics.imageoperations as rio

# Path to CT & Mask
ct = "ct.nrrd"
mask = "mask.nrrd"
config = "config.yaml"

# Read in using SimpleITK
img = sitk.ReadImage(ct, sitk.sitkInt32, "NrrdImageIO")
lmap = sitk.ReadImage(mask, sitk.sitkUInt32, "NrrdImageIO")

# Feature Extraction
fe = rfe.RadiomicsFeatureExtractor(config)
features = fe.execute(img, lmap, 1)

features['diagnostics_Mask-original_BoundingBox']
#> (170, 350, 91, 8, 5, 3)

features['diagnostics_Mask-interpolated_BoundingBox']
#> (6, 6, 6, 7, 4, 5)

# Manual bounding box
lsif = sitk.LabelStatisticsImageFilter()
lsif.Execute(img, lmap)
lsif.GetBoundingBox(1)
#> (170, 177, 350, 354, 91, 93)

chk = rio.checkMask(img, lmap)
chk[0]
#> array([170, 177, 350, 354,  91,  93])

img_crop, mask_crop = rio.cropToTumorMask(img, lmap, chk[0])

lsif2 = sitk.LabelStatisticsImageFilter()
lsif2.Execute(img_crop, mask_crop)
lsif2.GetBoundingBox(1)
#> (0, 7, 0, 4, 0, 2)

Version:

mattwarkentin commented 3 years ago

As a follow-up, even using the resampled image and mask does not match the bounding boxes:

# Resampling
img_rs, mask_rs = rio.resampleImage(img, lmap, resampledPixelSpacing = [1, 1, 1], interpolator = "sitkLinear")

lsif3 = sitk.LabelStatisticsImageFilter()
lsif3.Execute(img_rs, mask_rs)
lsif3.GetBoundingBox(1)
#> (6, 12, 6, 9, 6, 10)

chk = rio.checkMask(img_rs, mask_rs)
chk[0]
#> array([ 6, 12,  6,  9,  6, 10])
JoostJM commented 3 years ago

HI @mattwarkentin, sorry for the late reply.

The differences in the BoundingBoxes are mainly due to the use of 2 separate SimpleITK functions, which return a slightly different notation of the bounding box. imageopration.CheckMask uses SimpleITK.LabelStatisticsImageFilter, returning the upper and lower bound indices (see docs) generalInfo (diagnostic features) use SimpleITK.LableShapeStatiscticsImageFilter, returning the lower bound and size of the bounding box (see docs)

Taking this into account, your reported bounding boxes seem fine to me.