AIM-Harvard / pyradiomics

Open-source python package for the extraction of Radiomics features from 2D and 3D images and binary masks. Support: https://discourse.slicer.org/c/community/radiomics
http://pyradiomics.readthedocs.io/
BSD 3-Clause "New" or "Revised" License
1.16k stars 499 forks source link

[FEAT EXTRACTION] features extracted from arrays don't match results calculated with numpy #582

Closed mluerig closed 4 years ago

mluerig commented 4 years ago

Describe the bug I am using pyradiomics in interactive mode on a lot of small 2D grayscale numpy arrays that are converted as suggested in https://github.com/Radiomics/pyradiomics/issues/546. However, the collected first order moments differ from what numpy calculates.

PyRadiomics configuration

# these were my guesses on how to solve it 
setting:
    binWidth: 1 
    padDistance: 0
    force2D: True

To Reproduce Minimum example:

from radiomics import featureextractor
import SimpleITK as sitk
import six

data=np.array([[ 0,  2,  1,  4,  4,  5,  6,  8,  8,  9, 10,  8,  8,  7,  5,  5,  3,  1,  0,  1],
       [ 1,  2,  1,  4,  5,  8, 10, 14, 14, 15, 15, 15, 12, 11,  9,  8,  2,  2,  1,  2],
       [ 0,  2,  4,  8, 11, 15, 16, 21, 24, 29, 30, 27, 25, 18, 15, 11,  6,  6,  3,  1],
       [ 1,  4,  7, 11, 16, 24, 33, 43, 46, 50, 50, 46, 42, 34, 24, 17, 11,  8,  3,  1],
       [ 4,  7, 13, 20, 29, 41, 53, 61, 65, 67, 68, 64, 55, 48, 34, 24, 17, 10,  5,  3],
       [ 4, 10, 18, 30, 42, 57, 70, 72, 75, 75, 78, 72, 66, 55, 48, 33, 21, 14,  7,  3],
       [ 7, 16, 26, 43, 55, 62, 74, 76, 79, 84, 78, 76, 70, 66, 55, 40, 28, 16, 10,  5],
       [ 7, 19, 34, 45, 60, 70, 77, 75, 85, 81, 83, 81, 77, 68, 58, 46, 28, 18, 10,  5],
       [12, 20, 35, 51, 65, 69, 74, 78, 77, 85, 86, 77, 77, 72, 59, 48, 34, 21, 11,  6],
       [11, 22, 33, 46, 59, 71, 76, 80, 85, 87, 86, 84, 76, 76, 68, 54, 36, 23, 11,  6],
       [12, 21, 31, 47, 52, 64, 79, 77, 86, 88, 89, 83, 78, 72, 68, 50, 37, 19, 10,  7],
       [10, 18, 24, 39, 49, 59, 71, 81, 86, 89, 87, 86, 76, 67, 56, 46, 30, 15,  9,  4],
       [ 6, 11, 20, 30, 45, 52, 65, 72, 78, 84, 82, 80, 69, 57, 46, 35, 23, 13,  6,  4],
       [ 5,  8, 12, 18, 29, 45, 53, 60, 65, 71, 72, 67, 56, 48, 35, 25, 15,  9,  6,  3],
       [ 5,  7,  9, 13, 19, 28, 37, 44, 52, 52, 56, 50, 43, 31, 23, 15, 10,  6,  4,  2],
       [ 1,  3,  6,  8, 13, 14, 22, 27, 31, 33, 34, 30, 23, 18, 13,  8,  5,  4,  1,  2],
       [ 2,  1,  4,  5,  7,  9, 10, 15, 16, 17, 16, 15, 12, 10,  6,  6,  3,  3,  2,  0],
       [ 0,  2,  1,  3,  4,  4,  7,  6,  8,  7,  7,  7,  5,  5,  3,  4,  1,  1,  0,  1]], dtype=np.uint8)
data_mask=np.array([[255, 255, 255, 255, 255, 255,   0,   0,   0,   0,   0,   0,   0,   0, 255, 255, 255, 255, 255, 255],
       [255, 255, 255, 255, 255,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0, 255, 255, 255, 255],
       [255, 255, 255,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0, 255, 255],
       [255, 255,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0, 255, 255],
       [255,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0, 255, 255],
       [255,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0, 255],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0, 255],
       [255,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0, 255],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0, 255],
       [255,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0, 255, 255],
       [255,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0, 255, 255],
       [255, 255,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0, 255, 255, 255],
       [255, 255, 255,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0, 255, 255, 255, 255],
       [255, 255, 255, 255, 255,   0,   0,   0,   0,   0,   0,   0,   0,   0, 255, 255, 255, 255, 255, 255],
       [255, 255, 255, 255, 255, 255, 255, 255,   0, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255]], dtype=np.uint8)

img = sitk.GetImageFromArray(data)
mask = sitk.GetImageFromArray(data_mask)

params = "params.yml"
# setting:
#     binWidth: 1
#     padDistance: 0
#     force2D: True

extractor = featureextractor.RadiomicsFeatureExtractor(params)
features = extractor.execute(img, mask, label=255)
for key, val in six.iteritems(features):
  print("\t%s: %s" %(key, val))

## original_firstorder_Mean: 3.367816091954023

masked = ma.array(data=data, mask=data_mask)
np.ma.mean(masked)

## 40.523809523809526

Version (please complete the following information):

pieper commented 4 years ago

Both answers looks correct - when you pick the label 255 for pyradiomics it is selecting all the outer elements, which are the small numbers and a mean around 3 is reasonable.

But masked arrays have a different meaning - the non-zero values are the "tainted" or "invalid" values, so you are selecting the inside values which are bigger and a mean of 40 looks good.

https://numpy.org/doc/stable/reference/maskedarray.generic.html

mluerig commented 4 years ago

oh boy, seems so obvious now. thanks!