AIM-Harvard / pyradiomics

Open-source python package for the extraction of Radiomics features from 2D and 3D images and binary masks. Support: https://discourse.slicer.org/c/community/radiomics
http://pyradiomics.readthedocs.io/
BSD 3-Clause "New" or "Revised" License
1.14k stars 493 forks source link

voxel-based extraction #715

Closed smcch closed 2 years ago

smcch commented 3 years ago

After voxel-based feature extraction, I only get .nrrd files (feature maps), but, is it possible to get the value of each feature in every single voxel ???

JoostJM commented 3 years ago

The reason you get .nrrd files for each feature is because you extracted voxel-based, and therefore have a value for each voxel per feature. The individual voxel values in the nrrd files are the feature values you're looking for.

smcch commented 3 years ago

Thanks Joost,

I was having trouble exporting the array to csv format, the only way it works is as follows:

import SimpleITK as sitk import numpy as np data= sitk.ReadImage('original_glcm_Contrast.nrrd', imageIO="NrrdImageIO") np.savetxt("glcm_contrast.csv", data,delimiter = ",")

I got a file like this: glcm_contrast.csv

I have a dataset of about 50 subjects, with 3 MR sequences each. Each subject has 2 ROI's. And from each ROI around 200 feature maps each. My question is: is there any way to perform a voxel based extraction in batch mode and with output directly in csv format?. Otherwise converting each map of the entire sample would be unfeasible.

JoostJM commented 3 years ago

You can run PyRadiomics in batch mode, even when extracting voxel-based. This is detailed in the documentation. This will not include your conversion to csv files, you'd have to do that in a separate loop.

I am wondering though, why is it necessary to convert to csv? This is a less precise and more memory intensive format compared to NRRD, which also removes all geometric information that would be required if an overlay must be created. From an analysis point of view, NRRD files can easily be converted to numpy arrays using SimpleITK (sitk.GetArrayFromImage).

smcch commented 3 years ago

Let me explain better what I mean. My objective is to predict areas of tumor infiltration using the radiomic features of MRI in gliomas. I have segmented two regions. One of them is the infiltration zone, and the other is the region without tumor infiltration. I want to perform an analysis based on voxels to obtain the differences between the two areas. First, choose the radiomic features with the most significant discrimination capacity. Then build a prediction model and generate a probability map for each voxel in the validation data set (developing a new segmentation based on the model). So I thought that to manipulate the data, I need to extract the values of each voxel to CSV format, probably I am wrong, and there is a more efficient solution.

pieper commented 3 years ago

For what you describe it would be more efficient to access the voxels as an array, which you can easily do in Slicer / sitk, or in native python with pynrrd.

smcch commented 3 years ago

Thanks for your help. Certainly exporting the values to csv format is not the most suitable. Regarding the batch mode, I have been able to obtain the results properly, however, when I also run the voxel-based mode I only obtain the feature maps of a single case. Maybe the nrrd files are being overwritten???. This is the script I use, how should I correct it?

from future import print_function

import collections import csv import logging import os import six

import SimpleITK as sitk

import radiomics from radiomics import featureextractor

def main(): outPath = r''

inputCSV = os.path.join(outPath, 'input2.csv') outputFilepath = os.path.join(outPath, 'radiomics_features.csv') progress_filename = os.path.join(outPath, 'pyrad_log.txt') params = os.path.join(outPath,'Params.yaml')

Configure logging

rLogger = logging.getLogger('radiomics')

Set logging level

rLogger.setLevel(logging.INFO) # Not needed, default log level of logger is INFO

Create handler for writing to log file

handler = logging.FileHandler(filename=progress_filename, mode='w') handler.setFormatter(logging.Formatter('%(levelname)s:%(name)s: %(message)s')) rLogger.addHandler(handler)

Initialize logging for batch log messages

logger = rLogger.getChild('batch')

Set verbosity level for output to stderr (default level = WARNING)

radiomics.setVerbosity(logging.INFO)

logger.info('pyradiomics version: %s', radiomics.version) logger.info('Loading CSV')

flists = [] try: with open(inputCSV, 'r') as inFile: cr = csv.DictReader(inFile, lineterminator='\n') flists = [row for row in cr] except Exception: logger.error('CSV READ FAILED', exc_info=True)

logger.info('Loading Done') logger.info('Patients: %d', len(flists))

if os.path.isfile(params): extractor = featureextractor.RadiomicsFeatureExtractor(params) else: # Parameter file not found, use hardcoded settings instead settings = {} settings['binWidth'] = 25 settings['resampledPixelSpacing'] = None # [3,3,3] settings['interpolator'] = sitk.sitkBSpline settings['enableCExtensions'] = True

extractor = featureextractor.RadiomicsFeatureExtractor(**settings)
# extractor.enableInputImages(wavelet= {'level': 2})

logger.info('Enabled input images types: %s', extractor.enabledImagetypes) logger.info('Enabled features: %s', extractor.enabledFeatures) logger.info('Current settings: %s', extractor.settings)

headers = None

for idx, entry in enumerate(flists, start=1):

logger.info("(%d/%d) Processing Patient (Image: %s, Mask: %s)", idx, len(flists), entry['Image'], entry['Mask'])

imageFilepath = entry['Image']
maskFilepath = entry['Mask']
label = entry.get('Label', None)

if str(label).isdigit():
  label = int(label)
else:
  label = None

if (imageFilepath is not None) and (maskFilepath is not None):
  featureVector = collections.OrderedDict(entry)
  featureVector['Image'] = os.path.basename(imageFilepath)
  featureVector['Mask'] = os.path.basename(maskFilepath)

  try:
    featureVector.update(extractor.execute(imageFilepath, maskFilepath, label))

    with open(outputFilepath, 'a') as outputFile:
      writer = csv.writer(outputFile, lineterminator='\n')
      if headers is None:
        headers = list(featureVector.keys())
        writer.writerow(headers)

      row = []
      for h in headers:
        row.append(featureVector.get(h, "N/A"))
      writer.writerow(row)
  except Exception:
    logger.error('FEATURE EXTRACTION FAILED', exc_info=True)

# Voxel-based extraction
result = extractor.execute((entry['Image']), (entry['Mask']), voxelBased=True)
for key, val in six.iteritems(result):
  if isinstance(val, sitk.Image):  # Feature map
    sitk.WriteImage(val, key + '.nrrd', True)
    print("Stored feature %s in %s" % (key, key + ".nrrd"))
  else:  # Diagnostic information
    print("\t%s: %s" % (key, val))

if name == 'main': main()

By the way, from the terminal I get all the maps of each case but with the default settings (93 feature maps per case): pyradiomics <path/to/input> --mode voxel

JoostJM commented 3 years ago

@smcch, If I'm correct, you would be most interested in one value for each feature for each of the two areas you want to compare? If that is the case, my advise would be to not use voxel-based radiomics, but rather segment based radiomics using the two areas as input. If they are segmented as 2 distinct areas (either as 2 different segments or using 2 different label values), PyRadiomics can extract features. To do so, include a column in the csv file named "Label", with the value of the label used for each area as a value on the row. This means that for each image and mask combination, you make multiple rows, 1 for each distinct label value. Then, use this csv to compute features via the PyRadiomics command line. This removes the need to write your own script, and includes features such as parallel extraction, speeding up the process.

JoostJM commented 3 years ago

By the way, from the terminal I get all the maps of each case but with the default settings (93 feature maps per case): pyradiomics <path/to/input> --mode voxel

This is due to the fact that you need to include the parameter file to enable any kind of customization. This is easiest done using the -p argument, where you can pass a configuration file (see the online documentation for the format and repository for examples). In the -s argument you can pass overrides to specific settings, this is also explained in the documentation.

smcch commented 3 years ago

Thank you, Joost. I'm going to try the region-based extraction.

smcch commented 3 years ago

One more question. When I run batch + voxel mode in console mode, I get that each feature map is named sequentially. (case-1_texture_feature). How can I assign an ID to them? Each of my cases has 3 MRI sequences and in each sequence has two masks. What I would like is that the feature map to be named something like this: id_case_MR_seq_ROI_num_texture_feature.nrrd

JoostJM commented 2 years ago

Input columns are preserved in the output. Best approach would be to compute all your data, then use pandas to correctly name all relevant files. If you want to change it in the batch processing, you'd have to make some changes here, and make sure it's handled correctly later on (susch as here)