AIM-Harvard / pyradiomics

Open-source python package for the extraction of Radiomics features from 2D and 3D images and binary masks. Support: https://discourse.slicer.org/c/community/radiomics
http://pyradiomics.readthedocs.io/
BSD 3-Clause "New" or "Revised" License
1.11k stars, 485 forks

Radiomics Feature based classifier #798

Open devo-id opened 1 year ago

devo-id commented 1 year ago

Hello everyone, I am a computer science college student and new to radiomics and medical science. I want to build a radiomics feature-based classifier, and I have to use CT scan DICOM images.

Firstly, I found a dataset, [NSCLC-Radiomics - The Cancer Imaging Archive (TCIA) Public Access - Cancer Imaging Archive Wiki](https://wiki.cancerimagingarchive.net/display/Public/NSCLC-Radiomics), in which different regions of the body are labeled, such as the left and right lungs, the esophagus, the spine, and the abnormal tissue, i.e. the tumor itself. From all these segments I chose the tumor segmentation and calculated its features. But these features are for cancer patients only. For non-cancer patients, I could not work out what the ROI should be: a non-cancer CT scan obviously has no tumor to use as an ROI.

Is it possible to classify cancer and non-cancer patients with radiomics? If yes, what should the ROI be for a non-cancer patient? And is there a relevant dataset for this?

After long research, I came across the dataset Data from The Lung Image Database Consortium (LIDC) and Image Database Resource Initiative (IDRI): A completed reference database of lung nodules on CT scans (LIDC-IDRI) - The Cancer Imaging Archive (TCIA) Public Access - Cancer Imaging Archive Wiki, where they mention that they have classified the nodules (abnormal tissue) based on their size. I read the following in their article https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3041807/:

The Database contains 7371 lesions marked “nodule” by at least one radiologist. 2669 of these lesions were marked “nodule≥3 mm” by at least one radiologist, of which 928 (34.7%) received such marks from all four radiologists. These 2669 lesions include nodule outlines and subjective nodule characteristic ratings.

and they rated the nodule size on a scale of 1 to 5 (I will call ratings 1 and 2, the small nodules, non-cancer, and treat ratings 4 and 5, the large nodules, as cancer data). I hoped I had finally found the right dataset, but it is very confusing.
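The labeling rule above can be sketched in plain Python. This is a minimal sketch under assumptions the thread does not state: each nodule has one 1-to-5 rating per radiologist (up to four), the ratings are combined by their median, and nodules whose combined rating falls strictly between 2 and 4 (such as 3) are left out rather than forced into either class. The function name `label_nodule` and the example data are hypothetical.

```python
from statistics import median
from typing import Optional, Sequence


def label_nodule(ratings: Sequence[int]) -> Optional[int]:
    """Map per-radiologist 1-5 ratings to a binary label (hypothetical helper).

    Returns 0 ("non-cancer") when the median rating is <= 2, 1 ("cancer")
    when it is >= 4, and None for ambiguous nodules, which are excluded.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r not in (1, 2, 3, 4, 5) for r in ratings):
        raise ValueError("ratings must be integers from 1 to 5")
    m = median(ratings)  # the median tolerates one radiologist disagreeing
    if m <= 2:
        return 0
    if m >= 4:
        return 1
    return None


# Made-up example: three nodules, each rated by several radiologists
nodules = {"n1": [1, 2, 2], "n2": [4, 5, 5, 4], "n3": [2, 3, 4]}
labels = {k: label_nodule(v) for k, v in nodules.items()}
print(labels)  # {'n1': 0, 'n2': 1, 'n3': None}
```

In pylidc, each annotation carries the radiologist's 1-to-5 characteristic ratings as attributes (for example `malignancy`), and `scan.cluster_annotations()` groups the annotations that refer to the same nodule, so the list passed to a helper like this would come from one such group. Check these names against the pylidc documentation for your version before relying on them.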

I used their Python package pylidc for preprocessing. After running it, I got multiple NumPy arrays that I don't know how to use in pyradiomics.

PyRadiomics only accepts an image together with its mask. This is very confusing for me.

I don't know where to find a dataset for this. I have spent almost a month reading about it and have tried many datasets, but found nothing relevant to my project. I really need help with this.

Ianyliu commented 1 year ago

Hi @devo-id

  1. Feature Extraction

Since you want to classify cancer and non-cancer patients, one idea is to extract features from the entire region (e.g. if you're working with the brain, extract features from the entire brain rather than from a specific region). That way, the models you train may be able to detect a difference in the overall radiomics features.

If that does not work, you might consider using an atlas or template. One example of a brain atlas is AAL (Automated Anatomical Labeling). You can try extracting the features of every region and start from there. (Doing that for hundreds of regions is impractical, so this is just an idea to get you started.)
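To make the whole-region and atlas ideas concrete, here is a NumPy-only sketch. It is not PyRadiomics' implementation: it computes a few simplified first-order statistics (mean, standard deviation, and energy as the sum of squared intensities) over whatever voxels a mask selects, so the same code applies to a tumor ROI, a whole-organ mask, or each label of an atlas label map. The arrays are synthetic, label 0 is assumed to be background, and the function names are hypothetical; PyRadiomics itself computes many more features for each mask you give it.

```python
import numpy as np


def first_order(image: np.ndarray, mask: np.ndarray) -> dict:
    """Simplified first-order statistics over the voxels where mask is nonzero."""
    voxels = image[mask.astype(bool)].astype(np.float64)
    if voxels.size == 0:
        raise ValueError("mask selects no voxels")
    return {
        "mean": float(voxels.mean()),
        "std": float(voxels.std()),
        "energy": float(np.sum(voxels ** 2)),
        "voxel_count": int(voxels.size),
    }


def per_region(image: np.ndarray, label_map: np.ndarray) -> dict:
    """Run first_order once per nonzero label of an atlas-style label map."""
    return {
        int(lab): first_order(image, label_map == lab)
        for lab in np.unique(label_map)
        if lab != 0  # 0 is assumed to be background
    }


# Synthetic 3D "scan" (z, y, x): two labeled regions plus unlabeled background
image = np.zeros((4, 6, 4))
image[:, :2, :] = 10.0   # region 1 intensities
image[:, 2:4, :] = 20.0  # region 2 intensities
labels = np.zeros((4, 6, 4), dtype=np.uint8)
labels[:, :2, :] = 1
labels[:, 2:4, :] = 2

whole = first_order(image, labels > 0)  # "entire region" approach
regions = per_region(image, labels)     # atlas approach, one entry per label
print(whole["mean"], regions[1]["mean"], regions[2]["mean"])  # 15.0 10.0 20.0
```

For a lung study, the "entire region" mask would be a whole-lung segmentation rather than a tumor outline, which gives non-cancer scans an ROI as well.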

  2. PyRadiomics accepts image inputs such as DICOM or NIfTI, so you might want to consider converting the NumPy arrays back into images.
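For the pylidc case specifically, the conversion might look like the sketch below. It rests on assumptions to check against the pylidc documentation for your version: `scan.to_volume()` returns a NumPy volume indexed as (row, column, slice), `ann.boolean_mask()` covers only the annotation's bounding box `ann.bbox()`, and `scan.pixel_spacing` and `scan.slice_spacing` are in millimetres. SimpleITK's `GetImageFromArray` expects (slice, row, column) order, so both arrays are transposed before conversion. The helper names are hypothetical, and SimpleITK and PyRadiomics are imported inside the functions that need them so the NumPy helpers run on their own.

```python
import numpy as np


def full_volume_mask(volume_shape, bbox, bbox_mask):
    """Embed a bounding-box mask into a zero mask the size of the whole scan."""
    mask = np.zeros(volume_shape, dtype=np.uint8)
    mask[bbox] = bbox_mask.astype(np.uint8)  # 1 = nodule, PyRadiomics' default label
    return mask


def to_sitk_order(arr):
    """(row, column, slice) -> (slice, row, column), the order SimpleITK expects."""
    return np.ascontiguousarray(np.transpose(arr, (2, 0, 1)))


def to_sitk(arr_rcs, pixel_spacing, slice_spacing):
    import SimpleITK as sitk  # imported here so the NumPy helpers work without it

    img = sitk.GetImageFromArray(to_sitk_order(arr_rcs))
    # SimpleITK spacing is (x, y, z), i.e. (column, row, slice)
    img.SetSpacing((float(pixel_spacing), float(pixel_spacing), float(slice_spacing)))
    return img


def extract_features(volume, mask, pixel_spacing, slice_spacing):
    from radiomics import featureextractor

    # Image and mask get identical spacing, origin and direction
    image = to_sitk(volume, pixel_spacing, slice_spacing)
    label = to_sitk(mask, pixel_spacing, slice_spacing)
    extractor = featureextractor.RadiomicsFeatureExtractor()
    return extractor.execute(image, label)  # ordered dict of diagnostics and features


# Usage with pylidc (needs the LIDC-IDRI data and a configured pylidc):
# import pylidc as pl
# scan = pl.query(pl.Scan).first()
# ann = scan.annotations[0]
# vol = scan.to_volume()
# mask = full_volume_mask(vol.shape, ann.bbox(), ann.boolean_mask())
# features = extract_features(vol, mask, scan.pixel_spacing, scan.slice_spacing)
```

Passing SimpleITK images directly avoids writing intermediate files; saving them with `sitk.WriteImage` and passing the file paths instead should work too.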

Here's an example of converting a NIfTI image into a NumPy array and back into a NIfTI image:

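```python
import nibabel as nib

img_nifti = nib.load('image.nii.gz')
img_affine = img_nifti.affine  # 4x4 voxel-to-world matrix
img_arr = img_nifti.get_fdata()  # voxel data as a float64 NumPy array
# Wrap the array back into a NIfTI image that keeps the original affine
nifti_file = nib.Nifti1Image(img_arr, img_affine)
nib.save(nifti_file, 'random_test.nii.gz')
```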

I'm not yet clear on how to explain or understand the affine part, so you may want to refer to online resources or something like ChatGPT/Bard AI/Bing AI.
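As a brief pointer on the affine: in NIfTI (and in nibabel's `img.affine`) it is a 4x4 matrix that maps voxel indices (i, j, k) to world coordinates in millimetres, using homogeneous coordinates (i, j, k, 1). The sketch below uses a made-up affine with 0.5 mm in-plane spacing, 2.5 mm slices, and an arbitrary origin to show the mapping in both directions; nibabel also offers `nibabel.affines.apply_affine` for the forward computation.

```python
import numpy as np

# Hypothetical affine: 0.5 mm x 0.5 mm pixels, 2.5 mm slices, origin (-128, -128, -300) mm
affine = np.array([
    [0.5, 0.0, 0.0, -128.0],
    [0.0, 0.5, 0.0, -128.0],
    [0.0, 0.0, 2.5, -300.0],
    [0.0, 0.0, 0.0, 1.0],
])


def voxel_to_world(affine: np.ndarray, ijk) -> np.ndarray:
    """Map a voxel index (i, j, k) to world coordinates (x, y, z) in mm."""
    homogeneous = np.append(np.asarray(ijk, dtype=float), 1.0)  # (i, j, k, 1)
    return (affine @ homogeneous)[:3]


def world_to_voxel(affine: np.ndarray, xyz) -> np.ndarray:
    """Inverse mapping: world coordinates back to (possibly fractional) voxel indices."""
    homogeneous = np.append(np.asarray(xyz, dtype=float), 1.0)
    return (np.linalg.inv(affine) @ homogeneous)[:3]


xyz = voxel_to_world(affine, (100, 200, 10))
print(xyz.tolist())  # [-78.0, -28.0, -275.0]
```

SimpleITK images, which PyRadiomics works with, describe the same geometry through `GetOrigin()`, `GetSpacing()` and `GetDirection()` rather than a single matrix.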