QIICR / lidc2dicom

Scripts for converting TCIA LIDC-IDRI collection derived data into standard DICOM representation from project-specific XML format.

What will be the python command to run lidc2dicom.py file? #4

Closed monjoybme closed 5 years ago

monjoybme commented 5 years ago

My LIDC-IDRI datasets (.dcm and XML files) are available in the directory below:

/Users/me/Downloads/LIDC-IDRI/LIDC-IDRI-0997/01-01-2000-96481/1491-NLSTCVOLB30F-16103

I'm using the command below to run the lidc2dicom.py code:

python lidc2idri.py --images-dir /Users/me/Downloads/LIDC-IDRI/ --output-dir /Users/me/Downloads/ --log Users/me/Downloads/

The code is terminating without any output/error.

  1. Could you kindly tell me my command is correct or not?

  2. I also would like to know "seg_conversion_template.json" , "sr_conversion_template.json", and "GenericColors.txt". How these files have been generated? May I use same file for extracting annotated images from LIDC?

  3. The image directory basically the .dcm and xml file directory which I have downloaded from https://wiki.cancerimagingarchive.net/display/Public/LIDC-IDRI not from https://drive.google.com/drive/folders/1S-Ctfko4v0ndpvNfLA0J7Q_F1Zpk4bq0 Please confirm.

fedorov commented 5 years ago

Could you kindly tell me my command is correct or not?

You need to specify which subjects should be processed. There are different options for doing that, which are listed if you run the converter with --help. For example, to do the conversion for a single subject, you can do this:

python lidc2dicom.py --images-dir ~/Documents/TCIA/LIDC-IDRI --output-dir ~/Temp/LIDC_conversion5 --subjects 1

I also would like to know "seg_conversion_template.json" , "sr_conversion_template.json", and "GenericColors.txt". How these files have been generated?

The JSON files were written manually, following the definition of the parameter file for dcmqi. The GenericColors.txt is from 3D Slicer here: https://github.com/Slicer/Slicer/blob/master/Base/Logic/Resources/ColorFiles/GenericColors.txt.

May I use same file for extracting annotated images from LIDC?

I don't understand this question.

The image directory basically the .dcm and xml file directory which I have downloaded from https://wiki.cancerimagingarchive.net/display/Public/LIDC-IDRI

The converter does not parse xml files. It is using pylidc to query previously parsed annotations, and converts them into DICOM segmentations.

Can you clarify what you are trying to do? The converter will generate output similar to what you already have in https://drive.google.com/drive/folders/1S-Ctfko4v0ndpvNfLA0J7Q_F1Zpk4bq0 (it has been updated since the initial release, so the output will not be identical; I am in the process of generating the updated dataset).

The purpose of this script is primarily to provide a record of how we did the conversion, to serve as an example for those who want to do similar tasks. I do not see a good reason to re-run it to do the conversion of the same dataset. What is your goal?

monjoybme commented 5 years ago

Thanks a lot. Now it is clear to me. But I have one more point of confusion. The Google Drive folder (URL: https://drive.google.com/drive/folders/1S-Ctfko4v0ndpvNfLA0J7Q_F1Zpk4bq0 ), e.g., LIDC-IDRI-001, contains '.dcm' and '.json' files. Their filenames are different. It would help readers if you could kindly explain:

  1. How the files are renamed (e.g., Nodule 1 -Annotation 0 measurements.dcm, Nodule 1 -Annotation 0.dcm, Nodule 1 -Annotation IL057_127364 measurements.dcm).
  2. Do you have any code to visualize annotated images? I tried, but the image shows as black.

fedorov commented 5 years ago

The JSON files are the parameter configuration files used by dcmqi converters in the process of creating the DICOM files. .dcm files are the segmentations and structured reports with measurements and evaluations. Did you read this preprint:

Fedorov A, Hancock M, Clunie D, Brochhausen M, Bona J, Kirby J, Freymann J, Pieper S, Aerts H, Kikinis R, Prior F. 2018. Standardized representation of the LIDC annotations using DICOM. PeerJ Preprints 6:e27378v1 https://doi.org/10.7287/peerj.preprints.27378

It has the details. You can use 3D Slicer + QuantitativeReporting extension for visualization of the resulting DICOM data. More examples and details coming up in the next version of the preprint.

monjoybme commented 5 years ago

Thanks for your message. I have installed 3D Slicer but am unable to install the extension. I'm getting the error below:

Ooops. Extensions can not be installed ! Extensions install directory is expected to be readable/writable/executable: /private/var/folders/nh/yztkyf6x0jj10r0n4x25gjb00000gn/T/AppTranslocation/D44571F7-FE4B-41F3-9214-161494726449/d/Slicer.app/Contents/Extensions-28062

I'm using a Mac.

fedorov commented 5 years ago

It looks like you opened the installer package but did not install the Slicer application, so the application is still running from the read-only mounted volume. See the 3D Slicer installation instructions.

monjoybme commented 5 years ago

Solved this issue by dragging .dmg file into Applications folder.

monjoybme commented 5 years ago

Thanks a lot. I'm getting an image like below.

[Image: Screen Shot 2019-03-28 at 4 02 20 PM]

But in your paper (page 13), the labelling is shown on the original image. Could you tell me how I can get results like those on page 13?

fedorov commented 5 years ago

You have to import the CT study for the patient you want to see into the Slicer DICOM database before loading the segmentation. If you do that, Slicer will be able to automatically identify and load the image when you load the segmentation.

monjoybme commented 5 years ago

Thank you for your prompt reply. It's working perfectly. I read your paper, but still, I'm confused when I am trying to correlate the original image slices with the annotated images.

For example, in the case of Patient ID 'LIDC-IDRI-0997', the original image folder contains almost 539 '.dcm' files (000001.dcm to 000538.dcm). But your annotated image folder contains only a few slices in which nodules are present. Can you tell me how I can be sure which annotated .dcm file corresponds to which original .dcm file?

fedorov commented 5 years ago

Although both CT and SEG are DICOM objects, they are quite different.

DICOM Computed Tomography object stores individual slices of the image as separate files.

DICOM Segmentation object is multiframe, and stores all slices in a single file.

There is no correspondence at the file level. The segmentation object references individual slices of the CT that are segmented, and the ImagePositionPatient/ImageOrientationPatient attributes can be used to establish correspondence.
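A minimal sketch of that geometric matching, in plain Python with no DICOM library. It assumes the ImagePositionPatient and ImageOrientationPatient values have already been read from each CT file and from each frame of the segmentation (in a SEG object the per-frame position is stored under PerFrameFunctionalGroupsSequence > PlanePositionSequence). The function names, the dictionary layout, and the 0.01 mm tolerance are illustrative assumptions, not part of lidc2dicom.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Vec = Sequence[float]


def slice_normal(iop: Vec) -> Tuple[float, float, float]:
    """Slice normal: cross product of the row and column direction
    cosines stored in ImageOrientationPatient (6 values)."""
    rx, ry, rz, cx, cy, cz = iop
    return (ry * cz - rz * cy, rz * cx - rx * cz, rx * cy - ry * cx)


def position_along_normal(ipp: Vec, iop: Vec) -> float:
    """Project ImagePositionPatient onto the slice normal (in mm)."""
    normal = slice_normal(iop)
    return sum(p * n for p, n in zip(ipp, normal))


def match_frames_to_slices(
    ct_slices: Dict[str, Tuple[Vec, Vec]],
    seg_frames: List[Tuple[Vec, Vec]],
    tol: float = 0.01,  # hypothetical tolerance in mm
) -> List[Optional[str]]:
    """For each SEG frame, return the name of the CT file whose position
    along the normal is closest, or None if nothing lies within tol."""
    ct_index = [
        (position_along_normal(ipp, iop), name)
        for name, (ipp, iop) in ct_slices.items()
    ]
    matches: List[Optional[str]] = []
    for ipp, iop in seg_frames:
        d = position_along_normal(ipp, iop)
        best_pos, best_name = min(ct_index, key=lambda e: abs(e[0] - d))
        matches.append(best_name if abs(best_pos - d) <= tol else None)
    return matches


# Illustrative values only (axial orientation, 1.25 mm spacing).
AXIAL = (1.0, 0.0, 0.0, 0.0, 1.0, 0.0)
ct = {
    "000001.dcm": ((-170.0, -170.0, -100.0), AXIAL),
    "000002.dcm": ((-170.0, -170.0, -98.75), AXIAL),
    "000003.dcm": ((-170.0, -170.0, -97.5), AXIAL),
}
frames = [((-170.0, -170.0, -98.75), AXIAL), ((-170.0, -170.0, -50.0), AXIAL)]
matches = match_frames_to_slices(ct, frames)
```

Projecting onto the normal rather than comparing the z coordinate directly keeps the sketch valid for tilted or non-axial acquisitions, as long as the CT and the segmentation share the same orientation.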

If you do not need the metadata and are only concerned with the image/segmentation content, you can convert the CT and SEG series into volumetric formats, such as NIfTI or NRRD.
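To illustrate what such a volumetric representation amounts to, here is a rough sketch that places the few segmentation frames into a mask volume with one slice per CT slice, leaving all other slices empty. It is not how a real converter works internally. It assumes each slice position has already been reduced to a single coordinate along the slice normal (for a purely axial scan, the z component of ImagePositionPatient) and that each frame is available as a rows x cols grid of 0/1 values. The function name, argument layout, and tolerance are hypothetical.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[Sequence[int]]


def build_mask_volume(
    ct_positions: Sequence[float],
    seg_frames: Sequence[Tuple[float, Frame]],
    rows: int,
    cols: int,
    tol: float = 0.01,  # hypothetical tolerance in mm
) -> Tuple[List[float], List[List[List[int]]]]:
    """Return (sorted CT positions, mask volume).

    volume[k] is the rows x cols mask for the CT slice at sorted
    position k; slices without a segmentation frame stay all zero."""
    ordered = sorted(ct_positions)
    volume = [[[0] * cols for _ in range(rows)] for _ in ordered]
    for pos, frame in seg_frames:
        # Index of the CT slice closest to this frame.
        k = min(range(len(ordered)), key=lambda i: abs(ordered[i] - pos))
        if abs(ordered[k] - pos) > tol:
            raise ValueError(f"no CT slice within {tol} mm of frame at {pos}")
        for r in range(rows):
            for c in range(cols):
                if frame[r][c]:
                    volume[k][r][c] = 1  # OR frames that share a slice
    return ordered, volume


# Illustrative values only: three CT slices, one 2x2 segmentation frame.
positions = [-97.5, -100.0, -98.75]
frames = [(-98.75, [[0, 1], [0, 0]])]
ordered, mask = build_mask_volume(positions, frames, rows=2, cols=2)
```

The resulting mask has the same slice count and ordering as the sorted CT series, which is the property a volumetric format gives you. In practice a dedicated converter is the safer choice, since it also handles spacing, orientation, and multiple segments.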

If you let us know more details about your goal, how you want to use the data, it would be easier to help you.

monjoybme commented 5 years ago

Thanks for your help. I want to use DICOM Computed Tomography object/slices (as an original image) and DICOM Segmentation object (as a ground truth image) to train a deep learning model which will detect the nodule and extract voxels around the nodule. This is the first time I'm working on radiology images. Hence, I couldn't understand how to handle multi-frame objects to feed into the model.

fedorov commented 5 years ago

Thank you, this helps. I am not working with deep learning myself, but for your purpose, I would consider one of the following:

Approach 1:

Approach 2 (much easier, much fewer things to learn):

For the LIDC collection, you have pylidc, a custom Python package developed for that specific collection, so Approach 2 will work.

For the more general case, and for a growing number of other collections on TCIA, you will have no choice but to use Approach 1 or a variant of it. You will have to decide whether learning the details and related tools for Approach 1 is worth your effort.

monjoybme commented 5 years ago

Thanks a lot.

VidhiBishnoi commented 3 years ago

Can you tell me where I can find the tab-separated metadata files for the CT images, i.e., the CT tables for the lidc2dicom code? I am running the lidc2dicom metadata verification notebook, but I cannot work out where to get the CT tables. Please help.

fedorov commented 3 years ago

@VidhiBishnoi the procedure for extracting and querying metadata that was originally investigated in this repo is no longer recommended. I suggest you take a look at the Imaging Data Commons that has this specific collection, with all the DICOM metadata extracted into BigQuery tables, the Colab notebooks that include one exploring the content of the LIDC collection, and also this DataStudio dashboard (as discussed in this paper http://dx.doi.org/10.1002/mp.14445).

VidhiBishnoi commented 3 years ago

Thank you. Actually, I'm working on lung cancer detection using machine learning, and I have already downloaded the lung CT scan dataset from LIDC. I want the nodule images and the data on normal and abnormal attributes. Afterwards I will also move on to deep learning. Kindly suggest how to proceed.

On Tue, Oct 27, 2020 at 6:44 PM Andrey Fedorov notifications@github.com wrote:

@VidhiBishnoi https://github.com/VidhiBishnoi the procedure for extracting and querying metadata that was originally investigated in this repo is no longer recommended. I suggest you take a look at the Imaging Data Commons http://portal.imaging.datacommons.cancer.gov/ that has this specific collection, with all the DICOM metadata extracted into BigQuery tables, the Colab notebooks https://learn.canceridc.dev/cookbook/notebooks that include one exploring the content of the LIDC collection, and also this DataStudio dashboard https://datastudio.google.com/u/0/reporting/9d9a696e-0bb9-405d-a05a-a88983a02717 (as discussed in this paper http://dx.doi.org/10.1002/mp.14445).


VidhiBishnoi commented 3 years ago

Hello Sir, Can you share the video of the demo practice of deep learning on the LIDC dataset?
