MIC-DKFZ / LIDC-IDRI-processing

Scripts for the preprocessing of LIDC-IDRI data
MIT License


LIDC Data processing scripts

Copyright © German Cancer Research Center (DKFZ), Division of Medical Image Computing (MIC).

The scripts within this repository can be used to convert the LIDC-IDRI data. After running them, the image and segmentation data are available in NIfTI/NRRD format and the nodule characteristics are available in a single comma-separated values (CSV) file.

If you are using these scripts for your publication, please cite as

Michael Goetz, "MIC-DKFZ/LIDC-IDRI-processing: Release 1.0.1", DOI: 10.5281/zenodo.2249217

Requirements

The scripts use some standard Python libraries (glob, os, subprocess, and xml) as well as the Python libraries numpy and SimpleITK. Additionally, some command line tools from MITK are used. They can be obtained either by building MITK with the classification module enabled or by installing MITK Phenotyping, which contains all necessary command line tools.
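A quick way to confirm the Python side of these requirements before running the scripts is sketched below. It only checks that the modules can be found for import. The helper name `missing_modules` is made up for this example and is not part of the repository, and the MITK command line tools have to be checked separately.

```python
import importlib.util

# Modules named in the requirements above. numpy and SimpleITK are the only
# ones that are not part of the Python standard library.
REQUIRED_MODULES = ["glob", "os", "subprocess", "xml", "numpy", "SimpleITK"]


def missing_modules(names):
    """Return the names from `names` that cannot be found for import.

    Hypothetical helper for illustration; not part of the repository.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing Python modules:", ", ".join(missing))
    else:
        print("All required Python modules were found.")
```

Any module reported as missing can usually be installed with pip before the conversion is started.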

Basic Usage

The following input paths need to be defined:

Output / Result

The output of this script consists of NRRD files, each containing a whole DICOM series (i.e. a complete 3D CT image), NIfTI (.nii.gz) files of the 3D nodule segmentations, and NRRD files and Planar Figures (.pf) containing slice-wise segmentations of nodules.

The data are stored in subfolders that indicate a 5-character ID. The digits of this ID match the numerical part of the patient ID that is used in the LIDC_IDRI DICOM folder. However, since some patients come with more than one CT image, a single letter is appended so that each CT scan has a unique ID. For example, the folder "LIDC_IDRI-0129" may contain two CT images, which will then have the IDs "0129a" and "0129b".
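The naming rule from the example above can be sketched as follows. This is an illustration only, not the repository's code: the function name `scan_ids` is hypothetical, and it assumes that letters are assigned in alphabetical order starting with "a", also when a patient has only one scan.

```python
import re
import string


def scan_ids(patient_folder, number_of_scans):
    """Build one ID per CT scan: 4-digit patient number plus one letter.

    Hypothetical helper; the letter order is an assumption.
    """
    match = re.search(r"(\d{4})$", patient_folder)
    if match is None:
        raise ValueError(f"No 4-digit patient number in {patient_folder!r}")
    if number_of_scans > len(string.ascii_lowercase):
        raise ValueError("More scans than available letters")
    number = match.group(1)
    return [number + string.ascii_lowercase[i] for i in range(number_of_scans)]


print(scan_ids("LIDC_IDRI-0129", 2))  # ['0129a', '0129b']
```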

There are up to four reader sessions given for each patient and image. Each session is identified by a 1-digit number indicating the rank of the expert for the given image only. According to the corresponding publication, each session was done by one of 12 experts. However, it is not possible to tell whether two images were annotated by the same expert. Therefore, two images might have been annotated by different experts even if they have the same session number.

Each combination of nodule and expert has a unique 8-digit ID, for example 0000358. This ID is unique across all created segmentations of nodules and experts. This means that two segmentations of the same nodule will have different IDs. In contrast, a second 8-digit ID is the same for all segmentations of the same nodule. It is defined as the minimum of the per-segmentation IDs of a given nodule.
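The relation between the two IDs can be illustrated with a short sketch. It assumes the segmentations have already been grouped by physical nodule; the grouping keys (`"n1"`, `"n2"`) and the function name are hypothetical, and plain integers are used instead of zero-padded strings.

```python
def nodule_level_ids(segmentations):
    """Map each nodule key to the smallest segmentation ID found for it.

    `segmentations` is an iterable of (nodule_key, segmentation_id) pairs.
    The nodule_key is a made-up label that groups segmentations of the
    same physical nodule; it is not an identifier used by the scripts.
    """
    smallest = {}
    for nodule_key, segmentation_id in segmentations:
        current = smallest.get(nodule_key)
        if current is None or segmentation_id < current:
            smallest[nodule_key] = segmentation_id
    return smallest


# Three experts segmented nodule "n1", one expert segmented nodule "n2".
pairs = [("n1", 358), ("n1", 359), ("n2", 360), ("n1", 361)]
print(nodule_level_ids(pairs))  # {'n1': 358, 'n2': 360}
```

Because every segmentation ID is unique, the minimum is unique as well, so it can serve as a stable label for the nodule itself.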

A further ID is unique within a set of Planar Figures or 2D segmentations of a single nodule. It is used to differentiate multiple planes of segmentations of the same object.

Based on these definitions, the following files are created:

In addition, the characteristics of the nodules are saved in the file specified in path_to_characteristics, and errors occurring during the whole process are recorded in the file specified in path_to_error_file.

Limitations

The script was developed on Windows. It should be possible to run it on Linux, but this has never been tested. Problems may be caused by the subprocess calls (calling the executables of MITK Phenotyping).
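One common way to make such subprocess calls less platform dependent is to pass the command as an argument list and to add the `.exe` suffix only on Windows. The sketch below shows the idea. It is not taken from the repository: the function names and the tool name in the usage example are hypothetical, and it assumes that the Linux executables carry no file extension, which may not hold for every MITK installation.

```python
import os
import subprocess
import sys


def build_command(tool_dir, tool_name, arguments):
    """Return an argument list suitable for subprocess.run without a shell.

    Assumption: executables end in ".exe" on Windows and have no
    extension elsewhere.
    """
    executable = tool_name + (".exe" if sys.platform.startswith("win") else "")
    return [os.path.join(tool_dir, executable), *arguments]


def run_tool(tool_dir, tool_name, arguments):
    """Run the tool and raise CalledProcessError if it exits with an error."""
    command = build_command(tool_dir, tool_name, arguments)
    return subprocess.run(command, check=True, capture_output=True, text=True)


# "HypotheticalTool" is a placeholder, not the name of a real MITK tool.
print(build_command("tools", "HypotheticalTool", ["-i", "image.nrrd"]))
```

Passing a list instead of a single command string avoids shell quoting differences between Windows and Linux, and `os.path.join` takes care of the path separator.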

Also, the script was developed for my own research and is not extensively tested. It is possible that I unintentionally built in some limitations.

I developed this script when no DICOM Seg files for LIDC_IDRI were available online. The script therefore relies on the XML description, which might not be the best solution. Feel free to extend it or to write a new solution that makes use of the now available DICOM Seg objects.

Further questions

If you have suggestions or questions, you can reach the author (Michael Goetz) at m.goetz@dkfz-heidelberg.de

Licence

Copyright (c) 2003-2019 German Cancer Research Center, Division of Medical Image Computing. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.