This repository contains tutorial materials (mostly Python notebooks) developed to help you learn about NCI Imaging Data Commons (IDC) and use it in your work.
If this is the first time you are hearing about IDC, you may want to check out our Getting Started documentation page. Here are some highlights of what IDC has to offer:
>78 TB of data: IDC contains radiology, brightfield (H&E) and fluorescence slide microscopy images, along with image-derived data (annotations, segmentations, quantitative measurements) and accompanying clinical data
free: all of the data in IDC is publicly available, with no registration or access requests required
commercial-friendly: >95% of the data in IDC is covered by the permissive CC-BY license, which allows commercial reuse (a small subset of the data is covered by the non-commercial CC-NC license); each file in IDC is tagged with its license to make it easier for you to understand and follow the terms of use
cloud-based: all of the data in IDC is available from public buckets in both Google Cloud and AWS, so it is fast and free to download, with no out-of-cloud egress fees
harmonized: all of the images and image-derived data in IDC are harmonized into the standard DICOM representation
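Because images and image-derived objects share the DICOM information model, every file can be placed in the same patient, study, series and instance hierarchy using standard attributes (PatientID, StudyInstanceUID, SeriesInstanceUID, SOPInstanceUID). The sketch below is illustrative only: the metadata records and UIDs are hypothetical placeholders rather than real IDC values, and in practice they would be read from the DICOM file headers. It shows how a CT series and a segmentation (SEG) series end up side by side in the same study.

```python
from collections import defaultdict

# Hypothetical per-file metadata; real values would be read from DICOM headers.
# UIDs are shortened placeholders, not real IDC identifiers.
FILES = [
    {"PatientID": "P1", "StudyInstanceUID": "1.1", "SeriesInstanceUID": "1.1.1",
     "SOPInstanceUID": "1.1.1.1", "Modality": "CT"},
    {"PatientID": "P1", "StudyInstanceUID": "1.1", "SeriesInstanceUID": "1.1.1",
     "SOPInstanceUID": "1.1.1.2", "Modality": "CT"},
    {"PatientID": "P1", "StudyInstanceUID": "1.1", "SeriesInstanceUID": "1.1.2",
     "SOPInstanceUID": "1.1.2.1", "Modality": "SEG"},
]


def build_hierarchy(files):
    """Group SOPInstanceUIDs by patient, then study, then series."""
    # tree[PatientID][StudyInstanceUID][SeriesInstanceUID] -> series record
    tree = defaultdict(lambda: defaultdict(dict))
    for f in files:
        series = tree[f["PatientID"]][f["StudyInstanceUID"]].setdefault(
            f["SeriesInstanceUID"], {"Modality": f["Modality"], "instances": []}
        )
        series["instances"].append(f["SOPInstanceUID"])
    return tree


tree = build_hierarchy(FILES)
for patient, studies in tree.items():
    for study, series_map in studies.items():
        for series_uid, series in series_map.items():
            print(patient, study, series_uid, series["Modality"], len(series["instances"]))
# P1 1.1 1.1.1 CT 2
# P1 1.1 1.1.2 SEG 1
```

The segmentation is just another series in the study, which is what lets the same tooling handle images and annotations alike.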
The tutorial notebooks are located in the notebooks folder and are organized into the following subfolders.
getting_started
"Getting Started" python notebooks are intended to introduce the users to IDC.
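They introduce the idc-index Python package, which can be used to programmatically search and download IDC data, visualize images and annotations, build cohorts, and check acknowledgments and licenses for the data included in your cohort. An introductory tutorial on clinical data demonstrates the use of the idc-index Python package together with duckdb.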
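The workflow of building a cohort and then checking the licenses of everything in it can be sketched in plain SQL. This is not the idc-index API: it uses Python's built-in sqlite3 on tiny hypothetical tables. The table names (series_index, clinical), column names and values are assumptions made for illustration; the real idc-index tables and columns are introduced in the notebooks. The query selects CT series for patients meeting a clinical criterion, then summarizes the cohort's licenses so that non-commercial terms are easy to spot.

```python
import sqlite3

# Hypothetical stand-in tables; names, columns and values are illustrative only.
SCHEMA = """
CREATE TABLE series_index (
    PatientID TEXT, SeriesInstanceUID TEXT, Modality TEXT,
    license_short_name TEXT, series_size_MB REAL
);
CREATE TABLE clinical (PatientID TEXT, age_at_diagnosis INTEGER);
"""
SERIES = [
    ("P1", "1.1", "CT", "CC BY 4.0", 120.5),
    ("P1", "1.2", "SEG", "CC BY 4.0", 2.0),
    ("P2", "2.1", "CT", "CC BY-NC 3.0", 80.0),
    ("P3", "3.1", "CT", "CC BY 4.0", 60.0),
]
CLINICAL = [("P1", 64), ("P2", 71), ("P3", 45)]


def build_cohort(conn, modality, min_age):
    """Series of one modality for patients diagnosed at or after min_age."""
    return conn.execute(
        """
        SELECT s.SeriesInstanceUID, s.license_short_name, s.series_size_MB
        FROM series_index AS s
        JOIN clinical AS c ON c.PatientID = s.PatientID
        WHERE s.Modality = ? AND c.age_at_diagnosis >= ?
        ORDER BY s.SeriesInstanceUID
        """,
        (modality, min_age),
    ).fetchall()


def license_summary(cohort):
    """Count series per license so the cohort's terms of use can be reviewed."""
    counts = {}
    for _, license_name, _ in cohort:
        counts[license_name] = counts.get(license_name, 0) + 1
    return counts


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO series_index VALUES (?, ?, ?, ?, ?)", SERIES)
conn.executemany("INSERT INTO clinical VALUES (?, ?)", CLINICAL)

cohort = build_cohort(conn, "CT", 60)
licenses = license_summary(cohort)
total_mb = sum(size for _, _, size in cohort)
non_commercial = sorted(name for name in licenses if "NC" in name)

print(cohort)          # [('1.1', 'CC BY 4.0', 120.5), ('2.1', 'CC BY-NC 3.0', 80.0)]
print(licenses)        # {'CC BY 4.0': 1, 'CC BY-NC 3.0': 1}
print(total_mb)        # 200.5
print(non_commercial)  # ['CC BY-NC 3.0']
```

The substring test for "NC" is a deliberately simple heuristic for this toy data; for real cohorts, rely on the license information IDC attaches to each file.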
advanced_topics
Notebooks in this folder cover topics that require an understanding of the basics and address narrower use cases of IDC.
One of the notebooks in this folder demonstrates the use of idc-index and duckdb for searching clinical data, covering more capabilities than the introductory clinical data tutorial.
viewers_deployment
These notebooks can be used to deploy your own cloud-based instance of the OHIF or Slim viewer using Google Firebase. You can use the deployed viewer to visualize analysis results you generated for IDC data, or to work with your own images. These tutorials use the free tier of Firebase, so there is no cost to keep the deployed viewers available in the cloud.
collections_demos
This folder contains notebooks that demonstrate the use of data from specific IDC collections. For easier navigation, the name of each notebook in this folder is prefixed with the collection_id of the collection it corresponds to.
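Since notebook names start with the collection_id they cover, notebooks can be matched to collections by prefix. IDC collection_ids may themselves contain underscores (we assume, for example, rms_mutation_prediction for the RMS-Mutation-Prediction collection), so splitting a file name at its first underscore would give the wrong answer. The sketch below instead matches against a list of known ids and keeps the longest match; the id list and notebook names are hypothetical examples.

```python
def collection_for_notebook(filename, collection_ids):
    """Return the longest known collection_id that prefixes filename, or None."""
    matches = [
        cid for cid in collection_ids
        if filename.startswith(cid + "_") or filename.startswith(cid + ".")
    ]
    # Prefer the longest match in case one id is a prefix of another.
    return max(matches, key=len) if matches else None


# Hypothetical collection_ids and notebook names, for illustration only.
KNOWN_IDS = ["nlst", "rms_mutation_prediction", "qin_prostate_repeatability"]
NOTEBOOKS = [
    "rms_mutation_prediction_explore.ipynb",
    "qin_prostate_repeatability_hiplot.ipynb",
    "nlst.ipynb",
    "README.md",
]

for name in NOTEBOOKS:
    print(name, "->", collection_for_notebook(name, KNOWN_IDS))
```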
One notebook demonstrates the use of hiplot, an open-source package for high-dimensional parameter visualization, to examine various MRI acquisition parameters of the prostate MRI images available in IDC. Another notebook explores the RMS-Mutation-Prediction collection based on various attributes of images and expert annotations.
pathomics
This folder is dedicated to notebooks focused on digital pathology (pathomics) applications. The use of the DICOM standard in digital pathology is relatively new and the field is actively developing, hence the dedicated folder.
analysis
Demonstrations/examples of analyses of images from IDC.
labs
Here you will find an archive of the notebooks used in tutorials, some of which may demonstrate experimental features. By design, notebooks presented at specific events may not be updated after the event; they are stored in this folder for archival purposes.
deprecated
IDC is an actively evolving resource. As we develop new and improved capabilities, we improve our recommended usage practices, and may deprecate notebooks that are no longer maintained and may no longer work. You will find thse in the deprecated
folder.
testing
This directory is used for the maintenance of the repository to support testing of the actively supported notebooks.
If you have any questions about the notebooks in this repository, please open a discussion thread in the IDC user forum, or open an issue in this repository.