cancerDHC / example-data

This repository is intended to act as a store of example data files from across the NCI Cancer Research Data Commons (CRDC) nodes in a number of formats.
MIT License

Example data for the CCDH project

This repository is intended to act as a store of example data files from across the NCI Cancer Research Data Commons (CRDC) nodes in a number of formats. Each directory represents a single dataset downloaded from a node and contains a Jupyter Notebook documenting how it was downloaded. CCDH will use this example data to build and test the CRDC-H data model.

GDC Head and Mouth Dataset and conversion to CRDC-H

Our first example is based on a dataset of 560 cases that we downloaded from the GDC Public API. In a Jupyter Notebook, we describe how to load this data into Python Data Classes and then export it as YAML, JSON-LD or Turtle. This is not yet intended to be a comprehensive transform of all the retrieved GDC cases, but rather a showcase of the features made available through the Python Data Classes that are part of the artifacts generated from the CRDC model. The JSON-LD and Turtle exports of the data are also available.
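The load-and-export pattern described above can be sketched with the Python standard library alone. This is a minimal illustration, not the notebook's code: the `ResearchSubject` and `Diagnosis` classes, the `case_to_subject` helper and the sample record are all hypothetical stand-ins, and the real Python Data Classes generated from the model are richer. Only JSON export is shown, since YAML, JSON-LD and Turtle need additional libraries.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class Diagnosis:
    # Hypothetical, simplified stand-in for a model-generated diagnosis class.
    id: str
    condition: Optional[str] = None


@dataclass
class ResearchSubject:
    # Hypothetical, simplified stand-in for a model-generated subject class.
    id: str
    primary_site: Optional[str] = None
    diagnoses: List[Diagnosis] = field(default_factory=list)


def case_to_subject(case: dict) -> ResearchSubject:
    """Map one GDC-style case dictionary onto the illustrative data classes."""
    return ResearchSubject(
        id=case["case_id"],
        primary_site=case.get("primary_site"),
        diagnoses=[
            Diagnosis(id=d["diagnosis_id"], condition=d.get("primary_diagnosis"))
            for d in case.get("diagnoses", [])
        ],
    )


# Made-up record shaped like a GDC case; not taken from the real dataset.
example_case = {
    "case_id": "example-case-1",
    "primary_site": "Example site",
    "diagnoses": [
        {"diagnosis_id": "example-diagnosis-1", "primary_diagnosis": "Example diagnosis"}
    ],
}

subject = case_to_subject(example_case)

# asdict() recursively converts nested data classes into plain dictionaries,
# which json.dumps() can then serialize.
exported = json.dumps(asdict(subject), indent=2)
print(exported)
```

Keeping the mapping in one small function makes it easy to see which source fields are carried over and which are dropped, which matters while the transform is still partial.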

This example is based on v1.0-pre1 of the CRDC-H model, which is included in this repository. We will continue to update the example as the model develops, but it may be out of sync with the latest version of the model until we have the time to update it.

Using Jupyter Notebooks

Many of the processes in this repository are documented in Jupyter Notebook files, which have an .ipynb extension. These files can be viewed directly in GitHub (for example, the CDA example for subject 09CO022). You can also view them in the Jupyter Notebook viewer (for example, the same CDA example for subject 09CO022).
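Because .ipynb files are JSON documents, their code can also be inspected without Jupyter installed. The following is a minimal sketch, assuming the nbformat 4 layout with a top-level `cells` list; the `list_code_cells` helper and the sample notebook are hypothetical and are not part of this repository.

```python
import json
from typing import List


def list_code_cells(notebook_json: str) -> List[str]:
    """Return the source of every code cell in a notebook given as a JSON string."""
    notebook = json.loads(notebook_json)
    sources = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            source = cell.get("source", "")
            # The "source" field may be a single string or a list of lines.
            sources.append(source if isinstance(source, str) else "".join(source))
    return sources


# Tiny made-up notebook used only for illustration.
sample_notebook = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": ["x = 1\n", "print(x)"],
        },
    ],
})

code_cells = list_code_cells(sample_notebook)
print(code_cells)
```

To apply this to a downloaded notebook, read the file's text and pass it to `list_code_cells` in place of `sample_notebook`.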

If you would like to execute a notebook, you will need to install Jupyter Notebook (also available through Homebrew on macOS). You can then download the .ipynb file and open it in Jupyter Notebook on your computer by running:

$ jupyter notebook cptac2-subject-09CO022/CDA\ example\ for\ subject\ 09CO022.ipynb

This repository uses Poetry for dependency management. You can therefore also install Poetry, then run:

$ poetry install
$ poetry run jupyter notebook