DeepCC: a novel deep learning-based framework for cancer molecular subtype classification
https://CityUHK-CompBio.github.io/DeepCC/
MIT License

DeepCC

(See Gao, et al. Oncogenesis, 2019)

Update

Dependencies

The current version of DeepCC depends on the keras framework, with both CPU and GPU support. Please install keras in R first according to the instructions, and make sure that keras runs correctly.
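
A minimal sketch of one way to check the Keras setup before installing DeepCC. It assumes the `keras` R package from CRAN; `keras_ready()` is a hypothetical helper name and is not part of DeepCC or keras.

```r
# Hypothetical helper (not part of DeepCC or keras): TRUE only when the keras
# R package is installed and reports a usable Keras backend.
keras_ready <- function() {
  if (!requireNamespace("keras", quietly = TRUE)) {
    return(FALSE)
  }
  # Treat any error while probing the backend as "not ready".
  isTRUE(tryCatch(keras::is_keras_available(), error = function(e) FALSE))
}

if (!keras_ready()) {
  message("Keras is not ready; try install.packages(\"keras\") ",
          "followed by keras::install_keras().")
}
```

If the check fails after installation, the Python backend is usually the missing piece, so rerunning `keras::install_keras()` is a reasonable first step.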

Installation

You can install DeepCC from GitHub directly using devtools.

install.packages("devtools")
devtools::install_github("CityUHK-CompBio/DeepCC")

or

install.packages("remotes")
remotes::install_github("CityUHK-CompBio/DeepCC")

Quick start

As a case study, you can obtain well-organized colorectal cancer data from CRCSC's repository on Synapse.

DeepCC needs only two inputs to get started: gene expression profiles and training labels.

Now assume you have the gene expression profiles in eps and the training labels in labels.
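
To make the two inputs concrete, here is a toy sketch of what `eps` and `labels` might look like. The layout (one sample per row, one gene per column, Entrez identifiers as column names) is an assumption that is not stated in this section, and the values and CMS labels are made up for illustration.

```r
set.seed(42)  # reproducible toy values

sample_ids <- paste0("sample", 1:6)
gene_ids <- c("1", "2", "9", "10", "12")  # illustrative Entrez-style identifiers

# Assumed layout: samples in rows, genes in columns.
eps <- as.data.frame(matrix(
  rnorm(length(sample_ids) * length(gene_ids)),
  nrow = length(sample_ids),
  dimnames = list(sample_ids, gene_ids)
))

# One training label per sample, in the same order as the rows of eps.
labels <- c("CMS1", "CMS2", "CMS3", "CMS4", "CMS2", "CMS1")

stopifnot(nrow(eps) == length(labels))
```

Real data would of course have thousands of genes; the point is only that the labels must line up with the samples one to one.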

library(DeepCC)
library(keras)

# get functional spectra from gene expression profiles
# geneSets defaults to MSigDB v7; pass `geneSets = "MSigDBvx"` (x = 5/6/7) to pick another MSigDB version,
# or `geneSets = MSigDBr` (MSigDBr is obtained from the R package `msigdbr` via get_msigdbr() in DeepCC)
fs <- getFunctionalSpectra(eps)

# train DeepCC model
deepcc_model <- train_DeepCC_model(fs, labels)

# obtain deep features 
df <- get_DeepCC_features(deepcc_model, fs)
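
Before training on all samples, it can be useful to hold some out for evaluation. The sketch below is one way to do a stratified split in base R; `split_holdout()` is a hypothetical helper, not a DeepCC function, and the commented lines assume that the rows of `fs` are aligned with `labels`.

```r
# Hypothetical helper: stratified hold-out split. Each subtype contributes
# at least one test sample; samples with NA labels are left out of both parts.
split_holdout <- function(labels, test_fraction = 0.2, seed = 1) {
  set.seed(seed)  # note: this resets the global RNG state
  idx <- which(!is.na(labels))
  test <- unlist(lapply(split(idx, labels[idx]), function(i) {
    n_test <- max(1, round(length(i) * test_fraction))
    i[sample.int(length(i), n_test)]
  }), use.names = FALSE)
  list(train = setdiff(idx, test), test = sort(test))
}

# Toy labels: 4 subtypes with 5 samples each.
toy_labels <- rep(c("CMS1", "CMS2", "CMS3", "CMS4"), each = 5)
parts <- split_holdout(toy_labels)

# With real data (assumes fs rows are aligned with labels):
# deepcc_model <- train_DeepCC_model(fs[parts$train, ], labels[parts$train])
# held_out_pred <- get_DeepCC_label(deepcc_model, fs[parts$test, ])
```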

After training, you can use your DeepCC model to classify new samples. DeepCC can classify a whole data set as well as individual samples. The input data should be in the same format as the gene expression profiles above.

# classify a new data set using the trained DeepCC model
# for a batch of samples
new_fs <- getFunctionalSpectra(new_eps)
pred_labels <- get_DeepCC_label(deepcc_model, new_fs)

# for a given single sample, you have to provide a reference expression profile.
new_fs_single <- getFunctionalSpectrum(new_ep, refExp = "COADREAD")
pred_label <- get_DeepCC_label(deepcc_model, new_fs_single)
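
When true labels are available for the new samples, a quick sanity check is to tabulate predictions against them. The sketch below uses only base R; `evaluate_predictions()` is a hypothetical helper, and the toy vectors stand in for the output of `get_DeepCC_label()`, whose exact return type is not specified here.

```r
# Hypothetical helper: confusion table and accuracy for predicted vs. true labels.
evaluate_predictions <- function(predicted, truth) {
  predicted <- as.character(predicted)
  truth <- as.character(truth)
  stopifnot(length(predicted) == length(truth))
  keep <- !is.na(predicted) & !is.na(truth)  # skip unlabeled or unclassified samples
  classes <- sort(unique(c(predicted[keep], truth[keep])))
  confusion <- table(
    predicted = factor(predicted[keep], levels = classes),
    truth = factor(truth[keep], levels = classes)
  )
  list(confusion = confusion, accuracy = mean(predicted[keep] == truth[keep]))
}

# Toy stand-ins; in practice use pred_labels and your held-out labels.
toy_pred <- c("CMS1", "CMS2", "CMS2", "CMS4", NA)
toy_true <- c("CMS1", "CMS2", "CMS3", "CMS4", "CMS1")
res <- evaluate_predictions(toy_pred, toy_true)
print(res$confusion)
print(res$accuracy)
```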

Note: You can generate a customized reference expression profile from your previous data or from public data of the same (or a similar) cancer type and platform. Alternatively, you can use a pre-defined reference in DeepCC by passing the cancer type (in the format of TCGA cancer types).
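
One plausible way to build a customized reference is to average each gene across a cohort of the same cancer type and platform. This is a sketch under assumptions: `build_reference()` is a hypothetical helper, the cohort is assumed to have samples in rows and genes in columns, and it is assumed (not confirmed here) that `refExp` accepts a named numeric vector in place of a TCGA identifier.

```r
# Hypothetical helper: per-gene mean expression across a cohort.
# Assumes samples in rows and genes in columns.
build_reference <- function(cohort_eps) {
  colMeans(as.matrix(cohort_eps), na.rm = TRUE)
}

# Toy cohort: 4 samples x 3 genes (illustrative Entrez-style IDs).
# matrix() fills column by column, so each row of numbers below is one gene.
cohort <- matrix(
  c(1, 2, 3, 4,
    10, 20, 30, 40,
    5, 5, NA, 5),
  nrow = 4,
  dimnames = list(paste0("s", 1:4), c("1", "2", "9"))
)
custom_ref <- build_reference(cohort)
print(custom_ref)

# Assumption: refExp accepts such a named vector in place of a TCGA identifier.
# new_fs_single <- getFunctionalSpectrum(new_ep, refExp = custom_ref)
```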

Additional tools

Baidu Netdisk

Link: https://pan.baidu.com/s/10ogQPsNa4SksJadT8_ceHA      Password: ugeg

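To load the CRC DeepCC model in R:

# load a DeepCC model saved as <prefix>.RData (class levels) plus <prefix>.hdf5 (Keras network)
load_DeepCC_model <- function(prefix){
  # restores the saved objects, expected to include `levels`, into this function's environment
  load(file = paste0(prefix, ".RData"))
  classifier <- keras::load_model_hdf5(filepath = paste0(prefix, ".hdf5"))
  list(classifier = classifier, levels = levels)
}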

# for CRC model trained with TCGA data
CRC_TCGA <- load_DeepCC_model("CRC_TCGA")

DeepCC_online models

All DeepCC models used in DeepCC_online are available on GitHub:

# use git in terminal
git clone git@github.com:zero19970/deepcc_model.git

List of functional gene sets

By default, DeepCC uses MSigDB v7.0 (22,596 gene sets) to generate functional spectra. You can also use MSigDB v5.0 (10,348 gene sets) or MSigDB v6.0 (17,779 gene sets).

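# get MSigDBr from the R package `msigdbr` via get_msigdbr(), which is provided by DeepCC
library(DeepCC)
library(msigdbr)
MSigDBr <- get_msigdbr()
# then pass it when generating functional spectra
fs <- getFunctionalSpectra(eps, geneSets = MSigDBr)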

Pre-defined reference

DeepCC provides average expression profiles for each cancer type in the TCGA project as references. To use one, pass the TCGA identifier (COADREAD, BRCA, OV, etc.) to indicate the cancer type.

Note: if your single sample is microarray data, we strongly suggest turning on the parameter inverseRescale, since the TCGA references are RNA-Seq data rather than microarray data. inverseRescale can partly compensate for the difference in distribution, but it is better to generate your own customized reference.