dannlbol / mcibersort_scripts

Scripts used for the Grabovska et al. MethylCIBERSORT analysis
GNU General Public License v3.0

Support functions for MethylCIBERSORT

Custom scripts used in "Pediatric Pan-CNS Tumor Analysis of Immune-cell Infiltration Identifies Correlates of Antitumor Immunity" by Grabovska et al. (2020).


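The functions provided serve two main purposes:

- feature.select.new(): feature selection of signature CpG probes from reference methylation data
- make.synth.mix(): generation of synthetic mixtures with known population proportions

feature.select.new()

This function mimics the functionality of FeatureSelect.V4() from MethylCIBERSORT 0.2.1, written by Dr Ankur Chakravarthy.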

Default function call:

feature.select.new(MaxDMPs = 100, deltaBeta = 0.2, useM = FALSE, CellLines.matrix = NULL, export = TRUE, export.fit = TRUE, export.cpg = TRUE, sigName = "methylCibersort", Stroma.matrix = NULL, Phenotype.stroma = NULL, FDR = 0.01, silent = TRUE)


MaxDMPs: maximum number of differentially methylated probes (DMPs) to select; n/2 are taken from the top and n/2 from the bottom of the ranked list

deltaBeta: minimum delta-beta (difference in beta-values) between pairwise groups required for a probe to be selected

useM: whether beta-values should be converted to M-values before carrying out feature selection

CellLines.matrix: input matrix of cell line ('cancer') data

export: whether to save a table of the signature results

export.fit: whether to export the limma fit object generated during feature selection

export.cpg: whether to export a table of the selected CpG probes alongside the name of the population against which each was selected

sigName: prefix added to the start of output filenames

Stroma.matrix: matrix of beta-values for the reference populations (rows = probes, columns = samples)

Phenotype.stroma: factor of population labels corresponding to the columns of Stroma.matrix

FDR: false discovery rate cutoff

silent: whether to run the function without returning output
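To make the MaxDMPs, deltaBeta, FDR and useM parameters more concrete, the base-R sketch below shows one plausible selection rule for a single population-versus-rest comparison. It is not the code of feature.select.new() or FeatureSelect.V4(): the helper names (beta_to_m, select_dmps_sketch), the inputs (precomputed delta-beta values and FDR-adjusted p-values, e.g. from a limma fit), the strict inequalities and the ranking of probes by delta-beta are all assumptions made for illustration.

```r
# Hypothetical sketch, NOT the feature.select.new() implementation.
# Assumes per-probe delta-beta values and FDR-adjusted p-values for one
# population-vs-rest contrast have already been computed (e.g. with limma).

# Standard beta -> M-value transform, M = log2(beta / (1 - beta))
beta_to_m <- function(beta) log2(beta / (1 - beta))

select_dmps_sketch <- function(delta_beta, adj_p, MaxDMPs = 100,
                               deltaBeta = 0.2, FDR = 0.01) {
  # delta_beta, adj_p: numeric vectors named by probe ID, in the same order
  keep <- adj_p < FDR & abs(delta_beta) > deltaBeta
  ranked <- sort(delta_beta[keep], decreasing = TRUE)
  n_half <- MaxDMPs %/% 2
  top <- head(ranked[ranked > 0], n_half)     # largest positive delta-betas
  bottom <- tail(ranked[ranked < 0], n_half)  # most negative delta-betas
  c(names(top), names(bottom))
}

# Toy example with 10 made-up probes
probes <- sprintf("cg%08d", 1:10)
db <- setNames(c(0.5, 0.4, 0.3, 0.1, -0.1, -0.3, -0.45, -0.6, 0.25, -0.05), probes)
p <- setNames(rep(0.001, 10), probes)
sel <- select_dmps_sketch(db, p, MaxDMPs = 4)
print(sel)
# [1] "cg00000001" "cg00000002" "cg00000007" "cg00000008"
```

In this sketch each end of the ranked list contributes at most MaxDMPs/2 probes, and fewer are returned when not enough probes pass the FDR and deltaBeta cutoffs.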

Typical outputs:

Note on functionality:

The typical function workflow is shown below; preset and optional arguments are shown in bold:


make.synth.mix()

This function creates a matrix of proportions for a given set of populations and, based on those proportions, generates weighted means of a given matrix of values (typically methylation array beta-values), producing synthetic mixtures of known composition.
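The weighted-mean step can be pictured as a matrix product. The base-R sketch below assumes, purely for illustration, that each population is first summarised by its mean beta-value per probe and that each row of the proportion matrix sums to 1; pop_means_sketch() and mix_profiles_sketch() are hypothetical helpers, not functions from this repository, and make.synth.mix() may weight the data differently.

```r
# Hypothetical helpers for illustration only (not make.synth.mix() itself).

# Mean beta-value per probe for each population: probes x populations matrix
pop_means_sketch <- function(input.data, pop.factor) {
  lv <- levels(pop.factor)
  means <- vapply(lv, function(l) {
    rowMeans(input.data[, pop.factor == l, drop = FALSE])
  }, numeric(nrow(input.data)))
  matrix(means, nrow = nrow(input.data),
         dimnames = list(rownames(input.data), lv))
}

# Each synthetic mixture is sum_k props[i, k] * ref[, k], i.e. ref %*% t(props)
mix_profiles_sketch <- function(ref, props) {
  props <- props[, colnames(ref), drop = FALSE]   # align population order
  stopifnot(all(abs(rowSums(props) - 1) < 1e-8))  # proportions sum to 1
  ref %*% t(props)                                # probes x mixtures
}

input.data <- matrix(c(0.1, 0.9, 0.2, 0.8, 0.7, 0.3), nrow = 2,
                     dimnames = list(c("cg1", "cg2"), c("s1", "s2", "s3")))
pop.factor <- factor(c("A", "A", "B"))
props <- rbind(mix1 = c(A = 0.5, B = 0.5), mix2 = c(A = 1, B = 0))
mixes <- mix_profiles_sketch(pop_means_sketch(input.data, pop.factor), props)
print(mixes)
```

A mixture that is 100% one population (mix2 above) simply reproduces that population's mean profile, which is a convenient sanity check for any deconvolution run on the synthetic data.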

Default function call:

make.synth.mix(input.data = NULL, pop.factor = NULL, pop.rows = 100, output.dir = getwd(), output.name = gsub("-", "_", Sys.Date()), n.cores = 1)


input.data: a matrix of data, typically beta-values; rows = features (probes), columns = observations (samples)

pop.factor: a factor assigning each column of input.data to a population (its levels are the populations)

pop.rows: number of rows of proportions to generate for each population

output.dir: directory in which to save the resulting files

output.name: prefix for the resulting filenames (defaults to the current date, with "-" replaced by "_")

n.cores: set to >1 to run on multiple cores in parallel using parallel::mclapply()
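The n.cores option can be pictured as follows: a minimal sketch, assuming the per-mixture work is dispatched with parallel::mclapply() as described above. mix_parallel_sketch() and its arguments are hypothetical, and the mixing rule (a proportion-weighted sum of population mean profiles) is an assumption rather than the repository's code.

```r
library(parallel)

# Hypothetical helper (not part of this repository): one synthetic mixture per
# row of 'props', computed in parallel with mclapply(). mclapply() forks worker
# processes, so mc.cores > 1 is not supported on Windows; n.cores = 1 runs serially.
mix_parallel_sketch <- function(ref, props, n.cores = 1) {
  cols <- mclapply(seq_len(nrow(props)), function(i) {
    drop(ref %*% props[i, colnames(ref)])  # weighted sum of population profiles
  }, mc.cores = n.cores)
  out <- do.call(cbind, cols)
  dimnames(out) <- list(rownames(ref), rownames(props))
  out
}

ref <- matrix(c(0.15, 0.85, 0.7, 0.3), nrow = 2,
              dimnames = list(c("cg1", "cg2"), c("A", "B")))
props <- rbind(mix1 = c(A = 0.5, B = 0.5), mix2 = c(A = 0.25, B = 0.75))
res <- mix_parallel_sketch(ref, props, n.cores = 1)
print(res)
```

Because each mixture is independent, splitting the rows across forked workers changes only the run time, not the result.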

Typical outputs:

Note on functionality:

The function populates the proportion table column by column, assigning each population a set sequence of proportions. A typical example for 3 populations with 6 rows per population:

System Requirements

Hardware Requirements

The functions provided here run on any standard computer with enough RAM for the in-memory operations, and on any operating system that supports MethylCIBERSORT 0.2.1.

Software Requirements

The functions import and depend on several R packages. Functionality has been tested with R 3.5.3 on Ubuntu 16.04.5 LTS.

R Dependencies

MethylCIBERSORT 0.2.1 was obtained by correspondence from Dr Ankur Chakravarthy.


Usage Guide:

Example Outputs

Outputs generated by running the usage examples below are included in Example Outputs for comparison.

The final signature generated in our study is available for download: OPT2_0.2_200_SigEdit.txt

Feature selection example:

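library(minfi)                   ## provides getBeta()
library(FlowSorted.Blood.450k)   ## reference 450k data for sorted blood cell types
data(FlowSorted.Blood.450k)
## feature.select.new() must already be loaded from this repository, e.g. via source()
idx <- which(FlowSorted.Blood.450k$CellType %in% c("Bcell","CD4T","CD8T","NK"))
bvals <- getBeta(FlowSorted.Blood.450k[, idx])
getwd() ## outputs are saved to the working directory
a <- Sys.time()
feature.select.new(Stroma.matrix = bvals,
                   Phenotype.stroma = as.factor(FlowSorted.Blood.450k$CellType[idx]),
                   sigName = "test")
b <- Sys.time()
b - a
# Time difference of 1.55212 mins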

Generating synthetic mixtures example:

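library(minfi)                   ## provides getBeta()
library(FlowSorted.Blood.450k)   ## reference 450k data for sorted blood cell types
data(FlowSorted.Blood.450k)
## make.synth.mix() must already be loaded from this repository, e.g. via source()
idx <- which(FlowSorted.Blood.450k$CellType %in% c("Bcell","CD4T","CD8T","NK"))
bvals <- getBeta(FlowSorted.Blood.450k[, idx])
## signature written by the feature selection example
b.sig <- read.delim("./test_0.2_100_Signature.txt", row.names = 1, header = TRUE)
bvals <- bvals[rownames(b.sig), ] ## keep only the signature probes
a <- Sys.time()
make.synth.mix(input.data = bvals,
               pop.factor = as.factor(FlowSorted.Blood.450k$CellType[idx]),
               pop.rows = 10,
               n.cores = 1,
               output.name = "test")
b <- Sys.time()
b - a
## Time difference of 1.76159 secs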

Example CIBERSORT analysis

a <- Sys.time()
source("./CIBERSORT.R") ## code available under license upon request from https://cibersort.stanford.edu/
results <- CIBERSORT(sig_matrix = "./test_0.2_100_Signature.txt",
                     mixture_file = "./test_synth_mix.txt",
                     perm = 1000,
                     QN = FALSE,
                     absolute = FALSE,
                     abs_method = 'sig.score')
b <- Sys.time()
b - a
## Time difference of 10.40594 mins

## The above is not a real-world example. The full signature generated and used in our study is provided here, and we typically run something like this:
# results <- CIBERSORT(sig_matrix = "./OPT2_0.2_200_SigEdit.txt", ## the signature from our study
#                      mixture_file = "a_mixture_file_from_CNS_tumours.txt", ## i.e. make your own beta matrix
#                      perm = 1000,
#                      QN = FALSE,
#                      absolute = FALSE,
#                      abs_method = 'sig.score')
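The commented call above needs a mixture file built from your own beta matrix. A minimal base-R sketch of that step follows, assuming the same layout as the signature and synthetic mixture files used in the examples above (tab-delimited, probe IDs in the first column, one column per sample). write_mixture_sketch() and the file paths are hypothetical, and the layout CIBERSORT expects should be checked against its own documentation.

```r
# Hypothetical helper (not part of this repository or CIBERSORT): subset a
# probes x samples beta matrix to the signature probes and write it as a
# tab-delimited file with probe IDs in the first column and samples as columns.
write_mixture_sketch <- function(betas, sig_file, out_file) {
  sig <- read.delim(sig_file, row.names = 1, header = TRUE, check.names = FALSE)
  shared <- intersect(rownames(sig), rownames(betas))
  if (length(shared) < nrow(sig)) {
    warning(nrow(sig) - length(shared), " signature probes not found in betas")
  }
  out <- data.frame(ID = shared, betas[shared, , drop = FALSE],
                    check.names = FALSE, stringsAsFactors = FALSE)
  write.table(out, out_file, sep = "\t", quote = FALSE, row.names = FALSE)
  invisible(out)
}

# Toy example using temporary files
sig_path <- tempfile(fileext = ".txt")
mix_path <- tempfile(fileext = ".txt")
sig <- data.frame(PopA = c(0.1, 0.9), PopB = c(0.8, 0.2), row.names = c("cg1", "cg2"))
write.table(sig, sig_path, sep = "\t", quote = FALSE, col.names = NA)
betas <- matrix(c(0.2, 0.5, 0.7, 0.3, 0.6, 0.9), nrow = 3,
                dimnames = list(c("cg2", "cg1", "cg3"), c("s1", "s2")))
write_mixture_sketch(betas, sig_path, mix_path)
back <- read.delim(mix_path, row.names = 1, check.names = FALSE)
print(back)
```

Subsetting to the signature probes and ordering them as in the signature file keeps the mixture rows aligned with the signature rows; a warning flags any signature probes missing from your array data.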