getzlab / MutSig2CV

MutSig2CV from Lawrence et al. 2014

MutSig2CV

MutSig2CV (Lawrence et al., 2014).

Overview

MutSig2CV analyzes somatic point mutations discovered in DNA sequencing, identifying genes mutated more often than expected by chance given inferred background mutation processes. MutSig2CV consists of three independent statistical tests, described briefly below:

For detailed descriptions of the algorithms employed in the MutSig2CV suite for each of these tests, please visit https://www.broadinstitute.org/cancer/cga/mutsig

Installing

MutSig is implemented in MATLAB. If you have a MATLAB installation and wish to run MutSig interactively on the MATLAB console, skip to the Running section below. If you do not have MATLAB installed, or do not wish to run interactively, MutSig can be run as a standalone executable. The standalone executable is available for 64-bit Linux systems only and requires the MATLAB R2013a runtime. You can download and install the runtime environment from here. Runtime installation instructions can be found here.

Once the runtime is successfully installed, you must add its library directories to your LD_LIBRARY_PATH:

MCRROOT=<path to runtime you specified when installing>
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$MCRROOT/bin/glnxa64/
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$MCRROOT/sys/java/jre/glnxa64/jre/lib/amd64
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$MCRROOT/sys/java/jre/glnxa64/jre/lib/amd64/server
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$MCRROOT/sys/java/jre/glnxa64/jre/lib/amd64/native_threads
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$MCRROOT/sys/os/glnxa64
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$MCRROOT/bin/glnxa64
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$MCRROOT/runtime/glnxa64
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$MCRROOT/lib
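
If you prefer not to repeat the export line for every directory, the same setup can be written as a loop. This is only a sketch: add_mcr_paths is a hypothetical helper name, not something shipped with MutSig, and it covers the distinct runtime subdirectories listed above.

```shell
# Sketch: append the MATLAB runtime library directories to LD_LIBRARY_PATH.
# add_mcr_paths is a hypothetical helper name, not part of MutSig.
add_mcr_paths() {
    local mcrroot="$1"
    local sub
    for sub in \
        bin/glnxa64 \
        sys/java/jre/glnxa64/jre/lib/amd64 \
        sys/java/jre/glnxa64/jre/lib/amd64/server \
        sys/java/jre/glnxa64/jre/lib/amd64/native_threads \
        sys/os/glnxa64 \
        runtime/glnxa64 \
        lib
    do
        # ${VAR:+...} avoids a leading ':' when LD_LIBRARY_PATH starts out empty
        LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$mcrroot/$sub"
    done
    export LD_LIBRARY_PATH
}
```

Call it as add_mcr_paths followed by the runtime path you chose when installing. Unlike the plain export lines, the loop does not leave an empty entry at the front of the search path when LD_LIBRARY_PATH was previously unset.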

MutSig requires ~3 GB of reference files. Since these files are too large to include in a GitHub repository, they are hosted elsewhere. Please download them from here, and copy the reference directory into this folder.

Running

To run on the MATLAB console, start MATLAB in this directory, and run:

MutSig2CV(<path to mutations>, <path to output directory>, [params file])

To run the standalone application, cd to this directory, and run:

bin/MutSig2CV <path to mutations> <path to output directory> [params file]

MutSig looks for its reference files relative to this directory, so it must be run from here.
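
Because of this working-directory requirement, it can be convenient to wrap the standalone call in a small function. The following is a minimal sketch under stated assumptions: run_mutsig is a hypothetical name, and the check assumes the downloaded reference files sit in a subdirectory called reference inside the MutSig folder.

```shell
# Sketch: run the standalone executable from the MutSig directory.
# run_mutsig is a hypothetical wrapper, not part of MutSig.
# Usage: run_mutsig <MutSig directory> <path to mutations> <path to output directory> [params file]
run_mutsig() {
    local mutsig_home="$1"
    shift
    # Assumption: the reference files live in "$mutsig_home/reference"
    if [ ! -d "$mutsig_home/reference" ]; then
        echo "error: no reference directory found under $mutsig_home" >&2
        return 1
    fi
    # The subshell keeps the caller's working directory unchanged
    ( cd "$mutsig_home" && bin/MutSig2CV "$@" )
}
```

Since the wrapper changes directory before launching MutSig, pass the mutation file, output directory, and params file as absolute paths.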

Each input is described below.

Description of inputs

Mutation Input Format

As input, MutSig takes a tab-delimited file with each line annotating a single mutation in a single patient. Columns can be in any order, with names and formats as follows. To provide maximal input flexibility, MutSig accepts synonyms for each column name. Column names are case sensitive.

If your input mutation data are missing gene/ref_allele/type/classification fields, we recommend annotating using Oncotator, which will produce a MutSig-ready MAF from a variety of input data. Oncotator is available here: https://www.broadinstitute.org/cancer/cga/oncotator
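
Before starting a long run, it may be worth checking that the mutation file really is a consistent tab-delimited table. The sketch below uses a hypothetical helper, check_tsv, which only verifies that every line has as many tab-separated fields as the header; it does not validate column names or values, and it assumes the first line is the header (no leading comment lines).

```shell
# Sketch: report lines whose field count differs from the header's.
# check_tsv is a hypothetical helper, not part of MutSig.
check_tsv() {
    awk -F'\t' '
        NR == 1 { n = NF; next }          # remember the header width
        NF != n {
            printf "line %d: expected %d fields, found %d\n", NR, n, NF
            bad = 1
        }
        END { exit bad + 0 }              # non-zero status if any line was off
    ' "$1"
}
```

The function exits with status 0 when all lines match the header and with status 1 otherwise, so it can be used directly in an if statement.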

Outputs

A MutSig run outputs several files:

sig_genes.txt

A tab-delimited file containing all genes considered for analysis, sorted by p-/q-values. Columns are as follows:

final_analysis_set.maf

MutSig does not necessarily consider all mutations for analysis; for instance, mutations in genes determined to be poorly covered, mutations determined to belong to duplicate patients, and mutations deep in intergenic regions (IGR) will be discarded. final_analysis_set.maf contains only those mutations actually used for significance analysis. Note that this file preserves the original input MAF's columns but also contains columns used internally by MutSig.

patient_counts_and_rates.txt

A tab-delimited file summarizing each patient's mutation counts, one patient per line. For each patient, fields are as follows:

mutcategs.txt, mutcateg_discovery.txt

MutSig accounts for mutation rate heterogeneity across trinucleotide contexts when calculating background mutation rates (BMRs). To increase statistical power, it clusters the 96 combinations of base substitution and trinucleotide context into k mutually exclusive categories via entropy minimization, with a default of k = 5. mutcategs.txt and mutcateg_discovery.txt contain the definitions of each category.
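
To see where the number 96 comes from, the contexts can be enumerated directly: 6 substitution types times 4 possible 5' bases times 4 possible 3' bases. The sketch below assumes the common convention of writing each substitution from the pyrimidine (C or T) strand; the N[X>Y]N labels are illustrative and are not necessarily the labels MutSig writes to its category files.

```shell
# Sketch: enumerate the 96 substitution + trinucleotide context combinations.
# list_contexts is a hypothetical helper; the label format is illustrative only.
list_contexts() {
    local sub five three
    for sub in "C>A" "C>G" "C>T" "T>A" "T>C" "T>G"; do   # 6 substitution types
        for five in A C G T; do                          # 4 possible 5' neighbours
            for three in A C G T; do                     # 4 possible 3' neighbours
                printf '%s[%s]%s\n' "$five" "$sub" "$three"
            done
        done
    done
}
```

Piping the output of list_contexts through wc -l confirms the count of 6 x 4 x 4 = 96.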

per_gene.mutation_counts.txt

A tab-delimited file summarizing each gene's mutation counts on a per-patient basis, one gene per line. For each gene, fields are as follows:

and then one column per patient, displaying the total number of mutations in this gene for that patient.

sample_sig_gene_table.txt

The same as per_gene.mutation_counts.txt, but only for genes with q-value <= 0.1.
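
A similar cutoff can be applied to any of the tab-delimited outputs with awk. This is a rough sketch: filter_by_column is a hypothetical helper, and the column name you pass (for example q for sig_genes.txt) is an assumption that should be checked against the header of your own output file.

```shell
# Sketch: keep the header plus rows whose value in a named column is <= a cutoff.
# filter_by_column is a hypothetical helper, not part of MutSig.
# Usage: filter_by_column <file> <column name> <cutoff>
filter_by_column() {
    awk -F'\t' -v col="$2" -v cutoff="$3" '
        NR == 1 {
            # Locate the requested column by its header name
            for (i = 1; i <= NF; i++) if ($i == col) c = i
            if (!c) { print "column not found: " col > "/dev/stderr"; exit 2 }
            print
            next
        }
        ($c + 0) <= (cutoff + 0) { print }   # force numeric comparison
    ' "$1"
}
```

For instance, filter_by_column sig_genes.txt q 0.1 would list the genes at or below the 0.1 threshold, provided the q-value column is actually named q. Non-numeric entries are treated as 0 by awk, so inspect the file first if it may contain missing values.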

results.mat

A MATLAB binary version of sig_genes.txt, containing much more detail. It can be loaded into MATLAB by typing:

  load('results.mat', 'G')

MutSig Configuration

MutSig's algorithm and run parameters are configured via a two-column, tab-delimited text file. Here is a list of all available parameters, possible options, and default values. A sample parameters file (set to defaults) can be found in test/input/params.txt.
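
Since the params file is just name and value separated by a tab, individual settings can be looked up from the shell. The helper below, get_param, is a hypothetical sketch and not part of MutSig; the parameter name in the usage note is a placeholder rather than a real MutSig parameter.

```shell
# Sketch: print the value of one parameter from a two-column, tab-delimited file.
# get_param is a hypothetical helper, not part of MutSig.
# Usage: get_param <params file> <parameter name>
get_param() {
    awk -F'\t' -v key="$2" '
        $1 == key { print $2; found = 1; exit }   # first match wins
        END { exit !found }                       # status 1 if the name is absent
    ' "$1"
}
```

For example, get_param test/input/params.txt some_parameter_name prints the configured value, or returns a non-zero status if the file has no such entry.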