An R package for studying mutational signatures and structural variant signatures along clonal evolution in cancer.

Palimpsest

Cancer genomes are altered by various mutational processes and, like palimpsests, bear the signatures of these successive processes. The Palimpsest R package provides a complete workflow for the characterisation and visualisation of mutational signatures, including their evolution along tumour development. The package includes a wide range of functions for extracting single base substitution (SBS), double base substitution (DBS) and indel mutational signatures as well as structural variant (SV) signatures. Palimpsest estimates the probability of each mutation being due to each signature, which allows the clonality of each alteration to be calculated, and the mechanism at the origin of each driver event to be predicted. In short, Palimpsest is an easy-to-use toolset for the reconstruction of the natural history of tumours using whole exome or whole genome sequencing data.
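To make the idea of a per-mutation signature probability concrete, here is a minimal sketch in base R. It assumes a Bayes-rule style calculation: each signature's probability of producing a given mutation category is weighted by that signature's exposure in the sample, and the weights are then normalised to sum to 1. The function signature_probs(), the signature names and all numbers are made up for illustration; this is not necessarily how Palimpsest computes these probabilities internally.

```r
# Toy signature profiles: rows = signatures, columns = mutation categories.
# Each row sums to 1. All names and values are illustrative assumptions.
sig_profiles <- matrix(
  c(0.7, 0.2, 0.1,
    0.1, 0.6, 0.3),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("SigA", "SigB"), c("C>A", "C>T", "T>C"))
)

# Toy exposures: fraction of the sample's mutations attributed to each signature.
exposures <- c(SigA = 0.25, SigB = 0.75)

# Hypothetical helper (not a Palimpsest function): probability that a mutation
# of the given category was caused by each signature.
signature_probs <- function(category, profiles, exposures) {
  weighted <- exposures[rownames(profiles)] * profiles[, category]
  weighted / sum(weighted)
}

probs_CA <- signature_probs("C>A", sig_profiles, exposures)
print(round(probs_CA, 3))
```

In this toy case SigA is the less active signature overall, yet it is the more likely origin of a C>A mutation because its profile is dominated by that category.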

Installation

Install from the GitHub repository using devtools:

install.packages("devtools")
devtools::install_github("FunGeST/Palimpsest")

Dependencies

To add indel mutation categories, Palimpsest uses a Python script embedded in the R function annotate_VCF(). For this to work, the function must be run in a Unix environment (i.e. macOS or Linux) with Python 2.7 installed. The other aspects of annotate_VCF(), and all other functions, also work on Windows. The indel step additionally requires a FASTA file compatible with the genome of the input VCF (including positions and chromosome names) to be accessible in your local environment. If you only wish to work on SBS/DBS/SV signatures, you can skip this step.
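Before running the indel step, it can help to check these prerequisites from within R. The sketch below uses only base R; check_indel_prereqs() is a hypothetical helper, not part of Palimpsest, and the FASTA path is a placeholder. It only tests whether a python executable is on the PATH and does not verify that it is version 2.7.

```r
# Hypothetical helper (not part of Palimpsest): report whether the local
# environment looks suitable for the indel step of annotate_VCF().
check_indel_prereqs <- function(fasta_path) {
  python_path <- unname(Sys.which("python"))
  list(
    is_unix      = .Platform$OS.type == "unix",                    # macOS or Linux
    python_found = !is.na(python_path) && nzchar(python_path),     # on PATH; version not checked
    fasta_found  = file.exists(fasta_path)                         # reference genome FASTA
  )
}

# Placeholder path: replace with the FASTA matching your VCF genome build.
prereqs <- check_indel_prereqs("path/to/genome.fa")
str(prereqs)
```

If any element is FALSE, the SBS, DBS and SV analyses can still be run; only the indel categories depend on these requirements.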

The R package bedr is required to perform structural variant signature analysis. The bedr API gives access to BEDTools and offers additional utilities for genomic region processing. To use bedr, you will need to have the BEDTools program installed and on your default PATH.
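A quick way to confirm both requirements is sketched below in base R. check_sv_prereqs() is a hypothetical helper, not a Palimpsest or bedr function; it assumes the BEDTools executable is named bedtools.

```r
# Hypothetical helper (not part of Palimpsest or bedr): check that the bedr
# package is installed and that a BEDTools executable is visible on the PATH.
check_sv_prereqs <- function() {
  bedtools_path <- unname(Sys.which("bedtools"))
  list(
    bedr_installed = requireNamespace("bedr", quietly = TRUE),
    bedtools_found = !is.na(bedtools_path) && nzchar(bedtools_path)
  )
}

sv_prereqs <- check_sv_prereqs()
if (!sv_prereqs$bedr_installed) {
  message("The bedr package is not installed.")
}
if (!sv_prereqs$bedtools_found) {
  message("BEDTools was not found on the PATH.")
}
```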

Input files

For the analysis of somatic mutations Palimpsest requires one mandatory input file -- a mutational catalogue file (vcf) describing somatic mutations in the tumour series. For the analysis of the clonality of somatic mutations (optional), a further two files are required -- a copy number alteration file (cna_data) providing genome-wide absolute copy number estimates, and a minimal sample annotation file (annot_data) indicating gender and tumour purity.
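To see why the clonality analysis needs copy number, purity and gender in addition to the mutations themselves, consider a widely used formula for the cancer cell fraction (CCF) of a mutation. The sketch below is an assumption for illustration and not necessarily the calculation Palimpsest performs; estimate_ccf() and its parameter names are hypothetical. The normal copy number defaults to 2 and would be 1 for the X chromosome in male patients, which is where the gender annotation comes in.

```r
# Hypothetical helper (not Palimpsest's implementation): estimate the fraction
# of cancer cells carrying a mutation from its variant allele fraction (VAF).
#   vaf          : variant reads / total reads at the mutated position
#   purity       : fraction of tumour cells in the sample (0-1)
#   tumour_cn    : absolute total copy number of the locus in tumour cells
#   normal_cn    : copy number in normal cells (2, or 1 for chrX in males)
#   multiplicity : number of mutated copies per mutated cell (assumed 1)
estimate_ccf <- function(vaf, purity, tumour_cn, normal_cn = 2, multiplicity = 1) {
  ccf <- vaf * (purity * tumour_cn + (1 - purity) * normal_cn) /
    (purity * multiplicity)
  pmin(ccf, 1)  # a fraction of cells cannot exceed 1
}

# Toy example: two mutations in a diploid region of a 60% pure tumour.
ccf <- estimate_ccf(vaf = c(0.30, 0.10), purity = 0.6, tumour_cn = 2)
print(round(ccf, 2))  # approximately 1.00 and 0.33
```

Under these assumptions the first mutation would be present in essentially all tumour cells (clonal), while the second would be present in roughly a third of them (subclonal).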

The input files should have the following columns (the header is required, but the order of the columns can change). Example input files are provided with the package.

1]. VCF: somatic mutation data

Optional:

2]. cna_data: copy number alteration data

3]. annot_data: sample annotation data

4]. sv_data: structural variant data
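Because columns are matched by header name rather than by position, a simple pre-flight check on the headers can catch formatting problems early. The sketch below uses base R only; check_columns() is a hypothetical helper, and the column names Sample, Gender and Purity are assumed for illustration. Refer to the example input files provided with the package for the actual required headers.

```r
# Hypothetical helper (not part of Palimpsest): confirm that a table contains
# the expected header names, regardless of the order of its columns.
check_columns <- function(df, required) {
  missing <- setdiff(required, colnames(df))
  if (length(missing) > 0) {
    stop("Missing column(s): ", paste(missing, collapse = ", "))
  }
  invisible(TRUE)
}

# Toy annotation table with columns deliberately out of order.
# Column names and values are assumptions made for this example.
annot_example <- data.frame(
  Purity = c(0.6, 0.8),
  Sample = c("T1", "T2"),
  Gender = c("F", "M")
)

ok <- check_columns(annot_example, c("Sample", "Gender", "Purity"))
```

The same helper can be applied to each input table after loading it, for example with read.delim(), passing the header names taken from the packaged example files.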

Running Palimpsest

Reference

Shinde, J. et al. (2018) Palimpsest: an R package for studying mutational and structural variant signatures along clonal evolution in cancer. Bioinformatics.

Figure 1. (A) Workflow illustrating a typical analysis with Palimpsest. Taking as input somatic mutations, copy-number alterations (CNAs) and structural variants, the package classifies variants as clonal and subclonal, extracts mutational and structural variant signatures separately in early clonal and late subclonal events, and estimates the probability of each alteration being due to each process. The timing of chromosome duplications is also estimated from the ratio of duplicated/non-duplicated mutations to reconstruct the complete natural history of the tumour. (B) Example of output representing, for one tumour, the number of clonal and subclonal mutations, their distribution per mutation signature, the driver alterations (colored according to the most likely causal mutational process) and CNA timing.

License

Copyright (C) 2019 Benedict Monteiro & Jayendra Shinde

Palimpsest is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.