GNU General Public License v3.0

COMPASS

DOI: 10.5281/zenodo.10822292

COpy number and Mutations Phylogeny from Amplicon Single-cell Sequencing

This tool infers a tree of the somatic events (mutations and copy number alterations) that occurred in a tumor. It is designed specifically for MissionBio's Tapestri data, in which a small number of amplicons (50-300) are sequenced in thousands of single cells.

The method is described in the publication: COMPASS: joint copy number and mutation phylogeny reconstruction from amplicon single-cell sequencing data, Sollier et al., Nature Communications 2023

Quick start

git clone https://github.com/cbg-ethz/COMPASS.git
cd COMPASS
make
./COMPASS -i data/preprocessed_data_AML_Morita2020/AML-59-001 -o AML-59-001 --nchains 4 --chainlength 5000 --CNA 1
dot -Tpng -o AML-59-001_tree.png AML-59-001_tree.gv

Plotting the tree requires Graphviz, which can be installed on Ubuntu by running sudo apt-get install graphviz
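
The plotting step can be wrapped so that a missing Graphviz installation is reported before dot is called. This is a minimal sketch and not part of COMPASS: png_name and render_tree are hypothetical helper names introduced here for illustration.

```shell
# Hypothetical helpers (not shipped with COMPASS).

# Derive the PNG file name from a .gv file name.
png_name() {
    printf '%s\n' "${1%.gv}.png"
}

# Render a Graphviz file to PNG, failing early if 'dot' is not installed.
render_tree() {
    gv_file="$1"
    if ! command -v dot >/dev/null 2>&1; then
        echo "Graphviz 'dot' not found (on Ubuntu: sudo apt-get install graphviz)" >&2
        return 1
    fi
    dot -Tpng -o "$(png_name "$gv_file")" "$gv_file"
}
```

After the quick start run, render_tree AML-59-001_tree.gv would then write AML-59-001_tree.png next to the input file.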

Usage

./COMPASS -i [sample_name] -o [output_name] --nchains 4 --chainlength 5000 --CNA 1 --sex female

Where:

Additional parameters can be changed if needed, although their default values should work for most cases:

In targeted sequencing, different regions have different coverages, depending on the number of amplicons targeting each region and the efficiency of the primers. By default, COMPASS uses the cells attached to the root to estimate the proportion of reads falling on each region in the absence of CNAs. Alternatively, the weight of each region can be provided with the --regionweights argument. An example CSV file is provided in data/preprocessed_data_Morita2020/region_weights_50amplicons.csv, and a script to generate such a file is provided at Experiments/preprocessing/estimate_region_weights.py.
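
As an illustration of how the optional region weights fit into the usage command, the sketch below assembles the call and appends --regionweights only when a weights file is supplied. The wrapper name run_compass and the DRY_RUN variable are assumptions made for this example, not part of COMPASS; the flags and their values are the ones shown in the usage line above.

```shell
# Hypothetical wrapper (not shipped with COMPASS).
# Usage: run_compass SAMPLE OUTPUT [REGION_WEIGHTS_CSV]
# Set DRY_RUN=1 to print the command instead of running it.
run_compass() {
    sample="$1"
    out="$2"
    weights="${3:-}"

    # Base command, using the parameter values from the usage line.
    set -- ./COMPASS -i "$sample" -o "$out" --nchains 4 --chainlength 5000 --CNA 1

    # Add user-provided region weights only when a file was given.
    if [ -n "$weights" ]; then
        set -- "$@" --regionweights "$weights"
    fi

    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}
```

Calling run_compass with two arguments leaves COMPASS on its default behaviour of estimating the region proportions from the cells attached to the root; passing a CSV path as the third argument overrides that estimate.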

Use with Docker

docker run -t -v `pwd`:`pwd` -w `pwd` esollier/compass:v1.1 COMPASS -i data/preprocessed_data_AML_Morita2020/AML-59-001 -o AML-59-001 --nchains 4 --chainlength 5000 --CNA 1

Input

COMPASS takes two files as input:

The data directory contains preprocessed datasets. The Experiments/preprocessing directory contains scripts used to preprocess the loom files generated by the Tapestri pipeline, as well as workflows used to run simulations on synthetic data.

Output

If [output_name] ends with .gv, COMPASS outputs only the tree in Graphviz format, which can then be plotted. Otherwise, COMPASS produces the following output:

The data/output_example directory contains an example output.
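
The naming rule for [output_name] can be checked in a script before launching a run. The sketch below only mirrors the rule stated above (a name ending in .gv yields the tree alone); the function name output_mode and its two labels are hypothetical and chosen for this example.

```shell
# Hypothetical helper (not shipped with COMPASS): report which kind of
# output a given [output_name] would request, based on its suffix.
output_mode() {
    case "$1" in
        *.gv) echo "tree-only" ;;  # only the Graphviz tree is written
        *)    echo "full" ;;       # the complete set of output files is written
    esac
}
```

For example, output_mode AML-59-001_tree.gv reports tree-only, whereas output_mode AML-59-001 reports full.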