RCK is a method for Reconstruction of clone- and haplotype-specific Cancer Karyotypes from tumor mixtures, distributed both as a standalone software package and as a Python library under the MIT licence.
RCK was initially designed and developed by Sergey Aganezov in the group of Prof. Ben Raphael at Princeton University (group site). Development of RCK is currently continued by Sergey Aganezov in the group of Prof. Michael Schatz at Johns Hopkins University (group site).
The algorithm and its application to published cancer datasets are fully described in:
Sergey Aganezov and Benjamin J. Raphael, 2019
RCK infers clone- and haplotype-specific cancer genome karyotypes from tumor mixtures.
RCK assumes that:
RCK uses a Diploid Interval Adjacency Graph (DIAG) to represent all possible segments and transitions between them (across all clones and the reference). RCK then solves the optimization problem of inferring clone- and haplotype-specific karyotypes (i.e., finding clone-specific edge multiplicity functions in the constructed DIAG) as a mixed-integer linear program (MILP). Several constraints are taken into consideration (some of which are listed below) during the inference:
We note that, in contrast to some other cancer karyotype inference methods, the RCK model has several advantages, which all work in a unifying computational framework and some or all of which differentiate RCK from other methods:
n derived genomes
RCK should work on the latest macOS and on the main Linux distributions. RCK is implemented in Python and designed to work with Python 3.7+. We highly recommend creating an independent Python virtual environment for RCK usage.
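As a minimal sketch of the setup recommended above, the following uses only the Python standard library to check the interpreter version and to create an isolated environment. The helper names and the `rck-env` directory are illustrative assumptions and not part of RCK.

```python
import sys
import venv
from pathlib import Path

MIN_PYTHON = (3, 7)  # minimum version stated above


def python_is_supported(version=None):
    """Return True if the given (major, minor, ...) version is at least 3.7."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= MIN_PYTHON


def create_environment(target):
    """Create a virtual environment with pip at `target` (hypothetical path)."""
    venv.create(Path(target), with_pip=True)


if __name__ == "__main__":
    print("Python supported:", python_is_supported())
    # create_environment("rck-env")  # uncomment to actually create the environment
```

After activating the environment, RCK can be installed into it with any of the installation options listed in this README.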
RCK itself can be installed in three different ways:
conda install -c aganezov rck
pip install rck
python setup.py install
RCK requires an ILP solver installed on the system, as well as Python bindings for it. Currently only the Gurobi ILP solver is supported.
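Before running RCK, it can help to confirm that the solver bindings are importable from the active environment. The sketch below assumes the bindings are provided by the `gurobipy` package, which is the Python package distributed by Gurobi; the helper name is hypothetical, and a successful import does not prove that a valid Gurobi license is present.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module `name` can be found in this environment."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if module_available("gurobipy"):
        print("Gurobi Python bindings found")
    else:
        print("Gurobi Python bindings not found; see the installation documentation")
```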
For more details about installation please refer to the installation documentation.
The minimum input for RCK consists of two parts:
Additional input can contain:
RCK expects the input data to be in a (C/T)SV (Comma/Tab Separated Values) format. We provide a set of utility tools to convert the output of many state-of-the-art methods into the RCK-suitable format.
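To illustrate what handling both comma- and tab-separated input can look like, here is a small sketch built on the standard `csv` module. The function name and the column names `col_a` and `col_b` are placeholders and NOT the RCK schema; the actual columns are described in the segment and adjacency documentation.

```python
import csv
import io


def read_separated(text, delimiter=None):
    """Parse comma- or tab-separated text with a header row into a list of dicts.

    If no delimiter is given, guess it from the header line: a tab wins if present.
    """
    if delimiter is None:
        header = text.splitlines()[0] if text else ""
        delimiter = "\t" if "\t" in header else ","
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


# Placeholder column names, not the RCK schema.
tsv_text = "col_a\tcol_b\n1\tx\n2\ty\n"
csv_text = "col_a,col_b\n1,x\n2,y\n"

# Both encodings of the same table parse to the same rows.
assert read_separated(tsv_text) == read_separated(csv_text)
```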
Obtaining unlabeled (i.e., without allele information) novel adjacencies (aka Structural Variants) is not a part of the RCK workflow, as many tools already exist for that.
We provide the rck-adj-x2rck utility to convert the output of SV detection tools to the RCK-suitable format.
We currently support converting the output of the following 3rd-party SV detection tools:
For more information about adjacencies, formats, converting, reciprocality, etc., please refer to the adjacencies documentation.
Obtaining clone- and allele-specific segment copy numbers is not a part of the RCK workflow, as many tools already exist for that.
We provide the rck-scnt-x2rck utility to convert the output of other tools that infer clone- and allele-specific segment copy numbers to the RCK-suitable format.
We currently support converting the output of the following 3rd-party tools:
In most cases, the cancer sample of interest is initially represented by a set cancer.sr.fastq of reads obtained from a sequencer.
Additionally, sequenced reads normal.sr.fastq from a matching normal sample need to be available.
The most common analysis setup has standard Illumina paired-end sequenced reads for both the tumor and the matching normal.
Increasingly, 3rd-generation sequencing technologies are being utilized in cancer analysis.
Let us assume that there may optionally be a set cancer.lr.fastq of reads for the cancer sample in question obtained via a 3rd-generation sequencing technology.
1. Align cancer.sr.fastq and normal.sr.fastq for the cancer and the matching normal samples to obtain cancer.sr.bam and normal.sr.bam.
2. Optionally, align cancer.lr.fastq to obtain cancer.lr.bam.
3. Infer novel adjacencies from cancer.sr.fastq to obtain a novel adjacencies VCF file cancer.sr.vcf.
4. Convert cancer.vcf to the RCK input format via rck-adj-x2rck x cancer.vcf -o input.rck.adj.tsv, where x stands for the novel adjacency inference tool. Please see the adjacencies docs for the list of supported tools and more detailed instructions on comparison.
5. Obtain clone- and allele-specific segment copy numbers CN.data (generic name of the tool-specific result).
6. Convert CN.data into the RCK format via rck-scnt-x2rck x CN.data -o input.rck.scnt.tsv, where x stands for the copy number inference tool. Please see the segments docs for links to specific methods, as well as details on how to run the conversion.
7. Run RCK.
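The two conversion commands in the workflow above can be assembled programmatically, for example when scripting a pipeline. This is only a sketch: the function names are hypothetical, `tool` corresponds to the `x` placeholder in the text, and the commands are built as argument lists rather than executed.

```python
import shlex


def adj_conversion_cmd(tool, vcf_path, out_path="input.rck.adj.tsv"):
    """Argument list for converting novel adjacencies; `tool` is the `x` placeholder."""
    return ["rck-adj-x2rck", tool, vcf_path, "-o", out_path]


def scnt_conversion_cmd(tool, cn_path, out_path="input.rck.scnt.tsv"):
    """Argument list for converting segment copy numbers; `tool` is the `x` placeholder."""
    return ["rck-scnt-x2rck", tool, cn_path, "-o", out_path]


def render(cmd):
    """Render an argument list as a shell-safe command line for logging."""
    return " ".join(shlex.quote(part) for part in cmd)


if __name__ == "__main__":
    print(render(adj_conversion_cmd("x", "cancer.vcf")))
    print(render(scnt_conversion_cmd("x", "CN.data")))
```

Keeping the commands as argument lists avoids shell quoting problems if they are later passed to `subprocess.run`.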
We provide the rck tool to run the main RCK algorithm for clone- and haplotype-specific cancer karyotype reconstruction.
With the minimum input, RCK can be run as follows:
rck --scnt input.rck.scnt.tsv --adjacencies input.rck.adj.tsv
where:
--scnt corresponds to the clone- and allele-specific segment copy number input
--adjacencies corresponds to the unlabeled novel adjacencies input
Additionally, one can specify the --workdir working directory, where the input, preprocessing, and the output will be stored.
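For scripted runs, the invocation above can be wrapped in a few lines of Python. This sketch uses the flag spellings from the option descriptions above (it may be worth confirming them with `rck --help`); the function names are hypothetical, and `run_rck` only launches the tool if an `rck` executable is found on the PATH.

```python
import shutil
import subprocess


def rck_cmd(scnt, adjacencies, workdir=None):
    """Argument list for the main rck tool, using the flags named in the text above."""
    cmd = ["rck", "--scnt", scnt, "--adjacencies", adjacencies]
    if workdir is not None:
        cmd += ["--workdir", workdir]
    return cmd


def run_rck(scnt, adjacencies, workdir=None):
    """Run rck if it is on PATH; return its exit code, or None if rck is not installed."""
    if shutil.which("rck") is None:
        return None
    return subprocess.run(rck_cmd(scnt, adjacencies, workdir)).returncode
```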
For more on the rck command usage, please refer to the usage documentation.
Here is a description of the results produced by the rck main method for cancer karyotype reconstruction.
For results on segment/adjacency conversion and processing, please refer to the respective segment/adjacency documentation.
RCK's cancer karyotype reconstruction is stored in the output subdirectory of the working directory (the --workdir).
The following two files depict the inferred clone- and haplotype-specific karyotypes:
rck.scnt.tsv - clone- and haplotype-specific segment copy numbers;
rck.acnt.tsv - clone- and haplotype-specific adjacency copy numbers.
For information about the format of the inferred clone- and haplotype-specific copy numbers on segments/adjacencies, please refer to the segment/adjacency documentation.
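After a run, a quick check that both result files exist can catch a failed or interrupted reconstruction. The sketch below relies only on the file names and the output subdirectory named above; the function names are hypothetical.

```python
from pathlib import Path

# Result file names as listed above.
RESULT_FILES = ("rck.scnt.tsv", "rck.acnt.tsv")


def result_paths(workdir):
    """Expected result paths inside the output subdirectory of `workdir`."""
    output_dir = Path(workdir) / "output"
    return [output_dir / name for name in RESULT_FILES]


def missing_results(workdir):
    """Return the expected result files that are not present on disk."""
    return [path for path in result_paths(workdir) if not path.is_file()]
```

An empty list from `missing_results` means both karyotype files are in place.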
Results from the original manuscript can be found in the dedicated GitHub repository.
When using RCK's cancer karyotype reconstruction algorithm or any of RCK's utilities, please cite the following paper:
Sergey Aganezov and Benjamin J. Raphael, 2019
If you experience any issues with RCK installation, usage, or results, or want to see RCK enhanced in any way, please create an issue on the RCK issue tracker. Please make sure to specify the versions of RCK, Python, and Gurobi in question and, if possible, provide (minimized) data on which the issue occurs.
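The version details requested for an issue report can be collected with a short script. This is a sketch under two assumptions: that RCK was installed as the `rck` distribution (as in the pip command in this README) and that the Gurobi bindings are installed as `gurobipy`. It reports what the package metadata says and does not query the tools themselves.

```python
import platform

try:
    from importlib import metadata  # available in Python 3.8+
except ImportError:  # Python 3.7 has no importlib.metadata
    metadata = None


def installed_version(distribution):
    """Return the installed version of a distribution, or a short status string."""
    if metadata is None:
        return "unknown"
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return "not installed"


def version_report():
    """Collect the versions that are useful to include in an issue report."""
    return {
        "python": platform.python_version(),
        "rck": installed_version("rck"),
        "gurobipy": installed_version("gurobipy"),
    }


if __name__ == "__main__":
    for name, version in version_report().items():
        print(f"{name}: {version}")
```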
If you want to discuss any avenues of collaboration, custom RCK applications, etc., please contact Sergey Aganezov at aganezov(at)jhu.edu or sergeyaganezovjr(at)gmail.com.