sideRETRO is a bioinformatics tool for detecting somatic retrocopy insertions, also known as retroCNVs, in whole-genome and whole-exome sequencing data (WGS, WES). The program was written from scratch in C and uses the HTSlib and SQLite3 libraries to manage SAM/BAM/CRAM reading and data analysis.
For full documentation, please visit https://sideretro.readthedocs.io.
When detecting retrocopies, sideRETRO can annotate several other features related to each event:
Parental gene
The gene that underwent the retrotransposition process.
Genomic position
The genomic coordinates where the retrocopy integration event occurred (chromosome:start-end). They include the insertion point (the expected exact point of each retrocopy insertion).
Strandness
The orientation of the insertion (+/-), that is, whether it lies on the leading (+) or lagging (-) DNA strand.
Genomic context
The context of the retrocopy integration site: whether the retrotransposition event occurred in an intergenic or an intragenic region. The latter can be split into exonic and intronic, according to the host gene.
Genotype
When multiple individuals (genomes) are analyzed, sideRETRO discriminates the events found in each one. That way, it is possible to tell whether an event is exclusive to one individual or shared across the analyzed cohort.
Haplotype
Our tool provides information about the ploidy of the event, i.e., whether it occurs in one or both homologous chromosomes (heterozygous or homozygous).
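The genomic position above is described as chromosome:start-end. As a minimal sketch, assuming a plain string in that form, it can be split into its three parts with shell parameter expansion. The function name parse_region and the sample coordinate are hypothetical and not part of sideRETRO:

```shell
#!/bin/sh
# Hypothetical helper (not part of sideRETRO): split "chromosome:start-end"
# into tab-separated chromosome, start and end fields.
parse_region() {
    region=$1
    chrom=${region%:*}     # everything before the last ':'
    range=${region##*:}    # everything after the last ':'
    start=${range%%-*}     # text before the first '-'
    end=${range#*-}        # text after the first '-'
    printf '%s\t%s\t%s\n' "$chrom" "$start" "$end"
}

# Illustrative coordinate, not taken from real output
parse_region 'chr1:1000-2000'
```

Splitting on the last colon keeps the sketch usable for contig names that themselves contain a colon.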
The project depends on the Meson build system and Ninja to manage the configuration and compilation process. They can be obtained through a package manager or from source. For example, on Ubuntu:
$ sudo apt-get install python3 \
python3-pip \
python3-setuptools \
python3-wheel \
ninja-build
and then:
$ pip3 install --user meson
(or: $ sudo apt install meson)
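Before going further, it can help to confirm that both build tools are reachable on PATH. The following is a minimal sketch, not part of sideRETRO; the function name check_tools is hypothetical, and it only tests that the commands exist, not their versions:

```shell
#!/bin/sh
# Hypothetical helper: print each command that is missing from PATH,
# one per line, and return non-zero if any is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            missing=1
        fi
    done
    return "$missing"
}

# Meson and Ninja are the two build tools named above.
if check_tools meson ninja; then
    echo "build tools found"
else
    echo "install the tools listed above before building" >&2
fi
```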
Finally, clone this repository:
$ git clone https://github.com/galantelab/sideRETRO.git
Inside the sideRETRO directory, run:
$ meson build && ninja -C build
You can find the sider executable inside build/src. Optionally, install to system directories with:
$ sudo ninja -C build install
sideRETRO compiles to an executable called sider, which has three subcommands: process-sample, merge-call and make-vcf. The process-sample subcommand processes a list of SAM/BAM/CRAM files and captures abnormal reads that are likely related to a retrocopy event. All of this data is saved to a SQLite3 database. The second step, merge-call, then processes the database and annotates all the retrocopies found. Finally, the make-vcf subcommand generates a file (in VCF format) with the retrocopies and further information about them.
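The list of input files is a plain text file with one path per line. As a minimal sketch, assuming all inputs are BAM files under a single directory, such a list could be generated with find. The function name make_bam_list is hypothetical, and only *.bam files are matched:

```shell
#!/bin/sh
# Hypothetical helper (not part of sideRETRO): write the paths of all
# *.bam files found under a directory to a list file, one per line.
# Paths are printed as find reports them, so pass an absolute directory
# to get absolute paths in the list.
make_bam_list() {
    bam_dir=$1
    list_file=$2
    find "$bam_dir" -type f -name '*.bam' | sort > "$list_file"
}

# Example call (both paths are illustrative):
# make_bam_list /path/to my-bam-list.txt
```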
# List of BAM files
$ cat 'my-bam-list.txt'
/path/to/file1.bam
/path/to/file2.bam
/path/to/file3.bam
# Run process-sample step
$ sider process-sample \
--annotation-file='my-annotation.gtf' \
--input-file='my-bam-list.txt'
$ ls -1
my-genome.fa
my-annotation.gtf
my-bam-list.txt
out.db
# Run merge-call step
$ sider merge-call --in-place out.db
# Run make-vcf step
$ sider make-vcf \
--reference-file='my-genome.fa' out.db
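The three steps above can also be chained so that a failure in one step stops the rest. This is a sketch under stated assumptions: the function name run_pipeline is hypothetical, the database is assumed to be named out.db as in the listing above, and only the options shown above are used:

```shell
#!/bin/sh
# Hypothetical wrapper: run the three sider steps in order.
# The && chain stops at the first step that fails.
run_pipeline() {
    annotation=$1   # annotation file (GTF)
    bam_list=$2     # text file with one SAM/BAM/CRAM path per line
    reference=$3    # reference genome (FASTA)

    sider process-sample \
        --annotation-file="$annotation" \
        --input-file="$bam_list" &&
    sider merge-call --in-place out.db &&
    sider make-vcf --reference-file="$reference" out.db
}

# Example call, reusing the file names from the listing above:
# run_pipeline my-annotation.gtf my-bam-list.txt my-genome.fa
```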
If sideRETRO was useful in your research, please cite it:
@article{10.1093/bioinformatics/btaa689,
author = {Miller, Thiago L A and Orpinelli, Fernanda and Buzzo, José Leonel L and Galante, Pedro A F},
title = "{sideRETRO: a pipeline for identifying somatic and polymorphic insertions of processed pseudogenes or retrocopies}",
journal = {Bioinformatics},
year = {2020},
month = {07},
issn = {1367-4803},
doi = {10.1093/bioinformatics/btaa689},
url = {https://doi.org/10.1093/bioinformatics/btaa689},
note = {btaa689},
}
This is free software, licensed under:
The GNU General Public License, Version 3, June 2007