galantelab / sideRETRO

A pipeline for detecting Somatic Insertion of DE novo RETROcopies
https://sideretro.readthedocs.io
GNU General Public License v3.0
cnv genotype mobile-elements next-generation-sequencing polimorphism pseudogenes retrocopy wes wgs

sideRETRO

sideRETRO is a bioinformatics tool dedicated to detecting somatic retrocopy insertions, also known as retroCNVs, in whole genome and whole exome sequencing data (WGS, WES). The program is written from scratch in C and uses the HTSlib and SQLite3 libraries to handle SAM/BAM/CRAM reading and data analysis.

For full documentation, please visit https://sideretro.readthedocs.io.

Features

When detecting retrocopies, sideRETRO can annotate several other features related to each event:

Getting Started

Installation

The project depends on the Meson build system and Ninja to manage the configuration and compilation process. Both can be obtained through a package manager or built from source. For example, on Ubuntu:

$ sudo apt-get install python3 \
                       python3-pip \
                       python3-setuptools \
                       python3-wheel \
                       ninja-build

and then:

$ pip3 install --user meson

(or: $ sudo apt install meson)

Finally, clone this repository:

$ git clone https://github.com/galantelab/sideRETRO.git

Inside the sideRETRO directory, run:

$ meson setup build && ninja -C build

You can find the sider executable inside build/src. Optionally, install it to the system directories with:

$ sudo ninja -C build install
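
Before building, it can help to confirm that the tools named above are actually on the PATH. The following is a minimal sketch, not part of sideRETRO: the function name missing_tools is hypothetical, and it only checks that each command can be found, not that its version is recent enough.

```shell
# Hypothetical helper (not shipped with sideRETRO).
# Prints each command that cannot be found on PATH to stderr.
# Returns 0 when every command is present, 1 otherwise.
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            status=1
        fi
    done
    return "$status"
}

# Example use before the build step:
# missing_tools git meson ninja || exit 1
```

Running the check first gives a single, readable list of what is missing instead of a failure partway through configuration.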

Usage

sideRETRO compiles to an executable called sider, which has three subcommands: process-sample, merge-call and make-vcf. The process-sample subcommand processes a list of SAM/BAM/CRAM files and captures abnormal reads that are likely to be related to a retrocopy event. All of this data is saved to a SQLite3 database. The second step, merge-call, then processes the database and annotates all the retrocopies found. Finally, the make-vcf subcommand generates a file in VCF format with the retrocopies and further information about them.

# List of BAM files
$ cat 'my-bam-list.txt'
/path/to/file1.bam
/path/to/file2.bam
/path/to/file3.bam

# Run process-sample step
$ sider process-sample \
    --annotation-file='my-annotation.gtf' \
    --input-file='my-bam-list.txt'

$ ls -1
my-genome.fa
my-annotation.gtf
my-bam-list.txt
out.db

# Run merge-call step
$ sider merge-call --in-place out.db

# Run make-vcf step
$ sider make-vcf \
    --reference-file='my-genome.fa' out.db
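
The list of BAM files passed to process-sample above does not have to be typed by hand. As a minimal sketch, assuming all BAM files sit under one directory, a small shell function can generate it. The name make_bam_list is hypothetical and not part of sideRETRO, and the function only collects files ending in .bam; SAM or CRAM inputs would need a different pattern.

```shell
# Hypothetical helper (not part of sideRETRO): write the absolute path of
# every *.bam file under a directory to a list file, one path per line.
make_bam_list() {
    bam_dir=$(cd "$1" && pwd) || return 1   # resolve to an absolute path
    list_file=$2
    # Sort so the list is stable between runs.
    find "$bam_dir" -type f -name '*.bam' | sort > "$list_file"
}

# Example use:
# make_bam_list /path/to/bams my-bam-list.txt
```

Writing absolute paths keeps the list valid even when sider is later run from a different working directory.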

Citation

If sideRETRO was somehow useful in your research, please cite it:

@article{10.1093/bioinformatics/btaa689,
  author = {Miller, Thiago L A and Orpinelli, Fernanda and Buzzo, José Leonel L and Galante, Pedro A F},
  title = "{sideRETRO: a pipeline for identifying somatic and polymorphic insertions of processed pseudogenes or retrocopies}",
  journal = {Bioinformatics},
  year = {2020},
  month = {07},
  issn = {1367-4803},
  doi = {10.1093/bioinformatics/btaa689},
  url = {https://doi.org/10.1093/bioinformatics/btaa689},
  note = {btaa689},
}

License

This is free software, licensed under:

The GNU General Public License, Version 3, June 2007