SVIM - Structural variant identification using long reads

.. image:: https://img.shields.io/pypi/v/svim?style=flat
    :target: https://pypi.org/project/svim/

.. image:: https://img.shields.io/conda/vn/bioconda/svim?style=flat
    :target: https://anaconda.org/bioconda/svim

.. image:: https://img.shields.io/conda/dn/bioconda/svim?label=bioconda%20downloads&style=flat
    :target: https://anaconda.org/bioconda/svim

.. image:: https://img.shields.io/badge/published%20in-Bioinformatics-blue.svg
    :target: https://doi.org/10.1093/bioinformatics/btz041

SVIM (pronounced swim) is a structural variant caller for third-generation sequencing reads. It detects and classifies six classes of structural variation: deletions, insertions, inversions, tandem duplications, interspersed duplications and translocations (see the figure below). SVIM also estimates the genotypes of deletions, insertions, inversions and interspersed duplications. Unlike other methods, SVIM integrates information from across the genome to distinguish precisely between similar events, such as tandem duplications, interspersed duplications and simple insertions. In our experiments on simulated data and real datasets from PacBio and Nanopore sequencing machines, SVIM consistently achieved better results than competing methods.

Note! To analyze haploid or diploid genome assemblies or contigs, please use our other method `SVIM-asm <https://github.com/eldariont/svim-asm>`_.

Background on Structural Variants and Long Reads

.. image:: https://raw.githubusercontent.com/eldariont/svim/master/docs/SVclasses.png
    :align: center

Structural variants (SVs) are typically defined as genomic variants larger than 50 bp (e.g. deletions, duplications, inversions). Studies have shown that they affect more bases in an average genome than SNPs or small indels. Consequently, they have a large impact on genes and regulatory regions. This is reflected in the large number of genetic disorders and other diseases that are associated with SVs.

Next-generation sequencing technologies by providers such as Illumina generate short reads with high accuracy. However, they exhibit weaknesses in repetitive and low-complexity regions, where SVs are particularly common. Single-molecule long-read sequencing technologies from Pacific Biosciences and Oxford Nanopore produce reads with error rates of up to 15% but with lengths of several kbp. These long reads can span entire repeats and SVs, which facilitates SV detection.

Installation

.. code-block:: bash

    # Install via conda into a new environment (recommended): installs all dependencies including read alignment dependencies
    conda create -n svim_env --channel bioconda svim

    # Install via conda into existing (active) environment: installs all dependencies including read alignment dependencies
    conda install --channel bioconda svim

    # Install via pip (requires Python 3.6.* or newer): installs all dependencies except those necessary for read alignment (ngmlr, minimap2, samtools)
    pip install svim

    # Install from github (requires Python 3.6.* or newer): installs all dependencies except those necessary for read alignment (ngmlr, minimap2, samtools)
    git clone https://github.com/eldariont/svim.git
    cd svim
    pip install .

Dependencies

Current limitations

Input

SVIM analyzes (sorted and indexed) alignment files in BAM format. Alternatively, SVIM accepts long reads in FASTA/FASTQ format (uncompressed or gzipped) or as a file list. SVIM has been successfully tested on PacBio CLR, PacBio HiFi (CCS) and Oxford Nanopore data. It has been tested on alignment files produced by the read aligners `minimap2 <https://github.com/lh3/minimap2>`_, `pbmm2 <https://github.com/PacificBiosciences/pbmm2/>`_ and `NGMLR <https://github.com/philres/ngmlr>`_.
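If you start from raw reads and prefer to run the alignment yourself, a sorted and indexed BAM file can be prepared roughly as follows. This is a minimal sketch that assumes minimap2 and samtools are on your ``PATH``. The helper name ``align_sort_index`` and all file names are hypothetical and not part of SVIM. ``map-ont`` is the minimap2 preset for Nanopore reads, so pick the preset that matches your data.

.. code-block:: bash

    # Hypothetical helper (not part of SVIM): align long reads and write a
    # coordinate-sorted, indexed BAM file.
    #   $1 reference FASTA
    #   $2 reads (FASTA/FASTQ, optionally gzipped)
    #   $3 output BAM
    #   $4 minimap2 preset (defaults to map-ont, i.e. Nanopore reads)
    align_sort_index() {
        local ref=$1 reads=$2 bam=$3 preset=${4:-map-ont}
        # Align to SAM on stdout, sort by coordinate, then build the index.
        minimap2 -ax "$preset" "$ref" "$reads" \
            | samtools sort -o "$bam" - \
            && samtools index "$bam"
    }

    # Usage (file names are placeholders):
    #   align_sort_index genome.fa reads.fastq.gz sample.sorted.bam

The resulting BAM file, together with its index, is the kind of input described above.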

Output

SVIM produces SV calls in the Variant Call Format (VCF). The output file ``variants.vcf`` is placed into the given working directory.
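As a quick sanity check, the calls in the output file can be tallied per SV class. The sketch below assumes that every record carries an ``SVTYPE`` key in its INFO column, as is conventional for structural variants in VCF. The helper name ``count_svtypes`` is hypothetical and not part of SVIM.

.. code-block:: bash

    # Hypothetical helper (not part of SVIM): count VCF records per SVTYPE.
    #   $1 path to a VCF file, e.g. <working_dir>/variants.vcf
    count_svtypes() {
        awk -F'\t' '
            /^#/ { next }                      # skip header lines
            {
                n = split($8, info, ";")       # INFO is the 8th VCF column
                for (i = 1; i <= n; i++)
                    if (info[i] ~ /^SVTYPE=/)
                        count[substr(info[i], 8)]++
            }
            END { for (t in count) print t "\t" count[t] }
        ' "$1" | sort
    }

    # Usage (path is a placeholder):
    #   count_svtypes my_working_dir/variants.vcf

The function prints one tab-separated line per SV class with the number of records of that class.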

Usage

For detailed information on the usage of SVIM, please see our `wiki <https://github.com/eldariont/svim/wiki>`_.

Changelog

Contact

If you experience any problems or have suggestions, please create an issue or a pull request.

Citation

Feel free to read and cite our paper in Bioinformatics: https://doi.org/10.1093/bioinformatics/btz041. Please note that since its publication in 2019 some parts of SVIM were modified (e.g. the clustering method) while others were added (e.g. the genotyping feature).

License

The project is licensed under the GNU General Public License v3.0.