
DELLY2: Structural variant discovery by integrated paired-end and split-read analysis
BSD 3-Clause "New" or "Revised" License

Delly


Delly is an integrated structural variant (SV) prediction method that can discover, genotype and visualize deletions, tandem duplications, inversions and translocations at single-nucleotide resolution in short-read and long-read massively parallel sequencing data. It uses paired-ends, split-reads and read-depth to sensitively and accurately delineate genomic rearrangements throughout the genome.

Installing Delly

Delly is available as a statically linked binary, a singularity container (SIF file), a docker container or via Bioconda. You can also build Delly from source using a recursive clone and make.

git clone --recursive https://github.com/dellytools/delly.git

cd delly/

make all

There is a Delly discussion group delly-users for usage and installation questions.

Delly multi-threading mode

Delly supports parallel computing using the OpenMP API (www.openmp.org).

make PARALLEL=1 src/delly

You can set the number of threads using the environment variable OMP_NUM_THREADS.

export OMP_NUM_THREADS=2

Delly primarily parallelizes on the sample level. Hence, OMP_NUM_THREADS should always be smaller than or equal to the number of input samples.
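Since the thread count should not exceed the number of samples, a script can cap it at the smaller of the available cores and the number of BAM files. The sketch below is a hypothetical helper (`threads_for_samples` is not part of Delly) and assumes `getconf _NPROCESSORS_ONLN` is available, as it is on Linux and macOS.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print min(available cores, number of input samples).
threads_for_samples() {
  local cores=$1
  shift
  local n=$#
  (( n >= 1 )) || n=1            # always request at least one thread
  (( n <= cores )) || n=$cores   # never exceed the available cores
  echo "$n"
}

# Example: one tumor and one control BAM -> at most 2 threads.
cores=$(getconf _NPROCESSORS_ONLN)
export OMP_NUM_THREADS=$(threads_for_samples "$cores" tumor1.bam control1.bam)
```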

Running Delly

Delly needs a sorted, indexed and duplicate-marked BAM file for every input sample. An indexed reference genome is required to identify split-reads. Common workflows for germline and somatic SV calling are outlined below.

delly call -g hg38.fa input.bam > delly.vcf

You can also specify an output file in BCF format.

delly call -o delly.bcf -g hg38.fa input.bam

bcftools view delly.bcf > delly.vcf
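Because every input BAM must be indexed, a quick pre-flight check can catch a missing index before a long run. The sketch below only looks for an index file next to each BAM, assuming the usual samtools naming (`sample.bam.bai`, `sample.bai` or `sample.bam.csi`); it does not verify sort order or duplicate marking, and `require_bam_index` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check: fail if any BAM lacks a .bai or .csi index.
require_bam_index() {
  local bam status=0
  for bam in "$@"; do
    if [[ -f "${bam}.bai" || -f "${bam%.bam}.bai" || -f "${bam}.csi" ]]; then
      continue
    fi
    echo "missing index for ${bam} (try: samtools index ${bam})" >&2
    status=1
  done
  return "$status"
}

# Usage: require_bam_index input.bam && delly call -g hg38.fa input.bam > delly.vcf
```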

Example

A small example is included for short-read, long-read and copy-number variant calling.

delly call -g example/ref.fa -o sr.bcf example/sr.bam

delly lr -g example/ref.fa -o lr.bcf example/lr.bam

delly cnv -g example/ref.fa -m example/map.fa.gz -c out.cov.gz -o cnv.bcf example/sr.bam

More in-depth tutorials for SV calling are available here:

Somatic SV calling

delly call -x hg38.excl -o t1.bcf -g hg38.fa tumor1.bam control1.bam

delly filter -f somatic -o t1.pre.bcf -s samples.tsv t1.bcf

delly call -g hg38.fa -v t1.pre.bcf -o geno.bcf -x hg38.excl tumor1.bam control1.bam ... controlN.bam

delly filter -f somatic -o t1.somatic.bcf -s samples.tsv geno.bcf
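Both somatic filtering steps read `samples.tsv`, a tab-delimited sample description file. This sketch assumes the layout is one sample per line, with the sample name as it appears in the BCF header in the first column and either `tumor` or `control` in the second. The sample names are hypothetical placeholders; `bcftools query -l geno.bcf` lists the real ones.

```bash
#!/usr/bin/env bash
# Write samples.tsv for `delly filter -f somatic`.
# Assumed layout: <sample name>\t<tumor|control>, one sample per line.
# Sample names are hypothetical; replace them with the names in your BCF header.
tumor="tumor1"
controls=(control1 control2 control3)

{
  printf '%s\t%s\n' "$tumor" tumor
  for c in "${controls[@]}"; do
    printf '%s\t%s\n' "$c" control
  done
} > samples.tsv
```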

Germline SV calling

delly call -g hg38.fa -o s1.bcf -x hg38.excl sample1.bam

delly merge -o sites.bcf s1.bcf s2.bcf ... sN.bcf

delly call -g hg38.fa -v sites.bcf -o s1.geno.bcf -x hg38.excl s1.bam

delly call -g hg38.fa -v sites.bcf -o sN.geno.bcf -x hg38.excl sN.bam

bcftools merge -m id -O b -o merged.bcf s1.geno.bcf s2.geno.bcf ... sN.geno.bcf

delly filter -f germline -o germline.bcf merged.bcf
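With many samples, the per-sample genotyping step is easiest to script as a loop that derives each output name from the BAM file name, following the `sN.bam` to `sN.geno.bcf` pattern above. The function name `genotype_sites` and the `DELLY` override variable are hypothetical conveniences, not Delly options; setting `DELLY=echo` prints the commands instead of running them.

```bash
#!/usr/bin/env bash
# Genotype the merged site list in every sample, one delly call per BAM.
# DELLY is a hypothetical override: DELLY=echo prints the commands (dry run).
genotype_sites() {
  local sites=$1
  shift
  local bam id
  for bam in "$@"; do
    id=$(basename "$bam" .bam)   # s1.bam -> s1
    "${DELLY:-delly}" call -g hg38.fa -v "$sites" -o "${id}.geno.bcf" \
      -x hg38.excl "$bam" || return 1
  done
}

# Usage: genotype_sites sites.bcf s1.bam s2.bam s3.bam
# then:  bcftools merge -m id -O b -o merged.bcf s1.geno.bcf s2.geno.bcf s3.geno.bcf
```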

Delly for long reads from PacBio or ONT

Delly also supports long reads for SV discovery.

delly lr -y ont -o delly.bcf -g hg38.fa input.bam

delly lr -y pb -o delly.bcf -g hg38.fa input.bam

Alternate alignments for genome graphs

Instead of providing only one input alignment, Delly now supports multiple alternate alignments, either to different linear reference genomes using minimap2 or to pan-genome graphs using minigraph.

minimap2 -ax map-pb -L chm13.fa sample.fq.gz
minigraph --vc -cx lr pangenome.gfa.gz sample.fq.gz

If the above alignment files are then stored as sample.chm13.bam and sample.gaf.gz, you can use a simple tab-delimited config file to pass all alternate alignments to Delly.

cat align.config

sample.chm13.bam   chm13.fa
sample.gaf.gz   pangenome.gfa.gz

delly lr -y pb -o delly.bcf -g hg38.fa -l align.config sample.hg38.bam
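Because the config file must be tab-delimited and tabs are easily turned into spaces when copied from a web page, it can be safer to generate it with explicit tab separators. This sketch assumes the two-column layout shown above (alignment file, then the matching reference FASTA or graph) and reuses the example file names.

```bash
#!/usr/bin/env bash
# Write align.config with real tab characters: <alignment>\t<reference or graph>.
printf '%s\t%s\n' \
  sample.chm13.bam chm13.fa \
  sample.gaf.gz pangenome.gfa.gz > align.config

# Sanity check: every line should have exactly two tab-separated fields.
awk -F'\t' 'NF != 2 { print "bad line " NR ": " $0; bad = 1 } END { exit bad }' align.config
```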

Structural variants are still reported with respect to GRCh38 coordinates, but the output will only contain SVs that are not present in any of the alternate alignments. For pangenome graphs you may want to try the augmented graph from this study. Please note that this graph contains only SVs greater than 50 bp, so you need to filter the above Delly output to the same size range using bcftools.

bcftools view -i '(QUAL>=300) && ( ((SVTYPE=="INS") && (INFO/SVLEN>50)) || (SVTYPE=="BND") || ((INFO/END - POS)>50) )' delly.bcf
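The filter keeps high-quality calls (QUAL of at least 300) that are either insertions longer than the cutoff, translocations (BND, which have no length), or other SVs whose END minus POS exceeds the cutoff. If a different graph uses another size threshold, the same expression can be parameterized. The helper `sv_size_expr` below is hypothetical; it only builds the string passed to `bcftools view -i`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the bcftools -i expression for a minimum SV size (bp)
# and a minimum QUAL. Defaults reproduce the 50 bp / QUAL 300 filter above.
sv_size_expr() {
  local minsize=${1:-50} minqual=${2:-300}
  printf '(QUAL>=%s) && ( ((SVTYPE=="INS") && (INFO/SVLEN>%s)) || (SVTYPE=="BND") || ((INFO/END - POS)>%s) )' \
    "$minqual" "$minsize" "$minsize"
}

# Usage: bcftools view -i "$(sv_size_expr 50 300)" -O b -o delly.filtered.bcf delly.bcf
```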

Read-depth profiles and copy-number variant calling

You can generate read-depth profiles with delly. This requires a mappability map which can be downloaded here:

Mappability Maps

The command to count reads in 10kbp mappable windows and normalize the coverage is:

delly cnv -a -g hg38.fa -m hg38.map -c out.cov.gz -o out.bcf input.bam

The output file out.cov.gz can be plotted using R to generate normalized copy-number profiles and segment the read-depth information:

Rscript R/rd.R out.cov.gz

Instead of segmenting the read-depth information, you can also visualize the CNV calls.

bcftools query -f "%CHROM\t%POS\t%INFO/END\t%ID[\t%RDCN]\n" out.bcf > seg.bed

Rscript R/rd.R out.cov.gz seg.bed

With -s you can output a statistics file with GC bias information.

delly cnv -g hg38.fa -m hg38.map -c out.cov.gz -o out.bcf -s stats.gz input.bam

zcat stats.gz | grep "^GC" > gc.bias.tsv

Rscript R/gcbias.R gc.bias.tsv

Germline CNV calling

Delly uses GC and mappability fragment correction to call CNVs. This requires a mappability map.

delly cnv -o c1.bcf -g hg38.fa -m hg38.map -l delly.sv.bcf input.bam

delly merge -e -p -o sites.bcf -m 1000 -n 100000 c1.bcf c2.bcf ... cN.bcf

delly cnv -u -v sites.bcf -g hg38.fa -m hg38.map -o geno1.bcf input.bam

bcftools merge -m id -O b -o merged.bcf geno1.bcf ... genoN.bcf

delly classify -f germline -o filtered.bcf merged.bcf

bcftools query -f "%ID[\t%RDCN]\n" filtered.bcf > plot.tsv

Rscript R/cnv.R plot.tsv

Somatic copy-number alterations (SCNAs)

delly cnv -u -z 10000 -o tumor.bcf -c tumor.cov.gz -g hg38.fa -m hg38.map tumor.bam

delly cnv -u -v tumor.bcf -o control.bcf -g hg38.fa -m hg38.map control.bam

bcftools merge -m id -O b -o tumor_control.bcf tumor.bcf control.bcf

delly classify -p -f somatic -o somatic.bcf -s samples.tsv tumor_control.bcf

bcftools query -s tumor -f "%CHROM\t%POS\t%INFO/END\t%ID[\t%RDCN]\n" somatic.bcf > segmentation.bed

Rscript R/rd.R tumor.cov.gz segmentation.bed

FAQ

Citation

Tobias Rausch, Thomas Zichner, Andreas Schlattl, Adrian M. Stuetz, Vladimir Benes, Jan O. Korbel.
DELLY: structural variant discovery by integrated paired-end and split-read analysis.
Bioinformatics. 2012 Sep 15;28(18):i333-i339.
https://doi.org/10.1093/bioinformatics/bts378

License

Delly is distributed under the BSD 3-Clause license. Consult the accompanying LICENSE file for more details.

Credits

HTSlib is heavily used for all genomic alignment and variant processing. Boost is used for various data structures and algorithms, and Edlib for pairwise alignments using edit distance.