dellytools / delly

DELLY2: Structural variant discovery by integrated paired-end and split-read analysis
BSD 3-Clause "New" or "Revised" License
422 stars 136 forks source link

Output BAM with tags to associate reads back to SV calls? #273

Closed nh13 closed 2 years ago

nh13 commented 2 years ago

Is there a way (or would be difficult) to output a BAM with the reads that contributed to the SV calls, with a SAM tag that identifies which call? This is super useful for debugging calls in IGV. For example, see the GATK HaplotypeCaller assembly BAM output.

tobiasrausch commented 2 years ago

I like the idea but delly accepts multiple BAM files as input which makes it a bit tricky. There is already a delly option to output all SV supporting reads for each input BAM file (-d sv_support.gz) which shows the SV identifier, the input BAM file name, the read name and so on. From that file you could simply extract the read names for each BAM and then use samtools view -b -N file_with_read_names input.bam > sv.bam to get a BAM with the reads that contributed to each SV.

nh13 commented 2 years ago

@tobiasrausch this is exactly what I am looking for (sv_support.gz). Is there a manual for the full usage, or should one just inspect the command line. Thank-you for taking the time to answer!

nh13 commented 2 years ago

@tobiasrausch I was looking at sv_support.gz and found it really useful. But for one of the high quality inversions in the VCF, I didn't see any entries (reads) in the support file. The inversion is not filtered (i.e. is PASS). Any thoughts on how to see the reads for this event? Do we expect to not have all the evidence?

tobiasrausch commented 2 years ago

The sv_support.gz file lists the genotyping support. Delly runs in 2 phases, (1) SV discovery (BAM is taken as is) and (2) SV genotyping where reads are realigned to the REF and ALT allele. Therefore, it can happen that SVs present in the BAM file have zero support in genotyping. Assuming you have only one BAM file then for delly call -d sv_support.gz -o sv.bcf ... the counts for FORMAT:RV (split-read support) and FORMAT:DV (paired-end support) should match to sv_support.gz, i.e.: bcftools query -f "%ID\t[%DV\t%RV]\n" sv.bcf should match zcat sv_support.gz | cut -f 1,9 | sort | uniq -c

In other words, if your above inversion failed in genotyping (GT=0/0, DV=0 and SR=0) this should be okay. If that's not the case then it could be a bug.

tobiasrausch commented 2 years ago

I hope this is solved. Please open a new issue if there are still problems.

mmnikolic commented 11 months ago

Hi @tobiasrausch ! I have a question about "(2) SV genotyping where reads are realigned to the REF and ALT allele.".

Can this change the break-point position? Or is the break-point position already established in the first step (SV discovery (BAM is taken as is)) where no realignment is applied?

I am curious if the realignment could cause discrepancy in visualising the reads from the BAM and the break points.