arq5x / lumpy-sv

lumpy: a general probabilistic framework for structural variant discovery
MIT License
314 stars 118 forks source link

Proper discordant read extraction - samblaster vs. samtools #193

Closed tantrev closed 7 years ago

tantrev commented 7 years ago

I've noticed with a whole-genome bisulfite sequencing dataset (2x100bp, aligned using bwa-meth), that there's a huge difference between the samblaster discordant read extraction and the samtools read extraction. That is, the following samtools command (as suggested in the lumpy documentation) results in many discordant reads extracted from my bam files, where samblaster results in none:

# Extract the discordant paired-end alignments.
samtools view -b -F 1294 sample.bam > sample.discordants.unsorted.bam

Any advice if I should be trusting one of these tools over the other?

ryanlayer commented 7 years ago

What is your samblaster command? And what are the sizes those results (split and discordant)?

On May 24, 2017, at 12:35 PM, Trevor Tanner notifications@github.com wrote:

I've noticed with a whole-genome bisulfite sequencing dataset (2x100bp, aligned using bwa-meth), that there's a huge difference between the samblaster discordant read extraction and the samtools read extraction. That is, the following samtools command (as suggested in the lumpy documentation) results in many discordant reads extracted from my bam files, where samblaster results in none:

Extract the discordant paired-end alignments.

samtools view -b -F 1294 sample.bam > sample.discordants.unsorted.bam Any advice if I should be trusting one of these tools over the other?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or mute the thread.

tantrev commented 7 years ago

Here's the generic samblaster command (I've tested it with and without the "-M" flag, also please forgive the unnecessary -t fai_index flags - they're a relic from previous efforts to track down an error):

samtools view -h ${inputFile} | samblaster --addMateTags -M --maxSplitCount 9999999999 --maxUnmappedBases 9999999999 --minIndelSize 1 --minNonOverlap 1 --minClipSize 1 -e -d >(samtools view -t ${fai_index} -Sb - > ${baseName}/${baseName}.disc.bam) -s >(samtools view -t ${fai_index} -Sb - > ${baseName}/${baseName}.split.bam) -u >(gzip > ${baseName}/${baseName}.unmapped.fastq.gz) -o >(samtools view -t ${fai_index} -Sb - > ${baseName}/${baseName}.dedupped.bam)"

And some example output with filesizes:

7.1M May 24 00:26 m1.split.bam
1.4K May 24 00:26 m1.disc.bam
27G May 24 00:26 m1.dedupped.bam
3.1G May 24 00:26 m1.unmapped.fastq.gz

Here are the commands used for samtools:

samtools view -b -F 1294 bams/m1.bam > sample.discordants.unsorted.bam
samtools view -h bams/m1.bam | tools/lumpy-sv/scripts/extractSplitReads_BwaMem -i stdin | samtools view -Sb - > m1.splitters.bam

The splitter command is still running (I accidentally deleted its previous output), but its current file size and the completed discordant output is:

20M May 24 14:21 m1.splitters.bam
15G May 24 14:29 sample.discordants.unsorted.bam

Not sure if it makes any difference, but this is all mouse data.

Update: The proper filesizes for both of the completed outputs are in fact:

28M May 24 22:02 m1.splitters.bam
23G May 25 00:43 sample.discordants.unsorted.bam
tantrev commented 7 years ago

As kindly outlined by @brentp in this bwa-meth ticket, discordant reads aligned by bwa-meth aren't amenable for filtering by the discordant flag.

@ryanlayer - do you by chance have any guidance how to alternatively filter for discordant reads, perhaps by using lumpy_filter?

ryanlayer commented 7 years ago

lumpy_filter also uses the discordant flag. You could fairly easily modify any of the existing methods to test the distance between the mapped ends and the orientation of the ends. For example, in lumpy_filter the line

if (((aln->core.flag) & 1294) == 0) r = sam_write1(disc, hdr, aln);

just checks the flag. If you expanded that if statement to test length (e.g., > mean length + 5*length stdev) and strand (e.g., not +/-) then it should work for you.

On Sun, Jun 4, 2017 at 11:25 PM, Trevor Tanner notifications@github.com wrote:

As kindly outlined by @brentp https://github.com/brentp in this bwa-meth ticket https://github.com/brentp/bwa-meth/issues/43, discordant reads aligned by bwa-meth aren't amenable for filtering by the discordant flag.

@ryanlayer https://github.com/ryanlayer - do you by chance have any guidance how to alternatively filter for discordant reads, perhaps by using lumpy_filter?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/arq5x/lumpy-sv/issues/193#issuecomment-306105975, or mute the thread https://github.com/notifications/unsubscribe-auth/AAlDUX2VpnMY6zd1cfO9onj4PPVIoWeXks5sA5E7gaJpZM4NlgRe .

-- Ryan Layer