GregoryFaust / samblaster

samblaster: a tool to mark duplicates and extract discordant and split reads from sam files.
MIT License

workaround for bam files with some singleton reads #15

Closed elzbth closed 9 years ago

elzbth commented 9 years ago

Hi!

I am so glad samblaster exists! It makes sense of the chaos of bwa-mem, which has been the bane of my existence recently ... but oh-so-useful for SV calling. One issue I have encountered, however, comes up when I re-map reads extracted from a mapped bam in which some filtering has happened, so that not all pairs are still present. When I sort that file by name for samblaster, some reads have no mate, and samblaster complains that the file might not be sorted by name. If I may suggest it, an option that lets samblaster skip that check and ignore mateless reads would be awesome!!

Cheers,

Elizabeth

elzbth commented 9 years ago

Oh yes, and when samblaster does exit because of mismatched pairs, it would be great if it returned something other than 0 so that the error can be caught. Since it writes to the outfiles until it exits, it turns out that checking the existence and non-emptiness of the outfiles isn't sufficient to make sure samblaster completed OK :)

Cheers,

Elizabeth
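For readers scripting around this, here is a minimal sketch of the exit-status check being asked for. It assumes the wrapped tool returns a non-zero status on failure, which is exactly what this comment requests and is not something the sketch can guarantee for any particular samblaster version. The helper name `run_checked` is hypothetical.

```python
import subprocess
import sys


def run_checked(cmd):
    """Run cmd and raise subprocess.CalledProcessError on a non-zero exit.

    Assumption: the tool signals failure through its exit status. If it
    exits 0 after an error, this check cannot detect the failure.
    """
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Stand-in command that fails with status 3, used here instead of a
    # real samblaster call so the sketch runs anywhere.
    try:
        run_checked([sys.executable, "-c", "import sys; sys.exit(3)"])
    except subprocess.CalledProcessError as err:
        print(f"pipeline step failed with exit status {err.returncode}")
```

Checking the status this way avoids relying on the output files, which may exist and be non-empty even when the run stopped early.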

kmhernan commented 9 years ago

Yes, it is so frustrating that it won't handle cases where a mate has been filtered out! I am currently having the same problem and don't know of an easy way to filter out the mateless reads without using really slow methods (especially when my bam files are > 100GB).
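One streaming way to drop mateless reads, sketched here under stated assumptions: the input is name-sorted SAM text (for example from `samtools view -h` on a name-sorted bam), and a read name is kept only if its group contains both a first-in-pair (flag 0x40) and a second-in-pair (flag 0x80) record. The function name `drop_mateless` is hypothetical, and this is not how samblaster handles the case internally. Note that reads without either flag set, such as unpaired reads, are also dropped.

```python
import sys
from itertools import groupby

FIRST_IN_PAIR = 0x40   # SAM flag bit: first segment in the template
SECOND_IN_PAIR = 0x80  # SAM flag bit: last segment in the template


def drop_mateless(sam_lines):
    """Yield header lines plus records whose read name has both mates.

    Expects name-sorted input so that all records sharing a QNAME are
    adjacent. Only one read-name group is held in memory at a time.
    """
    def group_key(line):
        # Header lines start with '@'; QNAMEs cannot, per the SAM spec.
        return None if line.startswith("@") else line.split("\t", 1)[0]

    for name, group in groupby(sam_lines, key=group_key):
        records = list(group)
        if name is None:
            yield from records  # pass the header through unchanged
            continue
        flags = [int(rec.split("\t", 2)[1]) for rec in records]
        has_first = any(flag & FIRST_IN_PAIR for flag in flags)
        has_second = any(flag & SECOND_IN_PAIR for flag in flags)
        if has_first and has_second:
            # Keeps supplementary alignments of the pair as well.
            yield from records


if __name__ == "__main__":
    # Reads SAM text on stdin and writes the filtered SAM text to stdout.
    sys.stdout.writelines(drop_mateless(sys.stdin))
```

Because it makes a single pass and buffers only one read-name group, the cost is dominated by decompressing and writing the file rather than by the filter itself.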

elzbth commented 9 years ago

Hi,

I have been using instead the methods described in the LUMPY (https://github.com/arq5x/lumpy-sv) workflow to separate split and discordant reads. Extracting discordant reads can be done with samtools, and split reads with a script they supply:

From their docs:

Extract the discordant paired-end alignments:

    samtools view -b -F 1294 sample.bam > sample.discordants.unsorted.bam

Extract the split-read alignments:

    samtools view -h sample.bam \
        | scripts/extractSplitReads_BwaMem -i stdin \
        | samtools view -Sb - \
        > sample.splitters.unsorted.bam

Hope that helps!

Cheers,

Elizabeth

kmhernan commented 9 years ago

Thanks @elzbth, I appreciate it. I did do this, but I also wanted to run SVtyper, which throws exceptions for what I'm assuming are similar reasons.

elzbth commented 9 years ago

True, it only circumvents the problem ...


GregoryFaust commented 9 years ago

Fixed in release 0.1.22 with the --ignoreUnmated option.
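A sketch of how the new option might be wired into a scripted pipeline. The `--ignoreUnmated` flag comes from the comment above, and `-d`, `-s`, and `-o` are samblaster's discordant, splitter, and main output options. The function names, file names, and the choice to feed samblaster from `samtools view -h` are assumptions for illustration, and the input bam is assumed to be name-sorted.

```python
import subprocess


def samblaster_command(discordant_path, splitter_path, output_path,
                       ignore_unmated=True):
    """Build a samblaster argument list (reads SAM on stdin by default)."""
    cmd = ["samblaster"]
    if ignore_unmated:
        cmd.append("--ignoreUnmated")  # available from release 0.1.22
    cmd += ["-d", discordant_path, "-s", splitter_path, "-o", output_path]
    return cmd


def run_pipeline(bam_path, discordant_path, splitter_path, output_path):
    """Stream a name-sorted bam through samblaster, checking both exits."""
    view = subprocess.Popen(["samtools", "view", "-h", bam_path],
                            stdout=subprocess.PIPE)
    blaster = subprocess.run(
        samblaster_command(discordant_path, splitter_path, output_path),
        stdin=view.stdout,
    )
    view.stdout.close()
    view_status = view.wait()
    if view_status != 0 or blaster.returncode != 0:
        raise RuntimeError(
            f"samtools exited {view_status}, "
            f"samblaster exited {blaster.returncode}"
        )


if __name__ == "__main__":
    # Hypothetical file names; requires samtools and samblaster on PATH.
    run_pipeline("sample.namesorted.bam", "sample.disc.sam",
                 "sample.split.sam", "sample.out.sam")
```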