GregoryFaust / samblaster

samblaster: a tool to mark duplicates and extract discordant and split reads from sam files.
MIT License

discordant read pairs and split reads #31

Closed WortJohn closed 7 years ago

WortJohn commented 7 years ago

I want to know whether the output file of the "--discordantFile" option includes the contents of the "--splitterFile" output file or not?

GregoryFaust commented 7 years ago

No, these are two independent options which output reads based on very different criteria. --discordantFile will output pairs of reads that map to the reference genome in other than the expected orientation and/or insert size. --splitterFile will output those reads for which a portion of the read aligns to one region of the reference genome, while the remainder of the read aligns elsewhere in the reference genome. In general, there will be little overlap in the reads output in the two files, as being a discordant read pair is, by definition, a property of the alignment of both reads in a pair, while being a split read is a property of a single read, regardless of how its mate is aligned. That being said, it is possible that a split read will also be part of a discordant read pair. In that case, that particular read would be output in both of the files.
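If you want to check how much the two files overlap for your own data, something like the following might help. It is only a minimal sketch, not part of samblaster: the function names `read_keys` and `reads_in_both` are made up for illustration, and it assumes a read can be identified by its QNAME plus the first/second-in-pair bits of the SAM FLAG field (0x40 and 0x80). If read names differ between the two files for any reason (for example, an added suffix), they would need to be normalized first.

```python
def read_keys(sam_lines):
    """Collect (QNAME, mate) keys from an iterable of SAM text lines.

    mate is 1 for first-in-pair (FLAG 0x40), 2 for second-in-pair (FLAG 0x80),
    and 0 if neither bit is set. Header lines (starting with '@') are skipped.
    """
    keys = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if flag & 0x40:
            mate = 1
        elif flag & 0x80:
            mate = 2
        else:
            mate = 0
        keys.add((qname, mate))
    return keys


def reads_in_both(discordant_lines, splitter_lines):
    """Return the keys of reads that appear in both sets of SAM lines."""
    return read_keys(discordant_lines) & read_keys(splitter_lines)
```

With real files you could pass open file handles, e.g. `reads_in_both(open("disc.sam"), open("split.sam"))`, where both file names are placeholders for whatever you gave to --discordantFile and --splitterFile.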

The two output files therefore provide different signals to structural variant (SV) detection tools such as LUMPY. If you are using such a tool, I strongly encourage you to use both output files to ensure the best SV detection results.