
GregoryFaust / samblaster

samblaster: a tool to mark duplicates and extract discordant and split reads from sam files.

MIT License


Getting splitters BAM from long reads data? #42

Closed: cmdcolin closed this issue 5 years ago

cmdcolin commented 5 years ago

Hi there, I was curious whether this tool works for extracting splitters from a long-read BAM file. I am currently running the steps to try it out, but I was wondering whether it's something I could recommend to other people. (I have a visualization tool that would work best if it received a BAM file containing only the splitters, with everything else filtered out.)

GregoryFaust commented 5 years ago

Yes, samblaster will output splitters from single-end reads if you use the --ignoreUnmated option. You may also want to read #37.
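A rough sketch of applying this to an existing long-read BAM: the Python below builds and runs a samtools | samblaster pipeline. It assumes samtools and samblaster are on PATH and that the BAM must first be grouped by read name, since samblaster expects read-id-grouped input rather than coordinate-sorted input. Only --ignoreUnmated comes from the reply above; -s (splitter file) and -o (main output, discarded here) are taken to be samblaster's usual options, so confirm them with samblaster --help for your version. The helper names and file names are hypothetical.

```python
import subprocess
from typing import List, Tuple


def splitter_pipeline_cmds(in_bam: str, splitter_sam: str) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: build the two commands of the pipeline."""
    # Name-sort so every alignment of a read (primary and supplementary) is
    # adjacent; write SAM with its header to stdout.
    sort_cmd = ["samtools", "sort", "-n", "-O", "sam", in_bam]
    # Treat all reads as single-end, write split reads to splitter_sam,
    # and discard the main (all-alignments) output.
    blaster_cmd = ["samblaster", "--ignoreUnmated", "-s", splitter_sam, "-o", "/dev/null"]
    return sort_cmd, blaster_cmd


def extract_splitters(in_bam: str, splitter_sam: str) -> None:
    """Hypothetical helper: run `samtools sort -n | samblaster`, raising on failure."""
    sort_cmd, blaster_cmd = splitter_pipeline_cmds(in_bam, splitter_sam)
    sorter = subprocess.Popen(sort_cmd, stdout=subprocess.PIPE)
    blaster = subprocess.Popen(blaster_cmd, stdin=sorter.stdout)
    # Close the parent's copy so samtools gets SIGPIPE if samblaster exits early.
    sorter.stdout.close()
    blaster_rc = blaster.wait()
    sorter_rc = sorter.wait()
    if sorter_rc != 0 or blaster_rc != 0:
        raise RuntimeError(f"pipeline failed: samtools={sorter_rc}, samblaster={blaster_rc}")
```

Calling extract_splitters("long_reads.bam", "splitters.sam") would write the splitters as SAM. A viewer that expects an indexed BAM would likely also need samtools sort -o splitters.bam splitters.sam followed by samtools index splitters.bam.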

cmdcolin commented 5 years ago

Super, thank you! I figured it would do the trick.

cmdcolin commented 5 years ago

If you get a chance, maybe add a note to the README :+1: I'll close this for now.

GregoryFaust commented 4 years ago

Release 0.1.25 includes sample scenarios in both the README.md and in the program help text.

cmdcolin commented 4 years ago

This is a somewhat weird postmortem, but after asking this question I found that the BAM parser I wrote wasn't parsing the SA tag, so I had been operating on the assumption that some split reads lacked an SA tag. Since the problem was my parser, it seems there will generally be an SA tag. Would it be fair to say that I can rely on the SA tag in most cases, and therefore filter splitters from a coordinate-sorted BAM just by grepping for the SA tag?
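To make the grep idea concrete, here is a minimal Python sketch (all helper names hypothetical) that streams SAM text, such as the output of samtools view -h, keeps header lines, and keeps every record carrying an SA:Z tag. It parses the tag the way the SAM specification defines it: semicolon-terminated entries of rname,pos,strand,CIGAR,mapQ,NM. Matching the tag field exactly, rather than grepping for the substring anywhere in the line, avoids false hits in read names or other fields. It applies none of samblaster's split-read filtering thresholds.

```python
from typing import Iterable, Iterator, List, NamedTuple, Optional


class SAEntry(NamedTuple):
    rname: str
    pos: int      # 1-based leftmost position, as in the SAM POS column
    strand: str   # '+' or '-'
    cigar: str
    mapq: int
    nm: int


def sa_value(fields: List[str]) -> Optional[str]:
    """Return the SA:Z value of a tab-split SAM record, or None if absent."""
    for tag in fields[11:]:  # optional fields start at column 12
        if tag.startswith("SA:Z:"):
            return tag[5:]
    return None


def parse_sa(value: str) -> List[SAEntry]:
    """Parse 'rname,pos,strand,CIGAR,mapQ,NM;' entries into SAEntry tuples."""
    entries = []
    for chunk in value.split(";"):
        if not chunk:
            continue  # the value normally ends with ';'
        rname, pos, strand, cigar, mapq, nm = chunk.rsplit(",", 5)
        entries.append(SAEntry(rname, int(pos), strand, cigar, int(mapq), int(nm)))
    return entries


def filter_split_reads(lines: Iterable[str]) -> Iterator[str]:
    """Yield header lines plus every alignment record that has an SA tag."""
    for line in lines:
        if line.startswith("@"):
            yield line
        elif sa_value(line.rstrip("\n").split("\t")) is not None:
            yield line
```

Because the input order is preserved, coordinate-sorted input stays coordinate-sorted. In such a file, the primary and supplementary records of a chimeric read each carry an SA tag, so every split read shows up once per alignment piece.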

GregoryFaust commented 4 years ago

In our experience, you rarely want to look at all chimeric alignments. That is why samblaster has no fewer than four parameters that control which split reads are output to the splitter file: --maxSplitCount, --maxUnmappedBases, --minIndelSize, and --minNonOverlap. These parameters and their default values were carefully selected to report split reads likely to be relevant for detecting structural variants, without many false positives or false negatives. We developed these ideas in Ira Hall's lab at UVA (now at Washington University in St. Louis) while doing research that led to several tools and pipelines for SV detection, such as SpeedSeq, Lumpy, Hydra, YAHA, SVsim, and others.
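The sketch below is not samblaster's code. It is a hypothetical Python approximation of what checks with these four names might look like for one read's alignments, to illustrate why a bare SA-tag grep keeps more than an SV caller usually wants. The default values (2, 50, 50, 20) and the exact ways non-overlap and indel size are measured are assumptions here; confirm the real definitions and defaults with samblaster --help.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Aln:
    """One alignment of a read (hypothetical type); query coords are on the original read."""
    qstart: int   # 0-based start on the read
    qend: int     # exclusive end on the read
    chrom: str
    rstart: int   # 0-based reference start
    rend: int     # exclusive reference end
    strand: str   # '+' or '-'


def is_reportable_split(alns: List[Aln], max_split_count: int = 2,
                        max_unmapped_bases: int = 50, min_indel_size: int = 50,
                        min_non_overlap: int = 20) -> bool:
    """Hypothetical approximation of the four splitter thresholds (defaults assumed)."""
    if len(alns) < 2 or len(alns) > max_split_count:
        return False
    ordered = sorted(alns, key=lambda a: a.qstart)
    for a, b in zip(ordered, ordered[1:]):
        read_gap = b.qstart - a.qend          # >0: unaligned bases; <0: overlap on the read
        if read_gap > max_unmapped_bases:
            return False
        overlap = max(0, -read_gap)
        shorter = min(a.qend - a.qstart, b.qend - b.qstart)
        if shorter - overlap < min_non_overlap:
            return False                      # the pieces mostly cover the same read bases
        if a.chrom == b.chrom and a.strand == b.strand:
            # Same chromosome and strand: the implied event size is the
            # reference jump minus the read jump (deletion- or insertion-like).
            ref_gap = b.rstart - a.rend if a.strand == "+" else a.rstart - b.rend
            if abs(ref_gap - read_gap) < min_indel_size:
                return False                  # a small indel, better left to the aligner
    return True
```

For example, two adjacent pieces on chr1 whose reference positions jump by 4 kb pass, while the same pieces with a 10 bp reference jump are rejected as a small indel. If the assumed default of 2 for --maxSplitCount holds, long reads that align in three or more pieces would fail the first check, so raising that limit may matter for long-read data.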

cmdcolin commented 4 years ago

Thank you for the detailed response; this is quite helpful. My angle is that I am developing tools to help visualize split and paired reads for structural variation, and I will definitely look into these tools as sources of the data (I have already used lumpy).