bibbers93 closed this issue 4 years ago.
With default parameters, samblaster treats unmated reads as a fatal error and stops processing the input at that point in the SAM file. This is because the error is usually caused by a mis-sorted input file, as in your first run. Once you name-sorted the input file properly, the sort issue was resolved. However, because you selected only reads mapped to a single chromosome, your input file will undoubtedly still contain unmated reads: for read pairs in which the two reads map to different chromosomes, the mate that is not on your chromosome was dropped. That is why you ended up with partial output files. The input was processed until samblaster reached the first such read.
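The two causes described above can be illustrated with a small Python sketch. It is not samblaster's code: records are reduced to a hypothetical `(QNAME, FLAG, RNAME, RNEXT)` tuple, and the helper names `is_readid_grouped`, `keep_chrom`, and `unmated_ids` are made up for illustration. Only the SAM FLAG bits used (0x40 first in pair, 0x80 second in pair, 0x100 secondary, 0x800 supplementary) come from the SAM specification.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical minimal SAM record: (QNAME, FLAG, RNAME, RNEXT).
# RNEXT "=" means the mate is on the same reference as this read.
Record = Tuple[str, int, str, str]

FIRST_IN_PAIR = 0x40
SECOND_IN_PAIR = 0x80
NOT_PRIMARY = 0x100 | 0x800  # secondary or supplementary alignment


def is_readid_grouped(qnames: Iterable[str]) -> bool:
    """True if all records sharing a QNAME are contiguous in the stream."""
    seen: Set[str] = set()
    previous = None
    for qname in qnames:
        if qname != previous:
            if qname in seen:
                return False  # this id reappears after a different id intervened
            seen.add(qname)
            previous = qname
    return True


def keep_chrom(records: List[Record], chrom: str) -> List[Record]:
    """Keep only alignments on `chrom`, as when extracting a single chromosome."""
    return [r for r in records if r[2] == chrom]


def unmated_ids(records: List[Record]) -> List[str]:
    """Read ids whose primary alignments lack the first or the second mate."""
    mates: Dict[str, int] = {}
    for qname, flag, _rname, _rnext in records:
        if flag & NOT_PRIMARY:
            continue  # only primary alignments decide whether a pair is complete
        mates[qname] = mates.get(qname, 0) | (flag & (FIRST_IN_PAIR | SECOND_IN_PAIR))
    both = FIRST_IN_PAIR | SECOND_IN_PAIR
    return sorted(q for q, bits in mates.items() if bits != both)


genome = [
    ("r1", 99, "chr17", "="),     # proper pair, both mates on chr17
    ("r1", 147, "chr17", "="),
    ("r2", 97, "chr17", "chr5"),  # first mate on chr17, its mate on chr5
    ("r2", 145, "chr5", "chr17"),
]
print(is_readid_grouped(["r1", "r2", "r1"]))      # False: coordinate-style order
print(is_readid_grouped(["r1", "r1", "r2"]))      # True: name-sorted order
print(unmated_ids(genome))                        # []
print(unmated_ids(keep_chrom(genome, "chr17")))   # ['r2']
```

Name-sorting fixes the grouping problem, but it cannot bring back `r2`'s mate on chr5, which the chromosome extraction removed.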
When you are REALLY sure that your input is properly sorted, you can use the --ignoreUnmated option to let samblaster continue processing the input past such unmated reads. If you use this option, samblaster will output to stderr all of the read-ids that were unmated, and report a count of unmated reads with the rest of the statistics at the end of the run. You might therefore want to redirect stderr to a file to capture this output and avoid flooding the screen.
N.B. samblaster no longer outputs the ids of unmated reads to stderr when using the --ignoreUnmated option, but it still reports the number of unmated primary reads found.
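The difference between the default behaviour and --ignoreUnmated can be sketched in Python as follows. This is a hypothetical model, not samblaster's implementation: records are simplified to `(QNAME, FLAG)` tuples, and `process` and `UnmatedReadError` are invented names. In this sketch an unmated read id is still written to the output when ignored; check the samblaster README for what the real tool does with such reads.

```python
from itertools import groupby
from typing import Iterable, List, Tuple

# Hypothetical minimal record: (QNAME, FLAG). Not samblaster's actual code.
Record = Tuple[str, int]

NOT_PRIMARY = 0x100 | 0x800  # secondary or supplementary alignment
FIRST_IN_PAIR, SECOND_IN_PAIR = 0x40, 0x80


class UnmatedReadError(Exception):
    """Raised at the first read id whose pair is incomplete."""


def process(records: Iterable[Record], out: List[str], ignore_unmated: bool = False) -> int:
    """Walk read-id groups, appending each id to `out`; return unmated primary reads."""
    unmated = 0
    # groupby only merges *adjacent* records, so the input must be read-id grouped.
    for qname, group in groupby(records, key=lambda r: r[0]):
        primary = [flag for _, flag in group if not flag & NOT_PRIMARY]
        has_first = any(f & FIRST_IN_PAIR for f in primary)
        has_second = any(f & SECOND_IN_PAIR for f in primary)
        if not (has_first and has_second):
            if not ignore_unmated:
                # Default: stop here, leaving only the ids already written to `out`.
                raise UnmatedReadError(f"unmated read id: {qname}")
            unmated += len(primary)  # tally the unmated primary reads and keep going
        out.append(qname)
    return unmated


reads = [("r1", 99), ("r1", 147), ("r2", 97), ("r3", 99), ("r3", 147)]

partial: List[str] = []
try:
    process(reads, partial)
except UnmatedReadError as err:
    print(err)      # unmated read id: r2
print(partial)      # ['r1']

full: List[str] = []
print(process(reads, full, ignore_unmated=True))  # 1
print(full)         # ['r1', 'r2', 'r3']
```

Because `groupby` only joins adjacent records, a coordinate-sorted stream such as `r1, r2, r1` splits `r1` into two single-mate groups and fails at once. This is why a mis-sorted file and a genuinely unmated read produce the same kind of error, and why --ignoreUnmated is only safe once the sort order is known to be correct.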
Release 0.1.25 adds better explanations and usage scenarios for the --ignoreUnmated option in both the README.md and the program help.
Hi, I've recently downloaded SAMBLASTER to retrieve soft-clipped, split, and unmapped reads from whole-genome sequencing data. At the moment I'm practicing on chromosome 17 of NA12878 (available from GIAB).
I extracted chr17, generated a BAM file, and indexed it all using samtools. I ran
On this first attempt it runs, but then I get the following messages on screen
On checking, I don't think my original input.bam was sorted by QNAME, so I did the following
On re-running this, I get the same error messages as before, BUT there are output .sam files in my current directory. Has samblaster quit with only part of the data in these -d, -u, and -s files, or is that the final output? And how can I overcome these errors about my paired reads and sorting by read ID?
Thanks!! :)