Open sehawk opened 1 year ago
Sorry, I should have responded to this a long time ago.
samblaster outputs the message to double check the input file sort order whenever it finds a read that is flagged as part of a pair but whose mate does not appear next to it in the input SAM file. samblaster does not use the SO information in the header of a SAM file because that is not a reliable way to determine whether the file is sorted in a particular order. All it means is that some tool within your pipeline sorted it that way, not that it is necessarily still in that sort order. For example, not all the available SAM/BAM sorting tools actually update that field. In addition, some SAM file manipulations essentially require that one process the header and the rest of the file separately, which can leave the header not fully up to date. Perhaps most importantly, when samblaster is used in its "natural" place in a pipeline, processing a SAM file piped directly from the output of the aligner, the SAM file will be read-id grouped but will not have been sorted in any order and won't have any SO field in the header (the SAM file will just be in whatever order the input fastq files are in). It is far more dependable, especially when you receive an error that may indicate a sorting issue, to look at the first chunk of alignments in the file to see if they are in fact sorted the way you expect. In this case, I believe the file is properly sorted, given the small percentage of reads without mates (2.6%), but perhaps some unaligned reads have been removed?
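If you would rather check the grouping directly than trust the header, something along these lines may help. It is only a rough Python sketch of the idea described above, not samblaster's actual code: it walks the alignment lines, treats each run of consecutive lines with the same QNAME as one group, and counts paired groups that do not contain both a first-in-pair (0x40) and a second-in-pair (0x80) record. The function name `check_qname_grouping` is made up for illustration.

```python
from itertools import groupby

# Standard SAM FLAG bits
PAIRED, READ1, READ2 = 0x1, 0x40, 0x80


def check_qname_grouping(sam_lines):
    """Return (paired_groups, groups_missing_mate).

    A "group" is a run of consecutive alignment lines sharing one QNAME.
    Hypothetical helper for illustration; not part of samblaster.
    """
    # Skip blank lines and header lines (which start with '@').
    records = (
        line.rstrip("\n").split("\t")
        for line in sam_lines
        if line.strip() and not line.startswith("@")
    )
    paired_groups = 0
    missing_mate = 0
    for _qname, group in groupby(records, key=lambda fields: fields[0]):
        flags = [int(fields[1]) for fields in group]
        if not any(flag & PAIRED for flag in flags):
            continue  # single-end reads have no mate to look for
        paired_groups += 1
        has_read1 = any(flag & READ1 for flag in flags)
        has_read2 = any(flag & READ2 for flag in flags)
        if not (has_read1 and has_read2):
            missing_mate += 1
    return paired_groups, missing_mate
```

Passing an open file handle for the SAM file (or just its first few thousand lines) should be enough. A file that is not read-id grouped will typically report a mate missing for most groups, whereas a grouped file with some reads filtered out will report only a small fraction.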
Oops, thanks for pointing this out. Currently, samblaster is only counting the number of duplicates when it is marking them itself. As you used the -a option, the duplicate marking algorithm has been bypassed. But even in this case, samblaster should still count the number of identified duplicates in the file so that accurate statistics can be output. (Note to self: move lines 1116-1118 down about 8 lines and predicate them on the flag state, not on whether or not the pair is added to the hash table.)
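Until that is fixed, the duplicate count can be read straight from the flag state of the output. The following is a minimal Python sketch of that idea, assuming plain-text SAM input; it is not the samblaster source, and `count_flagged_duplicates` is a hypothetical name. It simply counts every alignment record with the duplicate bit (0x400) set, regardless of which tool set it.

```python
DUPLICATE = 0x400  # standard SAM FLAG bit for PCR or optical duplicates


def count_flagged_duplicates(sam_lines):
    """Count alignment records whose FLAG has the duplicate bit set.

    Hypothetical helper for illustration; not part of samblaster.
    """
    count = 0
    for line in sam_lines:
        # Skip blank lines and header lines (which start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])
        if flag & DUPLICATE:
            count += 1
    return count
```

Because the count depends only on the flags, it should give the same answer whether the duplicates were marked by samblaster itself or accepted from an earlier tool via -a.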
Hi @GregoryFaust
We are executing samblaster 0.1.26 using the command below.
samblaster log
Flagstats of input sam
Flagstats of output sam post executing samblaster
Questions
samblaster: Please double check that input file is read-id (QNAME) grouped.
Our input sam is already QNAME sorted. We verified that by checking the SAM header SO (sort order).
samtools flagstat