Sambamba has a number of functions that are reported to be faster than their counterparts.
I'm trying to swap the following functions (view, sort, markdup and index) in our pipeline for faster alternatives.
While most of the functions appear faster in my benchmarks, the use of view/sort raises some concern. The critical difference between Samtools and Sambamba seems to be the first step in the pipeline, as samtools sort both sorts and converts SAM -> BAM (typically the job of view).
I'm wondering whether this is also possible with Sambamba, as it appears to be the bottleneck.
Old Pipeline (Samtools & Picard)
# Convert SAM to BAM & Sort
./samtools-1.3.1/samtools sort -@ 8 -o proband_bwaMEM_sort.bam proband_bwaMEM.sam
# Markdups
java -Xmx4G -jar picard.jar MarkDuplicates \
VALIDATION_STRINGENCY=LENIENT READ_NAME_REGEX=null \
I=proband_bwaMEM_sort.bam \
O=proband_bwaMEM_sort_dedupped.bam \
M=proband_output.metrics.bwaMEM.txt;
# Samtools index
./samtools-1.3.1/samtools index proband_bwaMEM_sort_dedupped.bam;
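Sambamba Implementations
A minimal sketch of the equivalent Sambamba steps, assuming the same file names as the old pipeline. The wrapper function `sambamba_pipeline`, the default thread count and the intermediate file names are illustrative assumptions, not necessarily the exact commands that were timed, and the Picard-specific options (VALIDATION_STRINGENCY, READ_NAME_REGEX) are not carried over.

```shell
# Sketch only: sambamba_pipeline is a hypothetical wrapper, not part of Sambamba.
# Usage: sambamba_pipeline <input.sam> [threads]
sambamba_pipeline() {
    sam="$1"
    threads="${2:-8}"        # assumed default; tune for your machine
    prefix="${sam%.sam}"     # e.g. proband_bwaMEM.sam -> proband_bwaMEM

    # Convert SAM to BAM (-S: input is SAM, -f bam: write BAM)
    sambamba view -S -f bam -t "$threads" -o "${prefix}.bam" "$sam" || return 1

    # Sort the BAM (a separate step, unlike samtools sort on SAM input)
    sambamba sort -t "$threads" -o "${prefix}_sort.bam" "${prefix}.bam" || return 1

    # Mark duplicates
    sambamba markdup -t "$threads" "${prefix}_sort.bam" "${prefix}_sort_dedupped.bam" || return 1

    # Index the final BAM
    sambamba index -t "$threads" "${prefix}_sort_dedupped.bam"
}
```

Called as `sambamba_pipeline proband_bwaMEM.sam 8`, this would run the four steps in order and stop at the first one that fails. Note that view and sort are two passes over the data here, which is the step in question.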
Benchmarks
Although the overall time for markdup and index is greatly improved, I found when varying the number of cores for sambamba view and sambamba sort (8, 16 and 32 cores) that, even at the optimal number of cores, the two steps together were slower than the single samtools sort step.
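One way a comparison like this could be timed is sketched below. `time_cmd` and `benchmark_threads` are hypothetical helpers, the `bench_t*` output names are made up for illustration, and `samtools` is assumed to be on the PATH rather than under `./samtools-1.3.1/`. The sketch only shows the loop over thread counts and reports whole seconds; it implies nothing about the actual timings.

```shell
# Hypothetical helper: run a command and print "<label><TAB><elapsed seconds>".
time_cmd() {
    label="$1"
    shift
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    printf '%s\t%s\n' "$label" "$((end - start))"
}

# Hypothetical driver: time view+sort (Sambamba) against sort (Samtools)
# on the same SAM file at 8, 16 and 32 threads.
benchmark_threads() {
    sam="$1"
    for t in 8 16 32; do
        time_cmd "sambamba_view_t$t" \
            sambamba view -S -f bam -t "$t" -o "bench_t$t.bam" "$sam"
        time_cmd "sambamba_sort_t$t" \
            sambamba sort -t "$t" -o "bench_t$t.sambamba_sort.bam" "bench_t$t.bam"
        time_cmd "samtools_sort_t$t" \
            samtools sort -@ "$t" -o "bench_t$t.samtools_sort.bam" "$sam"
    done
}
```

Running `benchmark_threads proband_bwaMEM.sam` would print one line per step and thread count, so the view and sort times for Sambamba can be added together and compared with the single samtools sort time.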