Closed erinyoung closed 1 year ago
It may be a good idea to compress the SAM output from bwa and minimap2 using gzip. For example,
# for short reads
bwa mem -t !{task.cpus} !{reference_genome} !{reads} 2>> $err_file | gzip -4 > bwa/!{sample}.sam.gz
# or, with minimap2 (note that -ax sr is minimap2's short-read preset)
minimap2 !{params.minimap2_options} -ax sr -t !{task.cpus} !{reference_genome} !{reads} 2>> $err_file | gzip -4 > aligned/!{sample}.sam.gz
As a comparison, the Mad River workflow uses BBMap, and I've noticed that its default gzip compression setting is 4 (hence why I used gzip -4 in the commands above). Running gzip -l on some freshly mapped hMPXV .sam.gz files gives a space savings of just under 75%.
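For anyone who wants to check the savings on their own data, here is a rough sketch of the measurement. The function name sam_gzip_savings and the example path are made up for illustration and are not part of Cecret or Mad River. It compresses a copy of the file to a pipe at level 4 and compares byte counts, so the original SAM file is left untouched.

```shell
# Hypothetical helper (not part of any workflow): print the percentage of
# space saved by compressing one SAM file with gzip -4.
# The body runs in a subshell, so its variables do not leak into the caller.
sam_gzip_savings() (
    sam="$1"
    # Uncompressed size in bytes; the arithmetic expansion strips any
    # padding that some wc implementations add.
    raw=$(( $(wc -c < "$sam") ))
    if [ "$raw" -le 0 ]; then
        echo "empty input: $sam" >&2
        exit 1
    fi
    # Compress to stdout and count the bytes; nothing is written to disk.
    gz=$(( $(gzip -4 -c "$sam" | wc -c) ))
    awk -v r="$raw" -v g="$gz" 'BEGIN { printf "%.1f\n", (1 - g / r) * 100 }'
)
```

Usage would look like sam_gzip_savings aligned/sample.sam (path assumed). For files that are already compressed, gzip -l reports the same ratio, though as far as I know it reads the uncompressed size from a 32-bit field, so some gzip versions can misreport it for files over 4 GiB.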
All the sam files are converted to bam files in the next process, so I'm not sure there's utility in keeping both the bam and sam files.
Ah, I wasn't exactly aware of that. (No wonder Cecret's output directories take up so much space!) Mad River doesn't publish any of the raw SAM output. Even so, compressing the raw SAM output will, at the very least, save scratch space.
The SAM files are very large (especially for MPX).