ewels / clusterflow

A pipelining tool to automate and standardise bioinformatics analyses on cluster environments.
https://ewels.github.io/clusterflow/
GNU General Public License v3.0

Samtools module is outdated #122

Open ewels opened 5 years ago

ewels commented 5 years ago

From an email:

When I try to run the samtools_sort_index module on an unsorted BAM file, it generates _srtd and _srtd.bai files. Although these files contain the correct information, the extensions can be a bit confusing. So I have decided to change one line in the Perl module script so that the output files have .bam and .bam.bai extensions.

I wanted to know why you specifically used the _srtd extension, and whether the change I made is good or bad.

My response:

I think that the intention is for the module to create *_srtd.bam and *_srtd.bam.bai filenames already. If I remember correctly, samtools would automatically add .bam or .sam to the output filenames (these lines check for this). If this isn't happening for you, I guess it's because the default behaviour of samtools has changed in more recent updates.

Generally speaking, it looks to me like the samtools command should be updated. In more recent pipelines I usually use -@ to specify CPUs and -o to specify the output filename. Notably, doing the latter removes the need to pipe results through samtools again for conversion to BAM, as done here.
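To make the suggestion concrete, here is a rough sketch of what the updated commands could look like. It is written in Python purely for illustration and is not the module's actual Perl code. The function name `build_sort_index_commands` is hypothetical, and the `_srtd.bam` naming is an assumption based on the intended filenames described above. It relies only on `samtools sort -@ ... -o ...` and `samtools index`, which are available in samtools 1.3 and later.

```python
from pathlib import Path


def build_sort_index_commands(bam_path, threads=1):
    """Build samtools sort and index commands for one BAM file.

    Hypothetical helper: returns (sort_cmd, index_cmd, sorted_bam).
    The output name is given explicitly with -o, so it no longer
    depends on samtools appending an extension by itself.
    """
    src = Path(bam_path)
    # sample.bam -> sample_srtd.bam (assumed naming convention)
    sorted_bam = src.with_name(src.stem + "_srtd.bam")
    sort_cmd = [
        "samtools", "sort",
        "-@", str(threads),      # number of sorting/compression threads
        "-o", str(sorted_bam),   # explicit output file, written as BAM
        str(src),
    ]
    # samtools index writes <sorted_bam>.bai next to the sorted file
    index_cmd = ["samtools", "index", str(sorted_bam)]
    return sort_cmd, index_cmd, sorted_bam


if __name__ == "__main__":
    sort_cmd, index_cmd, _ = build_sort_index_commands("sample.bam", threads=4)
    # Print the commands rather than running them, so this works without samtools
    print(" ".join(sort_cmd))
    print(" ".join(index_cmd))
```

Because `-o` ends in `.bam`, samtools writes BAM directly and the index lands at `sample_srtd.bam.bai`, so no second samtools call is needed for format conversion.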

Basically, yes - looks like the module should be updated! I'll see if I can whip a quick pull-request together.

Thanks for letting me know!