biod / sambamba

Tools for working with SAM/BAM data
http://thebird.nl/blog/D_Dragon.html
GNU General Public License v2.0

requesting clarification / documentation issue? #378

Closed cmeesters closed 4 years ago

cmeesters commented 5 years ago

Dear all,

While trying sambamba-0.6.8-linux-static to mark duplicates in sorted BAM files (sorted with samtools v1.6), I used the following command line:

./sambamba-0.6.8-linux-static markdup -p sorted/asdf_sorted.bam dummy/asdf.bam

sambamba reported:

sorted 344617544 end pairs and 305879 single ends

- Should I therefore assume that an independent sorting step is not necessary? (I only used an already sorted file because it was available.)
- Is this documented behaviour? If so, where?
- sambamba was using several threads, although the call did not specify -t, --nthreads. Is this documented somewhere? If so, where? (Throttling to an intended number works.)
- Are there any benchmarks available?

Best regards,
Christian Meesters

cmeesters commented 5 years ago

Well, you have the right to resort to g*, and I have the right not to use this "data hydra".

The background in this particular case, which I did not elaborate on, is that I was considering supporting your software on our statewide cluster (RLP in Germany). The prerequisites (amongst others) are that we are able to compile it (which implies including a D compiler in our LLVM setup) and -- considering the fact that most bioinformatics software is abandonware -- that we can see the software is actively maintained (at least for a while).

pjotrp commented 5 years ago

Hi @cmeesters, if you promise to update the documentation I will help you with any issues. Documentation is a valuable contribution.

This is a free software project that is maintained by people in their free time. Sambamba is a great and robust performer; that is the only reason to use it, no other.

The README points to existing performance metrics, btw. Feel free to help out with updates.