biod / sambamba

Tools for working with SAM/BAM data
http://thebird.nl/blog/D_Dragon.html
GNU General Public License v2.0

MultiQC support for sambamba markdup #472

Closed by gartician 3 years ago

gartician commented 3 years ago

Hi sambamba team,

Thank you for your work in putting together a fast and effective package. I have been running sambamba markdup recently and noticed that its output log file could be parsed by a MultiQC module. I imagine the duplicate, paired-end, and single-end read counts could be summarized in a bar graph showing both proportions and absolute numbers. I would love to write this module, and I wonder whether my approach is correct. The following is an example of my output:

[Image: example sambamba markdup log output]

Would the following calculations be correct?

duplicate_rate = duplicates / (end_pairs x 2 + single_ends - unmatched_pairs) x 100

In my example, it would be:

duplicate_rate = 38492943 / (52318342 x 2 + 473488 - 1809) x 100
duplicate_rate = 36.6%
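As a quick check of the arithmetic, here is a minimal Python sketch of the same formula. The function and argument names are my own for illustration and are not part of sambamba or MultiQC; the four counts are the ones quoted from my example.

```python
def duplicate_rate(duplicates, end_pairs, single_ends, unmatched_pairs):
    """Return the percentage of reads counted as duplicates.

    Hypothetical helper: argument names mirror the terms in the formula
    above, not any sambamba or MultiQC API.
    """
    # Each end pair contributes two reads; unmatched pairs are subtracted
    # from the total, as in the formula above.
    total_reads = end_pairs * 2 + single_ends - unmatched_pairs
    if total_reads <= 0:
        raise ValueError("total read count must be positive")
    return duplicates / total_reads * 100


# Counts taken from the example in this comment.
rate = duplicate_rate(
    duplicates=38492943,
    end_pairs=52318342,
    single_ends=473488,
    unmatched_pairs=1809,
)
print(f"duplicate_rate = {rate:.1f}%")
```

The denominator here works out to 105108363 reads, which is the total I compare against the BAM entry counts.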

I have cross-checked this number by counting BAM entries before and after duplicate removal, and both methods give the same result. So my question really is: would you be interested in a sambamba markdup MultiQC module?

pjotrp commented 3 years ago

Hi @gartician, yes for sure. Sambamba is kind of in maintenance mode and I will accept a pull request with tests.

It is also possible to program in Rust. We are creating a new bam reader/writer in Rust. I can point you to the code if that is of interest, but it will probably involve writing more code at this stage.

gartician commented 3 years ago

Hi @pjotrp, the sambamba markdup module has been written for MultiQC and just needs final approval from the main devs. I've added two columns, Duplicate Reads and Duplicate Rates, per sample, and a bar graph showing the different types of reads. If all goes well, the markdup module should be included in the next MultiQC release (>1.10).

[Images: MultiQC columns and bar graph for sambamba markdup]

Thank you and the sambamba team for writing markdup!

pjotrp commented 3 years ago

Looks great! Looking forward to a pull request!