Closed by gartician 3 years ago
Hi @gartician, yes for sure. Sambamba is kind of in maintenance mode and I will accept a pull request with tests.
It is also possible to program in Rust. We are creating a new bam reader/writer in Rust. I can point you to the code if that is of interest, but it will probably involve writing more code at this stage.
Hi @pjotrp, the Sambamba markdup module has been written for MultiQC and it just needs final approval from the main devs. I've added 2 columns, Duplicate Reads and Duplicate Rates, per sample, and a bar graph showing the different types of reads. If all goes well, the markdup module should be included in the next MultiQC release (>1.10).
Thank you and the sambamba team for writing markdup!
Looks great! Looking forward to a pull request!
Hi sambamba team,
Thank you for your work in putting together a fast and effective package. I have been running sambamba markdup recently and noticed that the output log file could be parsed to fit a MultiQC module. I imagine the duplicates, paired-end, and single-end reads could be summarized in a bar graph to show proportions and absolute numbers. I would love to write this module, and wonder if my approach is correct. The following is an example of my output. Would the following calculations be correct?
duplicate_rate = duplicates / (end_pairs x 2 + single_ends - unmatched_pairs) x 100
In my example, it would be:
duplicate_rate = 38492943 / (52318342 x 2 + 473488 - 1809) x 100
duplicate_rate = 36.6%
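The calculation above can be sketched in Python as follows. This is only an illustration of the proposed formula: the function and argument names are hypothetical and are not part of sambamba or MultiQC, and the counts are the ones from my example.

```python
def duplicate_rate(duplicates, end_pairs, single_ends, unmatched_pairs):
    """Return the duplicate rate as a percentage, per the formula proposed above."""
    # Each end pair contributes two reads; unmatched pairs are subtracted
    # from the total, as in the proposed formula.
    total_reads = end_pairs * 2 + single_ends - unmatched_pairs
    if total_reads <= 0:
        raise ValueError("total read count must be positive")
    return duplicates / total_reads * 100


# Counts taken from the example in this post.
rate = duplicate_rate(
    duplicates=38492943,
    end_pairs=52318342,
    single_ends=473488,
    unmatched_pairs=1809,
)
print(f"duplicate_rate = {rate:.1f}%")  # duplicate_rate = 36.6%
```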
I have cross-referenced this number by counting BAM entries before and after duplicate removal, and both methods get the same result. So my question really is, would you be interested in a sambamba markdup MultiQC module?