biod / sambamba

Tools for working with SAM/BAM data
http://thebird.nl/blog/D_Dragon.html
GNU General Public License v2.0

add feature "markdup support multiple input bams and argument file" #341

Closed lituan closed 4 years ago

lituan commented 6 years ago

Hi,

sambamba is really fast at markdup, but it cannot take multiple input BAMs, which Picard can.

We usually split multiple FASTQ files into small chunks, then align, then merge, then markdup.

With Picard, we can split, align, and markdup, skipping the merge step.

So if sambamba could support multiple input BAMs in markdup, it would help a lot.

pjotrp commented 6 years ago

Good idea.

bwlang commented 6 years ago

It would also be really nice to include an option to pass a file of filenames, for when paths are long or there are many BAMs.
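Until something like that exists, a small wrapper can do the job. The following is only a rough sketch of the current merge-then-markdup workaround driven by a file of filenames. The list-file format (one path per line, `#` for comments) and the function names are assumptions made for illustration, not anything sambamba defines. It only builds the command lines for `sambamba merge <output.bam> <inputs...>` and `sambamba markdup <input.bam> <output.bam>` and does not run them.

```python
from pathlib import Path


def parse_bam_list(text):
    """Return BAM paths from list-file text: one path per line.

    Blank lines and lines starting with '#' are skipped (an assumed
    convention for this sketch, not a sambamba format).
    """
    paths = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            paths.append(line)
    return paths


def read_bam_list(list_path):
    """Read a file of filenames from disk and parse it."""
    return parse_bam_list(Path(list_path).read_text())


def build_commands(bam_paths, merged_bam, marked_bam):
    """Build argv lists for the merge-then-markdup workaround.

    With a single input the merge step is skipped and markdup runs
    directly on that file.
    """
    if not bam_paths:
        raise ValueError("no input BAMs given")
    if len(bam_paths) == 1:
        return [["sambamba", "markdup", bam_paths[0], marked_bam]]
    return [
        ["sambamba", "merge", merged_bam, *bam_paths],
        ["sambamba", "markdup", merged_bam, marked_bam],
    ]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`. Passing arguments as a list rather than a shell string also avoids quoting problems with long or unusual paths.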
Possibly related: adding a --sample #number_of_reads option to markdup would allow skipping another intermediate file. It would need to apply the sampling logic before the duplicate-marking logic; otherwise the output could contain reads marked as duplicates whose original read was filtered out.
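To illustrate why the order matters, here is a toy sketch. It is not sambamba's algorithm: reads are just `(name, position_key)` pairs, the first read seen at a position is treated as the original (real duplicate marking typically picks by quality), and a fixed set of kept names stands in for random sampling. All names here are hypothetical.

```python
def mark_duplicates(reads):
    """Flag every read after the first one seen at the same position key."""
    seen = set()
    marked = []
    for name, key in reads:
        marked.append((name, key, key in seen))
        seen.add(key)
    return marked


def orphaned_duplicates(marked):
    """Names of reads flagged as duplicates whose original is not present."""
    originals = {key for _, key, is_dup in marked if not is_dup}
    return [name for name, key, is_dup in marked if is_dup and key not in originals]


reads = [
    ("r1", ("chr1", 100)),
    ("r2", ("chr1", 100)),  # same position as r1
    ("r3", ("chr1", 250)),
]
kept = {"r2", "r3"}  # stand-in for the outcome of sampling

# Mark first, then sample: r2 keeps its duplicate flag although r1 is gone.
mark_then_sample = [m for m in mark_duplicates(reads) if m[0] in kept]

# Sample first, then mark: r2 is now the first read at its position.
sample_then_mark = mark_duplicates([r for r in reads if r[0] in kept])

print(orphaned_duplicates(mark_then_sample))
print(orphaned_duplicates(sample_then_mark))
```

With marking first, `r2` ends up in the output flagged as a duplicate of a read that was sampled away. With sampling first, no such orphan appears.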