GregoryFaust / samblaster

samblaster: a tool to mark duplicates and extract discordant and split reads from sam files.
MIT License

Write duplication metrics to file #16

Closed: dakl closed this issue 4 years ago.

dakl commented 9 years ago

Thanks for a great tool!

It would be nice to be able to write the output metrics to a separate file, like so:

bwa mem -M <idxbase> samp.r1.fq samp.r2.fq | samblaster -M -m metrics.txt | samtools view -Sb - > samp.out.bam

The new part is -m metrics.txt. A structured format would be nice; TSV is probably the easiest to parse downstream.

Current output to stderr:

samblaster: Removed 1229 of 4596 (26.74%) read ids as duplicates using 1324k memory in 0.000S CPU seconds and 0S wall time.

My suggestion would be something like this:

TOTAL_READS   DUPLICATES_REMOVED   MEMORY_USED   CPU_SECONDS   WALL_TIME
4596          1229                 1324k         0.000         0
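Until samblaster writes such a file itself, the proposed layout can be approximated by capturing stderr and converting the summary line. This is a minimal sketch, not part of samblaster: the function name samblaster_summary_to_tsv is hypothetical, and the field positions are assumed from the single example message quoted above. Later versions print different statistics and would need a different parser.

```shell
#!/bin/sh
# Hypothetical helper (not part of samblaster): read samblaster's stderr log on
# stdin and print the one-line summary in the TSV layout proposed above.
# Field positions are ASSUMED from the example message in this issue:
#   samblaster: Removed 1229 of 4596 (26.74%) read ids as duplicates using
#   1324k memory in 0.000S CPU seconds and 0S wall time.
samblaster_summary_to_tsv() {
    awk '
        BEGIN {
            OFS = "\t"
            print "TOTAL_READS", "DUPLICATES_REMOVED", "MEMORY_USED", "CPU_SECONDS", "WALL_TIME"
        }
        # Only the summary line matches; other stderr lines are ignored.
        $1 == "samblaster:" && $2 == "Removed" && $4 == "of" {
            cpu = $15;  sub(/S$/, "", cpu)    # "0.000S" -> "0.000"
            wall = $19; sub(/S$/, "", wall)   # "0S" -> "0"
            print $5, $3, $12, cpu, wall
        }
    '
}
```

Usage, redirecting only samblaster's stderr to a log and converting it afterwards:

bwa mem -M <idxbase> samp.r1.fq samp.r2.fq | samblaster -M 2> samblaster.log | samtools view -Sb - > samp.out.bam
samblaster_summary_to_tsv < samblaster.log > metrics.tsv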

dakl commented 9 years ago

I've made a patch in #17.

GregoryFaust commented 4 years ago

samblaster version 0.1.25 now outputs many more statistics about the categories of reads in the input file and the duplicates in each category, albeit still to stderr.