apmasell closed this issue 9 years ago
For comparison:
time ./barf -o barf_output.bam -b -f input.bam 'chr(M) | mate_chr(M)'
versus
samtools view -h input.bam | awk '/^@/ { print; header++; } $3 ~ /^(chr)?(M|25)$/ || $7 ~/^(chr)?(M|25)$/ {print; accepted++;} END { print "Accepted: " accepted " Rejected: " (NR- accepted-header) > "/dev/stderr" }' | samtools view -S -b -o foo2.bam -
Both runs report the same counts: 7041512 sequences accepted and 3955753362 rejected.
Variable | BARF | SAMtools+AWK |
---|---|---|
Time | 3:31:41 | 4:04:01 |
File size | 695344044 | 695250018 |
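
To put the table in relative terms, here is a minimal sketch that converts the two elapsed times to seconds and computes the differences. It assumes the times are wall-clock values in H:MM:SS form; the helper name `to_seconds` is made up for this example.

```shell
#!/bin/sh
# Convert an H:MM:SS elapsed time to seconds.
# awk reads each field as a decimal number, so "04" and "01" are handled correctly.
to_seconds() {
    printf '%s\n' "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

barf_s=$(to_seconds 3:31:41)   # BARF run from the table
awk_s=$(to_seconds 4:04:01)    # SAMtools+AWK run from the table

echo "BARF: ${barf_s} s, SAMtools+AWK: ${awk_s} s"
echo "Time saved by BARF: $(( awk_s - barf_s )) s"
# Integer percentage of the SAMtools+AWK time that BARF saves (rounded down).
echo "Relative saving: $(( (awk_s - barf_s) * 100 / awk_s ))%"
echo "File size difference: $(( 695344044 - 695250018 )) bytes"
```

With the numbers above, BARF finishes 1940 seconds sooner (about 13% of the SAMtools+AWK time) and its output is 94026 bytes larger.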
The file size difference is larger than expected: 94026 bytes, about 91 KiB. There's a header addition, but it shouldn't account for that much.
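
One way to check how much of the gap the header could explain is to compare the two headers as text. The sketch below assumes each header has already been dumped to a plain-text file with `samtools view -H`; the function name `extra_header_bytes` and the `.header.txt` file names are hypothetical. BAM files are compressed, so the uncompressed header size is only an upper-bound hint, not the exact contribution to the file size.

```shell
#!/bin/sh
# Count the bytes of header lines present in the first dump but not the second.
# Assumed input, produced beforehand with:
#   samtools view -H barf_output.bam > barf.header.txt
#   samtools view -H foo2.bam        > awk.header.txt
extra_header_bytes() {
    # -F fixed strings, -x whole-line match, -v invert, -f read patterns from file:
    # keeps only the lines of "$1" that do not appear verbatim in "$2".
    grep -Fvxf "$2" "$1" | wc -c
}

# Example call (uncomment once the dumps exist):
# extra_header_bytes barf.header.txt awk.header.txt
```

If the result is far below the observed difference, the remainder probably comes from somewhere other than the header, for example how the records are compressed, though that is a guess the thread does not confirm.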
Write some queries and test them against `samtools view` for accuracy and performance. Put them in the documentation as examples.