BoutrosLaboratory / bamql

Query language for filtering SAM/BAM reads
http://labs.oicr.on.ca/boutros-lab/software/BAMQL

Run performance tests #2

Closed apmasell closed 9 years ago

apmasell commented 9 years ago

Write some queries and compare them against samtools view for accuracy and performance. Put them in the documentation as examples.

apmasell commented 9 years ago

For comparison:

time ./barf -o barf_output.bam -b -f input.bam 'chr(M) | mate_chr(M)'

versus

samtools view -h input.bam | awk '/^@/ { print; header++; } $3 ~ /^(chr)?(M|25)$/ || $7 ~/^(chr)?(M|25)$/ {print; accepted++;} END { print "Accepted: " accepted " Rejected: " (NR- accepted-header) > "/dev/stderr" }' | samtools view -S -b -o foo2.bam -

Both approaches report the same counts: 7041512 sequences accepted and 3955753362 rejected.
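Matching counts do not by themselves show that the same reads were kept. A minimal sketch of a record-level check follows, assuming both outputs are first dumped to text with `samtools view` (which omits the header unless `-h` is given). The helper name `same_records` and the `.sam` file names are hypothetical and not part of either tool.

```shell
# Hypothetical helper: succeeds if two text files contain the same lines,
# ignoring the order in which they appear.
same_records() {
  [ "$(sort "$1" | cksum)" = "$(sort "$2" | cksum)" ]
}

# Intended use (not run here; assumes samtools and both BAM files exist):
#   samtools view barf_output.bam > barf.sam
#   samtools view foo2.bam > awk.sam
#   same_records barf.sam awk.sam && echo "same reads" || echo "outputs differ"
```

Comparing header-less dumps keeps any header differences between the two files out of the check.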

| Variable | BARF | SAMtools+AWK |
| --- | --- | --- |
| Time (h:mm:ss) | 3:31:41 | 4:04:01 |
| File size (bytes) | 695344044 | 695250018 |

The file size difference is larger than expected: 94026 bytes, or about 91 KiB. BARF adds to the header, but that shouldn't account for this much.
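For reference, the gaps between the two runs can be worked out directly from the figures reported above. This is only an arithmetic sketch; the variable and function names are made up for illustration.

```shell
# Figures copied from the comparison above.
barf_size=695344044
awk_size=695250018

# Convert an h:mm:ss duration to seconds.
to_seconds() {
  echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

barf_secs=$(to_seconds 3:31:41)
awk_secs=$(to_seconds 4:04:01)

size_diff=$((barf_size - awk_size))
time_saved=$((awk_secs - barf_secs))

echo "size difference: $size_diff bytes"
echo "time saved: $time_saved s ($((time_saved * 100 / awk_secs))% of the SAMtools+AWK run time)"
```

This gives a size difference of 94026 bytes and a saving of 1940 seconds, roughly 13% of the SAMtools+AWK run time (integer division).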