amplab / snap

Scalable Nucleotide Alignment Program -- a fast and accurate read aligner for high-throughput sequencing data
https://www.microsoft.com/en-us/research/project/snap/
Apache License 2.0

-F option yielding incongruous output #119

Open cxr5298 opened 5 years ago

cxr5298 commented 5 years ago

While running SNAP on an Illumina AmpliSeq dataset, I get surprising output from SNAP's -F filter option.

I run the data twice, splitting the output into two buckets: one for aligned reads and one for unaligned reads. I would expect the read totals of the aligned and unaligned sets to add up to the read count of the run as a whole.

However, they don't: the sum is greater than the original read count. Moreover, when I dig into the individual .sam files, there is overlap between the unaligned and aligned reads. I also looked at the supposedly unaligned reads, and they do in fact align to my original reference; it's just that SNAP is giving them MAPQ scores of 0. For the life of me, I cannot tell why this is happening. Unfortunately, I am not able to share any data, so I understand if any input you can offer is limited. I will share what I can below:

The calls to SNAP:

~/project/snap-aligner paired ~/data/indx/ ~/data/sample1_R1.fastq ~/data/sample1_R2.fastq -F a -o sample1_aln.sam
~/project/snap-aligner paired ~/data/indx/ ~/data/sample1_R1.fastq ~/data/sample1_R2.fastq -F u -o sample1_unaln.sam

The indexed reference I'm using is a small database of only 48 sequences ranging from 95 to 225 bases in length.
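
For anyone who wants to reproduce the comparison, here is a rough sketch of how the overlap and the per-file counts could be measured. It is not part of SNAP: the helper names (`summarize_sam`, `compare`) and the script name are made up for illustration. It assumes plain-text SAM files and relies only on the standard SAM FLAG bits (0x4 unmapped, 0x40/0x80 first/last mate, 0x100/0x800 secondary/supplementary) and the MAPQ column. It makes no claim about how SNAP applies -F to read pairs; it only reports what ended up in each file.

```python
import sys
from collections import Counter

# Standard SAM FLAG bits (from the SAM specification).
UNMAPPED = 0x4
FIRST = 0x40
LAST = 0x80
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def summarize_sam(lines):
    """Return (read_ids, counts) for the primary records in SAM text lines."""
    read_ids = set()
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag, mapq = fields[0], int(fields[1]), int(fields[4])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # count each read once, via its primary record
        # Tell the two mates of a pair apart: 1, 2, or 0 if neither bit is set.
        mate = 1 if flag & FIRST else 2 if flag & LAST else 0
        read_ids.add((qname, mate))
        if flag & UNMAPPED:
            counts["unmapped"] += 1
        elif mapq == 0:
            counts["mapped_mapq0"] += 1
        else:
            counts["mapped"] += 1
    return read_ids, counts


def compare(aligned_lines, unaligned_lines):
    """Summarize both files and count reads that appear in both."""
    a_ids, a_counts = summarize_sam(aligned_lines)
    u_ids, u_counts = summarize_sam(unaligned_lines)
    return {
        "aligned_file": dict(a_counts),
        "unaligned_file": dict(u_counts),
        "in_both": len(a_ids & u_ids),
        "distinct_reads": len(a_ids | u_ids),
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as a, open(sys.argv[2]) as u:
        print(compare(a, u))
```

Saved under a hypothetical name such as compare_sam.py, it could be run as `python compare_sam.py sample1_aln.sam sample1_unaln.sam`. If `distinct_reads` matches the number of reads in the two FASTQ files combined, then the excess in the summed totals is explained by the reads counted under `in_both`.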