amplab / snap

Scalable Nucleotide Alignment Program -- a fast and accurate read aligner for high-throughput sequencing data
https://www.microsoft.com/en-us/research/project/snap/
Apache License 2.0

SNAP returning different results? #167

Closed YasirKusay closed 1 year ago

YasirKusay commented 1 year ago

I used SNAP to align my library of 43 million reads against an index of the horse genome to perform host depletion, but I get different results from run to run. Sometimes barely any reads are removed, sometimes 23 million reads remain, and sometimes 7 million reads are removed. I have verified that I used the same index each time. What could be a possible cause for this?

I used the default snap-align commands (with the -I flag).

bolosky commented 1 year ago

It's not clear from your description exactly what happened, but SNAP's alignments should be deterministic in the sense that if you run the same read (pair) with the same index (and the same version of SNAP), you should get the same alignment. The order in which reads come out isn't deterministic, but the alignments should be the same.
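Because output order can differ between runs, a line-by-line diff of two SAM files is not a fair comparison. A minimal sketch of an order-independent check, assuming both runs wrote plain-text SAM; the function names `alignment_multiset` and `same_alignments` are hypothetical and not part of SNAP:

```python
from collections import Counter

def alignment_multiset(sam_lines):
    """Count (QNAME, FLAG, RNAME, POS, CIGAR) tuples, ignoring record order."""
    records = Counter()
    for line in sam_lines:
        # Skip blank lines and SAM header lines, which start with "@".
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        # SAM columns: 0=QNAME, 1=FLAG, 2=RNAME, 3=POS, 4=MAPQ, 5=CIGAR.
        records[tuple(fields[i] for i in (0, 1, 2, 3, 5))] += 1
    return records

def same_alignments(sam_a, sam_b):
    """True if both runs contain the same alignments, in any order."""
    return alignment_multiset(sam_a) == alignment_multiset(sam_b)
```

With real files, pass open file handles (for example `open("run1.sam")` and `open("run2.sam")`, both placeholder names) instead of lists of lines.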

When you feed it paired-end reads in separate FASTQ files, the reads have to be in the same order in each file. That is, the first read in the first file should be the mate pair of the first read in the second file. If you don't do that, then the alignments will not be what you want.
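The ordering requirement above can be checked before aligning. A minimal sketch, assuming well-formed four-line FASTQ records and read names that match between mates apart from an optional `/1` or `/2` suffix; `read_names` and `first_mismatch` are hypothetical helpers, not SNAP functionality:

```python
from itertools import zip_longest

def read_names(fastq_lines):
    """Yield the read name of each 4-line FASTQ record, without /1 or /2."""
    non_blank = (line for line in fastq_lines if line.strip())
    for i, line in enumerate(non_blank):
        if i % 4 == 0:  # first line of each record is the "@name ..." header
            name = line.strip()[1:].split()[0]
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield name

def first_mismatch(fastq1_lines, fastq2_lines):
    """Return (record_index, name1, name2) for the first pair of records
    whose names differ, or None if both files are in the same order."""
    pairs = zip_longest(read_names(fastq1_lines), read_names(fastq2_lines))
    for idx, (a, b) in enumerate(pairs):
        if a != b:  # also catches files with different record counts
            return idx, a, b
    return None
```

A result of `None` means the i-th record in the first file has the same name as the i-th record in the second file for every i. It does not prove the reads are true mates, only that the names line up.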

YasirKusay commented 1 year ago

Thank you for your reply. Will it be sufficient to simply sort the SAM output file by read name? When extracting the read pairs from the SAM file with samtools fastq, sorting it by name beforehand also seems to work.