FelixKrueger / SNPsplit

Allele-specific alignment sorting
http://felixkrueger.github.io/SNPsplit/
GNU General Public License v3.0
53 stars 20 forks source link

About singletons #81

Open ddepierre opened 5 months ago

ddepierre commented 5 months ago

Hi Felix,

It's unclear for me how singletons are handled by SNPsplit. So far I understand it, singletons are cases when only 1 read of the pair is properly aligned in paired-end seq data. I used SNPsplit with the --singleton option, and when I load the sample.genome1_st.bam on IGV, i can see properly paired reads: image With some SAM flags corresponding to "read paired (0x1)" and "read mapped in proper pair (0x2)"

And no singletons are reported when I use samtools flagstats:

$ samtools flagstats sample.genome1_st.bam
16803110 + 0 in total (QC-passed reads + QC-failed reads)
16803110 + 0 primary
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
0 + 0 primary duplicates
16803110 + 0 mapped (100.00% : N/A)
16803110 + 0 primary mapped (100.00% : N/A)
16803110 + 0 paired in sequencing
8421560 + 0 read1
8381550 + 0 read2
16803110 + 0 properly paired (100.00% : N/A)
16803110 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

Alignment is done by with bowtie2 bowtie2 -t -q -N 1 -L 25 -X 1000 --no-discordant --no-mixed | samtools view -h -F 1804 -q 15 | samtools sort And excludes all the reads not mapped as paired.

Does SPNsplit get singletons using SAM flags or using other flags? Why most of my reads are considered singletons by SNPsplit?

Thanks, David

FelixKrueger commented 5 months ago

Hi David,

By definition, if you run Bowtie2 with --no-discordant --no-mixed, you should not have any singletons in the file (that is what these 2 options do). Thus, you should not need to use --singletons when running SNPsplit, not should it return anything for the the singleton file.

My suspicion is that you are piping the result to samtools sort, which I assume will sort the reads by position. Hower, SNPsplit requires the reads to be sorted by name, as it determines whether reads are a proper pair by looking at two consecutive lines at a time. If the read ID matches up between the 2 lines it has to be a PE alignment, else it is considered a singleton. Sorting by position may sort reads 1 and read 2 away from each other (if there are other alignments in the area), and thus ends of pairs may end up in the incorrect file.

What you need to do now is: a) don't use --singleton b) don't sort alignment file by position (Bowtie2 will put our read pairs in other) before running SNPsplit c) If you don't want to re-run the alignment step, use samtools sort -n on the position sorted file. Or alternaitvely, SNPsplit will do this name sorting for you if you do not specify --no_sort

I hope this helps?