Open ddepierre opened 5 months ago
Hi David,
By definition, if you run Bowtie2 with --no-discordant --no-mixed
, you should not have any singletons in the file (that is what these 2 options do). Thus, you should not need to use --singletons
when running SNPsplit, not should it return anything for the the singleton file.
My suspicion is that you are piping the result to samtools sort
, which I assume will sort the reads by position. Hower, SNPsplit requires the reads to be sorted by name, as it determines whether reads are a proper pair by looking at two consecutive lines at a time. If the read ID matches up between the 2 lines it has to be a PE alignment, else it is considered a singleton. Sorting by position may sort reads 1 and read 2 away from each other (if there are other alignments in the area), and thus ends of pairs may end up in the incorrect file.
What you need to do now is:
a) don't use --singleton
b) don't sort alignment file by position (Bowtie2 will put our read pairs in other) before running SNPsplit
c) If you don't want to re-run the alignment step, use samtools sort -n
on the position sorted file. Or alternaitvely, SNPsplit will do this name sorting for you if you do not specify --no_sort
I hope this helps?
Hi Felix,
It's unclear for me how singletons are handled by SNPsplit. So far I understand it, singletons are cases when only 1 read of the pair is properly aligned in paired-end seq data. I used SNPsplit with the --singleton option, and when I load the sample.genome1_st.bam on IGV, i can see properly paired reads: With some SAM flags corresponding to "read paired (0x1)" and "read mapped in proper pair (0x2)"
And no singletons are reported when I use samtools flagstats:
Alignment is done by with bowtie2
bowtie2 -t -q -N 1 -L 25 -X 1000 --no-discordant --no-mixed | samtools view -h -F 1804 -q 15 | samtools sort
And excludes all the reads not mapped as paired.Does SPNsplit get singletons using SAM flags or using other flags? Why most of my reads are considered singletons by SNPsplit?
Thanks, David