Closed rauldiul closed 4 years ago
Hi @rauldiul
The answer to your questions is actually fairly simple: the seemingly duplicated alignment stats that appear on screen are the STDERR output of the two alignment threads that are being run by Bismark in parallel. Let's assume that
250000 reads; of these:
250000 (100.00%) were paired; of these:
135565 (54.23%) aligned concordantly 0 times
76841 (30.74%) aligned concordantly exactly 1 time
37594 (15.04%) aligned concordantly >1 times
45.77% overall alignment rate
is the OT (original top) strand, and
250000 reads; of these:
250000 (100.00%) were paired; of these:
135540 (54.22%) aligned concordantly 0 times
77293 (30.92%) aligned concordantly exactly 1 time
37167 (14.87%) aligned concordantly >1 times
45.78% overall alignment rate
Processed 250000 sequences in total
is the OB (original bottom) strand (it could be the other way round but that doesn't matter here). As you can see, both of these alignments produce ~31% uniquely mapping alignments, and ~15% potentially multi-mapping alignments. Discordant reads do not even make it into this report, as they are discarded straight away.
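To make the two per-thread summaries easier to compare, here is a minimal Python sketch that parses the Bowtie 2 STDERR text quoted above and prints the unique and multi-mapping rates for each thread. The `parse_summaries` helper and the dictionary keys are hypothetical names chosen for illustration; they are not part of Bismark or Bowtie 2, and the regular expressions only cover the paired-end lines shown in this thread.

```python
import re

# The two Bowtie 2 summaries quoted above, one per alignment thread.
STDERR = """\
250000 reads; of these:
250000 (100.00%) were paired; of these:
135565 (54.23%) aligned concordantly 0 times
76841 (30.74%) aligned concordantly exactly 1 time
37594 (15.04%) aligned concordantly >1 times
45.77% overall alignment rate
250000 reads; of these:
250000 (100.00%) were paired; of these:
135540 (54.22%) aligned concordantly 0 times
77293 (30.92%) aligned concordantly exactly 1 time
37167 (14.87%) aligned concordantly >1 times
45.78% overall alignment rate
"""

# Map the wording used by Bowtie 2 to short (hypothetical) key names.
KEYS = {"0 times": "unaligned", "exactly 1 time": "unique", ">1 times": "multi"}


def parse_summaries(text):
    """Return one dict of counts per Bowtie 2 summary block found in text."""
    blocks = []
    for raw in text.splitlines():
        line = raw.strip()
        start = re.match(r"(\d+) reads; of these:", line)
        if start:
            # A new summary block begins with the total number of reads.
            blocks.append({"total": int(start.group(1))})
            continue
        hit = re.match(
            r"(\d+) \([\d.]+%\) aligned concordantly "
            r"(0 times|exactly 1 time|>1 times)",
            line,
        )
        if hit and blocks:
            blocks[-1][KEYS[hit.group(2)]] = int(hit.group(1))
    return blocks


summaries = parse_summaries(STDERR)
for i, s in enumerate(summaries, 1):
    unique_pct = 100 * s["unique"] / s["total"]
    multi_pct = 100 * s["multi"] / s["total"]
    print(f"thread {i}: unique {unique_pct:.2f}%, multi {multi_pct:.2f}%")
```

Which thread corresponds to OT and which to OB cannot be read from this output, so the sketch only numbers the threads in the order they appear.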
What Bismark then does internally is to work out whether each sequence can be assigned uniquely to either the top or the bottom strand. Only then are reads reported in the Bismark output and counted toward the Bismark report. So the ~62% (2 × 31%) of unique top- or bottom-strand alignments, plus a fraction of the multi-mapping reads that have a best alignment to either the top or bottom strand, together result in an overall Bismark mapping efficiency of ~69%. Does that make sense?
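The arithmetic behind these percentages can be sketched in a few lines of Python, using the counts from the two summaries and the 172537 aligned pairs that the question cites from the Alignment Report. This is a rough accounting only: it assumes that every pair aligning uniquely in one thread ends up in the final report and that no pair is counted in both threads, which Bismark's actual filtering does not guarantee. The variable names are hypothetical.

```python
# Counts taken from the two Bowtie 2 summaries and the Bismark report.
total_pairs = 250_000
unique_thread_1 = 76_841      # "aligned concordantly exactly 1 time", first block
unique_thread_2 = 77_293      # "aligned concordantly exactly 1 time", second block
reported_by_bismark = 172_537  # aligned pairs quoted from the Alignment Report


def pct(count):
    """Express a pair count as a percentage of all input pairs."""
    return 100 * count / total_pairs


# Simplifying assumption: unique hits from both threads are simply added up.
unique_either = unique_thread_1 + unique_thread_2

# Whatever is left must have come from pairs that were multi-mapping in
# Bowtie 2 but had a single best alignment after Bismark's comparison.
rescued_multi = reported_by_bismark - unique_either

print(f"unique in either thread: {unique_either} ({pct(unique_either):.2f}%)")
print(f"resolved multi-mappers:  {rescued_multi} ({pct(rescued_multi):.2f}%)")
print(f"mapping efficiency:      {pct(reported_by_bismark):.2f}%")
```

Under these assumptions roughly 62% of pairs come from unique alignments and about 7% from resolved multi-mappers, which is consistent with the ~69% mapping efficiency in the report.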
69% mapping is not bad, but whether or not this is already the maximum you can achieve depends on several factors. Among them,
We have tried to list a few more details here: https://github.com/FelixKrueger/Bismark/blob/master/Docs/FAQ.md#issue-2-low-mapping-effiency-of-paired-end-bisulfite-seq-sample
Let me know if you still have any questions.
Dear Felix,
Thank you for the quick response. Indeed it was a quite obvious issue... and you have cleared my doubts.
Yes, I'm OK with the 69% efficiency. These are 150 bp PE reads; R1 and R2 are of similar, good quality and have been trimmed, but they are from the mouse genome. So I'll play with --score-min and such, but the efficiency seems to be normal.
thanks for your help
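For reference, the effect of changing --score-min can be estimated with a small Python sketch. In Bowtie 2 a function of the form L,a,b sets the minimum acceptable alignment score to a + b × read length in end-to-end mode. The sketch below compares the L,0,-0.4 used in this thread with L,0,-0.2, which is assumed here to be the Bismark default for Bowtie 2; please check the documentation of your Bismark version. The `min_score` helper is a hypothetical name, and it ignores that trimmed reads are usually shorter than 150 bp.

```python
def min_score(read_length, intercept, slope):
    """Linear Bowtie 2 score function L,intercept,slope for one read length."""
    return intercept + slope * read_length


READ_LENGTH = 150  # untrimmed read length mentioned in the thread

# Slope -0.2 is assumed to be the Bismark default; -0.4 is the value used here.
thresholds = {
    slope: min_score(READ_LENGTH, intercept=0, slope=slope)
    for slope in (-0.2, -0.4)
}

for slope, score in thresholds.items():
    print(f"--score_min L,0,{slope}: minimum score {score:.1f} for {READ_LENGTH} bp reads")
```

A more negative threshold tolerates more mismatches and gaps per read, which tends to raise mapping efficiency at the cost of accepting lower-quality alignments.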
Excellent, glad I could help.
Dear Felix,
I have a question regarding the output that Bismark returns in the terminal, which gives details on the numbers of concordant/discordant pairs, and which does not appear in the outputted report file.
I'm running Bismark for paired-end files with the following code:
bismark --score_min L,0,-0.4 /home/rtejedor/Documents/Learning/oxWGBS/oxWGBS_genome -1 JC-BS_R1_first_250k.fq -2 JC-BS_R2_first_250k.fq
The output displays the default Bowtie 2 options selected, including --no-discordant:
Summary of all aligner options: -q --score-min L,0,-0.4 --ignore-quals --no-mixed --no-discordant --dovetail --maxins 500
In the terminal output, Bismark shows the stats for Alignment rate and concordant/discordant mates (this is the part that is not included in the final report):
And it finally shows the Mapping efficiency:
My understanding is that the --no-discordant argument will make Bowtie 2 "not look" for alignments for the reads belonging to discordant mates. Thus, looking at the output, what I do not understand is:
Why does it give me 2 different stats in the alignment rate section? That part of the output seems duplicated.
If I have around 45 % concordant mates, and the discordant are being discarded and not mapped, what do the Mapping efficiency numbers relate to? Because the Alignment Report says that 172537 reads aligned, but that's more reads than my number of concordant mates (around 135000).
Is a 45 % concordant rate unusually low in these types of experiments? (Increasing the -X parameter did not improve the results.)
I'm sure I'm missing something obvious here, and maybe this is a Bowtie 2 question, but thanks for your help.