FelixKrueger / Bismark

A tool to map bisulfite converted sequence reads and determine cytosine methylation states
http://felixkrueger.github.io/Bismark/
GNU General Public License v3.0
390 stars 101 forks source link

Bismark alignment only to top strand #286

Closed Devall01 closed 4 years ago

Devall01 commented 4 years ago

Dear Felix,

I have come across an issue when aligning amplicon based BS-seq libraries. The majority of the reads are aligning to only one strand: the complementary to the top strand:

Final Alignment report

Sequence pairs analysed in total: 31114 Number of paired-end alignments with a unique best hit: 17891 Mapping efficiency: 57.5% Sequence pairs with no alignments under any condition: 13222 Sequence pairs did not map uniquely: 1 Sequence pairs which were discarded because genomic sequence could not be extracted: 0

Number of sequence pairs with unique best (first) alignment came from the bowtie output: CT/GA/CT: 4100 ((converted) top strand) GA/CT/CT: 13789 (complementary to (converted) top strand) GA/CT/GA: 1 (complementary to (converted) bottom strand) CT/GA/GA: 1 ((converted) bottom strand)

I have previously specified directional libraries, only to see a large drop in mapping efficiency as expected given the distribution of alignment above.

I have also aligned both pairs separately:

R1:

Final Alignment report

Sequences analysed in total: 31114 Number of alignments with a unique best hit from the different alignments: 22773 Mapping efficiency: 73.2% Sequences with no alignments under any condition: 8296 Sequences did not map uniquely: 45 Sequences which were discarded because genomic sequence could not be extracted: 0

Number of sequences with unique best (first) alignment came from the bowtie output: CT/CT: 4253 ((converted) top strand) CT/GA: 10 ((converted) bottom strand) GA/CT: 18502 (complementary to (converted) top strand) GA/GA: 8 (complementary to (converted) bottom strand)

R2:

Final Alignment report

Sequences analysed in total: 31114 Number of alignments with a unique best hit from the different alignments: 14243 Mapping efficiency: 45.8% Sequences with no alignments under any condition: 16844 Sequences did not map uniquely: 27 Sequences which were discarded because genomic sequence could not be extracted: 0

Number of sequences with unique best (first) alignment came from the bowtie output: CT/CT: 14237 ((converted) top strand) CT/GA: 6 ((converted) bottom strand) GA/CT: 0 (complementary to (converted) top strand) GA/GA: 0 (complementary to (converted) bottom strand)

I am relatively new to BS-Seq, but I was expecting either a roughly equal distribution among OT/OB if directional or across all 4 if not. Am I mistaken?

Many thanks,

Matthew

FelixKrueger commented 4 years ago

Hi Matthew,

This is quite typical phenomenon for amplicon-based libraries. These amplicons are usually generated by taking the reference DNA sequence, which is normally shown as the top strand only, performing an in-silico bisulfite digest of the sequence, and finally designing primers to this sequence. The (real) bisulfite treatment of DNA renders the top and bottom strands of DNA no longer complementary to each other, your amplification primers should amplify exclusively the top strand sequence. Primers for the bottom strand sequence would have to be designed specifically in a second step.

So in that sense your experiment looks pretty 'normal' I would say, since both the OT and CTOT strands give you information for the top strand only. You could maybe try to identify why you are seeing a much better alignment efficiency in paired-end mode compared to SE mode, e.g. look at trimming and/or alignment parameters.

FelixKrueger commented 4 years ago

I hope this helped at the time