Randomly assigning multiply mappable reads to one of mappable sites. And subsequent quantification

FelixKrueger / Bismark

A tool to map bisulfite converted sequence reads and determine cytosine methylation states

http://felixkrueger.github.io/Bismark/

GNU General Public License v3.0


Randomly assigning multiply mappable reads to one of mappable sites. And subsequent quantification #344

Closed by phytomind 4 years ago

phytomind commented 4 years ago

Dear Felix,

I recently began analyzing bisulfite sequencing data without much prior experience. I'm focusing on DNA methylation of repetitive elements in the human genome, such as transposable elements and tandem satellite repeats. My question is: is there a way to assign multi-mapping BS-seq reads to one of their mappable sites at random, followed by quantitative analysis of the repetitive sequences? In RNA-seq analysis this was easy to do with the Homer software, and it gave a reasonable quantification of repetitive sequence transcripts. But Bismark seems to exclude such multi-mapped reads, or I can't find how to analyze them. I would also greatly appreciate any advice on downstream quantitative analysis of reads matching those repetitive sequences. Currently I'm using MethylKit.

Thank you so much!

Jinkil

FelixKrueger commented 4 years ago

Hi Jinkil,

You are correct in assuming that multi-mapping reads are removed during standard Bismark runs, and there is currently no option to have it assign such reads to one of their positions at random.

What we have done in the past to get around this issue with repetitive elements is to make special short repeat genomes using just a consensus sequence for each repeat class or family, such as ERV, L1, SINE, satellites, LTR etc. The consensus sequences were often downloaded from Repbase. From a technical point of view, you would just take the .fa sequences of these repeat consensus sequences (just make sure they are not self-repetitive), run bismark_genome_preparation on them, and then use this new genome for the alignments.
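As a rough illustration of the steps above, here is a minimal Python sketch. It is not code from Bismark: the helper names (`write_repeat_genome`, `has_internal_repeat`, `bismark_commands`) are hypothetical, the k-mer test is only one crude reading of "not self-repetitive", and the read file name would be your own. The two command lines follow the basic documented usage (`bismark_genome_preparation <genome_folder>` and `bismark --genome <genome_folder> <reads>`); they are only built as argument lists here, not executed.

```python
from pathlib import Path


def write_repeat_genome(consensus, genome_dir):
    """Write one FASTA file per repeat consensus into genome_dir.

    `consensus` maps a repeat name to its consensus sequence, e.g. as
    obtained from a repeat database (the caller supplies the sequences).
    """
    genome_dir = Path(genome_dir)
    genome_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, seq in consensus.items():
        path = genome_dir / f"{name}.fa"
        # Wrap sequence lines at 60 characters, a common FASTA convention.
        lines = [seq[i:i + 60] for i in range(0, len(seq), 60)]
        path.write_text(f">{name}\n" + "\n".join(lines) + "\n")
        paths.append(path)
    return paths


def has_internal_repeat(seq, k=20):
    """Crude check (an assumption, not a Bismark feature):
    return True if any k-mer occurs more than once within seq."""
    seen = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in seen:
            return True
        seen.add(kmer)
    return False


def bismark_commands(genome_dir, reads):
    """Return the genome preparation and alignment commands as
    argument lists; running them requires Bismark to be installed."""
    return [
        ["bismark_genome_preparation", str(genome_dir)],
        ["bismark", "--genome", str(genome_dir), str(reads)],
    ]
```

Each returned argument list could be passed to `subprocess.run(cmd, check=True)` on a machine where Bismark is available. Writing one file per consensus keeps each repeat as its own "chromosome" in the resulting genome.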

We personally use SeqMonk for the downstream analysis. You can simply use the same .fa sequences to make custom chromosomes (each corresponding to one repeat consensus), and then use the usual arsenal of SeqMonk to do the analysis (see here for more).

phytomind commented 4 years ago

Thank you so much for the great answers, Felix! I wonder how you dealt with sequence divergence among repetitive elements, if you took it into account.

FelixKrueger commented 4 years ago

That really depends on what the researcher wants to do with it, to be honest. Often it is just interesting to see what the methylation of certain classes of repeats is like, and whether that changes between conditions. Sometimes people are more interested in specific subfamilies of repeats, in which case we would dig a little deeper and find something suitable for that. But overall we are not doing a lot of repeat methylation analysis; I'm not sure whether you could get some further ideas from the literature...
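For anyone who prefers a script to a GUI for the "methylation per repeat class, and does it change between conditions" question, the following is a minimal Python sketch, not what was used here. It assumes reads were aligned to a repeat-consensus genome, so each reference name is one repeat, and that an uncompressed Bismark coverage file is available (tab-separated columns: chromosome, start, end, methylation percentage, count methylated, count unmethylated). The function names are hypothetical, and pooling counts like this is only a descriptive summary, not a statistical test.

```python
import csv
from collections import defaultdict


def methylation_by_repeat(cov_path):
    """Pool methylated/unmethylated counts per reference sequence and
    return the pooled methylation percentage for each one."""
    meth = defaultdict(int)
    unmeth = defaultdict(int)
    with open(cov_path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            # Columns: chrom, start, end, % methylation, n meth, n unmeth
            chrom, n_meth, n_unmeth = row[0], int(row[4]), int(row[5])
            meth[chrom] += n_meth
            unmeth[chrom] += n_unmeth
    return {
        chrom: 100.0 * meth[chrom] / (meth[chrom] + unmeth[chrom])
        for chrom in meth
        if meth[chrom] + unmeth[chrom] > 0
    }


def compare_conditions(levels_a, levels_b):
    """Difference in pooled methylation (b minus a), in percentage
    points, for repeats present in both conditions."""
    return {
        name: levels_b[name] - levels_a[name]
        for name in levels_a
        if name in levels_b
    }
```

Calling `methylation_by_repeat` once per sample and passing the two results to `compare_conditions` gives a per-repeat difference in percentage points. With replicates, a proper differential methylation method would be more appropriate than this simple subtraction.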

phytomind commented 4 years ago

Thanks a lot for your comments! :)