FelixKrueger / Bismark

A tool to map bisulfite converted sequence reads and determine cytosine methylation states
http://felixkrueger.github.io/Bismark/
GNU General Public License v3.0

Visualizing the CpG, CHG and CHH in SeqMonk #704

Open lahyusof opened 1 month ago

lahyusof commented 1 month ago

Hi Felix,

I have two questions.

1) I managed to get my COV files uploaded onto SeqMonk and got as far as visualizing the probes using the 'Bisulphite Methylation Over' pipeline, but I haven't yet been able to figure out how to visualize the CG, CHG and CHH methylation levels. I'm really new to this platform and have been fiddling around with everything, but I haven't cracked it. I have the three sequence-context files in .dat.txt format.

2) I have 5 different rice samples with three replicates each. I've merged the three replicates into one representative track for each rice sample I'm analyzing. I've tried running 'Filter by Statistical Test' for replicated data (i.e. t-test/ANOVA and Logistic Regression) and am not able to run the test at p<0.05 or even 0.1. That leaves me with analyzing unreplicated data, and I'm currently unsure what I should do now, so I would like to ask for recommendations. Is it better to filter statistics based on continuous data or on proportions? Chi-square, Windowed Replicates? Are there other things I should consider during the analysis? I've sent this second question to Simon Andrews as well and am currently waiting for his reply.

I would really appreciate some guidance. I've tried watching numerous YouTube videos and online PDF lectures, but I'm so new to SeqMonk and methylation analysis in general that I don't even know what to start analyzing, what parameters to set (e.g. window size), when and on what data to construct my graphs, and, more generally, what the best way to go about this is.

Thank you

FelixKrueger commented 1 month ago

Regarding 1.:

The methylation analysis course hosted on the Babraham Bioinformatics website should give you a very good idea about this; in particular, the practical Visualising and Exploring Methylation data is exactly what you need. It covers how much data you might want to aggregate to get a meaningful measure, and how to do it. Just give it a go!
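To make the idea of aggregation concrete outside SeqMonk, the sketch below pools methylated and unmethylated calls from a Bismark coverage file into fixed-size windows and reports percent methylation only where enough calls were pooled. It is a minimal sketch, assuming the six tab-separated .cov columns (chromosome, start, end, % methylation, count methylated, count unmethylated) with 1-based positions. The function names and the `window_size`/`min_calls` defaults are hypothetical and for illustration only; they are not SeqMonk's quantitation or recommended settings.

```python
from collections import defaultdict


def parse_cov_line(line):
    """Split one Bismark coverage (.cov) line into its useful fields.

    Assumed columns (tab-separated): chromosome, start, end,
    % methylation, count methylated, count unmethylated.
    """
    chrom, start, _end, _pct, meth, unmeth = line.rstrip("\n").split("\t")
    return chrom, int(start), int(meth), int(unmeth)


def window_methylation(lines, window_size=1000, min_calls=10):
    """Pool calls into fixed-size windows and return percent methylation.

    window_size and min_calls are illustrative defaults, not recommended values.
    Returns {(chrom, window_start): percent_methylated}; windows with fewer
    than min_calls total calls are dropped as too noisy to interpret.
    """
    counts = defaultdict(lambda: [0, 0])  # window -> [methylated, unmethylated]
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        chrom, pos, meth, unmeth = parse_cov_line(line)
        # .cov positions are 1-based; find the 1-based start of the enclosing window
        win_start = ((pos - 1) // window_size) * window_size + 1
        counts[(chrom, win_start)][0] += meth
        counts[(chrom, win_start)][1] += unmeth

    result = {}
    for key, (meth, unmeth) in counts.items():
        total = meth + unmeth
        if total >= min_calls:
            result[key] = 100.0 * meth / total
    return result
```

Passing an open file handle (for example `window_methylation(open("sample.cov"), window_size=2000)`, with a hypothetical file name) iterates line by line; for a gzip-compressed file, `gzip.open(path, "rt")` yields lines the same way. Larger windows or a higher `min_calls` trade resolution for more stable estimates, which is the same trade-off the practical discusses.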

A word of warning: the CHG and especially the CHH contexts contain vastly more positions than the CpG context, so you will fairly soon reach territory where calculations take a long time and/or your machine's RAM becomes a limiting factor.
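One way to keep memory bounded when a quick per-context overview is all you need is to stream through a context file once and keep only running totals instead of every position. This is a minimal sketch, assuming the .dat.txt files are standard Bismark methylation extractor context files (five tab-separated columns: read ID, methylation state, chromosome, position, call letter), where z/Z marks CpG, x/X CHG and h/H CHH, with upper case meaning methylated. `context_summary` is a hypothetical helper, not part of Bismark or SeqMonk, and it produces only genome-wide totals, not the per-position data SeqMonk loads.

```python
from collections import Counter

# Bismark methylation-call letters: upper case = methylated, lower case = unmethylated
CONTEXT_OF_CALL = {"Z": "CpG", "z": "CpG",
                   "X": "CHG", "x": "CHG",
                   "H": "CHH", "h": "CHH"}


def context_summary(lines):
    """Stream context-file lines once, keeping only per-context counters.

    Returns {context: (methylated_calls, total_calls, percent_methylated)}.
    Memory use does not grow with file size, because no per-position
    data is stored.
    """
    methylated = Counter()
    total = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip the version header line (if present) and malformed lines
        if len(fields) != 5 or fields[4] not in CONTEXT_OF_CALL:
            continue
        call = fields[4]
        context = CONTEXT_OF_CALL[call]
        total[context] += 1
        if call.isupper():
            methylated[context] += 1
    return {ctx: (methylated[ctx], total[ctx], 100.0 * methylated[ctx] / total[ctx])
            for ctx in total}
```

Because it consumes any iterable of lines, an open file handle is processed one line at a time, so even a very large CHH file never has to fit in RAM.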

Re 2): As you can see in the differential methylation part of the practical, we ran an unreplicated Chi-square filter on the data, as well as two methods that take replicates into account (edgeR and logistic regression). I believe one of the later exercises combines the filtered hits to see how well they correlate/overlap. Again, this is really well documented in the practical instructions.
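For the unreplicated case, the idea behind a per-window Chi-square comparison can be sketched as a 2x2 table of methylated versus unmethylated call counts in two samples. This is a minimal sketch, assuming you already have per-window (methylated, unmethylated) counts for each sample. All function names are hypothetical, the Bonferroni correction is chosen here purely for illustration, and SeqMonk's own Chi-square filter may differ in details such as continuity correction, multiple-testing correction and minimum-count cutoffs. The Jaccard index at the end is one simple way to quantify how much two hit lists overlap, in the spirit of the exercise that compares filtered hits.

```python
import math


def chi_square_2x2(meth_a, unmeth_a, meth_b, unmeth_b):
    """Pearson chi-square test (1 df, no continuity correction) on the 2x2 table
    [[meth_a, unmeth_a], [meth_b, unmeth_b]]. Returns (statistic, p_value)."""
    a, b, c, d = meth_a, unmeth_a, meth_b, unmeth_b
    n = a + b + c + d
    row_a, row_b = a + b, c + d
    col_m, col_u = a + c, b + d
    if 0 in (row_a, row_b, col_m, col_u):
        return 0.0, 1.0  # a zero margin means there is nothing to test
    stat = n * (a * d - b * c) ** 2 / (row_a * row_b * col_m * col_u)
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return stat, math.erfc(math.sqrt(stat / 2))


def significant_windows(counts_a, counts_b, alpha=0.05):
    """counts_x: {window: (methylated, unmethylated)} for one sample.

    Tests every window present in both samples and keeps those passing a
    Bonferroni-corrected threshold (alpha / number of windows tested).
    """
    shared = counts_a.keys() & counts_b.keys()
    if not shared:
        return set()
    threshold = alpha / len(shared)
    hits = set()
    for window in shared:
        _stat, p = chi_square_2x2(*counts_a[window], *counts_b[window])
        if p < threshold:
            hits.add(window)
    return hits


def jaccard(hits_1, hits_2):
    """Overlap between two hit sets: 1.0 = identical, 0.0 = no shared windows."""
    union = hits_1 | hits_2
    return len(hits_1 & hits_2) / len(union) if union else 0.0
```

Summing merged replicates into one count per window ignores the variation between biological replicates, so hits from a test like this are best treated as a first pass and cross-checked against replicate-aware methods where possible.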

Good luck!