FelixKrueger / Bismark

A tool to map bisulfite converted sequence reads and determine cytosine methylation states
http://felixkrueger.github.io/Bismark/
GNU General Public License v3.0

Query about including C in unknown context when calculating methylation proportions #703

Closed: lahyusof closed this issue 1 month ago

lahyusof commented 1 month ago

Hi Felix, I'm currently trying to calculate the proportions of CG, CHG and CHH methylation and am curious: should I factor in the C's in the Unknown context? I'm asking because I'm planning to draw a pie chart showing the distribution of the three sequence contexts, but research papers don't generally state whether they account for the unknown context.

Here's a summary output of my results from one of my mapping reports:

C methylated in CpG context: 3.1%
C methylated in CHG context: 2.2%
C methylated in CHH context: 0.7%
C methylated in unknown context (CN or CHN): 1.2%

So, for example, CG methylation would account for 51.67% of the total (3.1 / 6.0) if I omitted the unknown C's but 43.056% (3.1 / 7.2) if I included them. Looking forward to hearing your reply!
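For anyone wanting to reproduce the two figures above, here is a minimal sketch of the arithmetic. It assumes, as the question does, that the per-context percentages from the report are simply summed and each context is expressed as a share of that sum. The function name `context_shares` and the dictionary keys are hypothetical and not part of Bismark.

```python
# Hypothetical helper, not part of Bismark: turns per-context methylation
# percentages into shares of their sum, optionally counting the unknown context.
def context_shares(percentages, include_unknown=False):
    contexts = ["CpG", "CHG", "CHH"]
    if include_unknown:
        contexts.append("Unknown")
    total = sum(percentages[c] for c in contexts)
    return {c: 100 * percentages[c] / total for c in contexts}

# Values copied from the mapping report summary quoted above.
report = {"CpG": 3.1, "CHG": 2.2, "CHH": 0.7, "Unknown": 1.2}

without_unknown = context_shares(report)
with_unknown = context_shares(report, include_unknown=True)

print(round(without_unknown["CpG"], 2))  # 51.67
print(round(with_unknown["CpG"], 3))     # 43.056
```

The shares in `without_unknown` sum to 100 and could be passed straight to a pie chart of the three known contexts.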

FelixKrueger commented 1 month ago

I would not include the Unknown context, as it is most likely caused by insertions close to a cytosine, or by other ambiguity in the reference genome. This context is also ignored in the downstream extraction steps.