Some questions about methylation detection

yzyupup commented 3 years ago

Hi, dear professor. I used nanodisco to get the combined current differences file. Now I want to find out which sites on the reference genome are methylated based on this file. What can I do next? For example, I want to know the probability of methylation occurring at 1000 positions on the reference genome. In addition, for the two parameters "t_test_pval" and "u_test_pval" given in this file, does a smaller value mean a modification is more likely? I ask because I found that they are not directly related to "mean_diff". Thank you very much!

touala commented 3 years ago

Hi @yzyupup,

Thank you very much for your interest. Nanodisco is a software designed for reliable and comprehensive methylation motif discovery from bacteria (and archaea) and microbiome, in which methylation events are diverse and highly motif driven.

From your post, I understand that you're interested in estimating the methylation level at individual genomic positions. If this is the case, we do not recommend using the current version (v1.0.2) of Nanodisco to attempt it. Our approach relies on aggregating the signal across mapped reads and across motif occurrences, and is therefore not focused on read-level methylation signal, as described in the Discussion of our paper.

If this is not the case and you want to perform the motif discovery, a typical analysis is achieved by running the following commands with the appropriate parameters: nanodisco preprocess, nanodisco difference, and nanodisco motif.

Please let us know if this is helpful or if you have more questions.

Regards,

Alan
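
On the earlier question about "t_test_pval" and "u_test_pval": assuming these columns are p-values from a two-sample t-test and a Mann-Whitney U test comparing current values between the two samples (the column names suggest this, but the thread does not confirm it), a smaller p-value means stronger evidence that the current differs at that position. It need not track "mean_diff", because a test statistic also depends on the spread of the values and on the number of reads. The following is a minimal Python sketch with made-up numbers. It uses Welch's t statistic as a stand-in and is not nanodisco's own computation; `welch_t` and all sample values are hypothetical.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    standard_error = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


# Hypothetical current values at one position (illustrative, not real data).
native_tight = [2.9, 3.0, 3.1, 3.0, 2.9, 3.1]
control_tight = [1.9, 2.0, 2.1, 2.0, 1.9, 2.1]
native_noisy = [0.0, 6.0, 1.0, 5.0, 2.0, 4.0]
control_noisy = [-1.0, 5.0, 0.0, 4.0, 1.0, 3.0]

# Both pairs have the same difference in means ...
diff_tight = statistics.mean(native_tight) - statistics.mean(control_tight)
diff_noisy = statistics.mean(native_noisy) - statistics.mean(control_noisy)

# ... but very different t statistics, because the spread differs.
t_tight = welch_t(native_tight, control_tight)
t_noisy = welch_t(native_noisy, control_noisy)

print(f"mean_diff: {diff_tight:.2f} vs {diff_noisy:.2f}")
print(f"t statistic: {t_tight:.2f} vs {t_noisy:.2f}")
```

A larger absolute t statistic corresponds to a smaller p-value, so two positions with the same mean difference can have very different p-values. Neither value is a per-site methylation probability, which matches the reply above.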

xia1234567 commented 2 years ago

Hi, dear professor. Does "Sites" in the "meme.html" file obtained after running the "nanodisco motif" step refer to the number of base pairs modified by methylation?

Does "Sequence name" in the "meme.txt" file refer to the location where the methylation modification occurs?

Motif NNVRCCWGGHNRNN MEME-1 sites sorted by position p-value

Sequence name   Start  P-value   Site
1_4955936_fwd   4      8.03e-08  GAA ATAACCTGGTTAAA CCGCG
1_4201568_rev   3      2.53e-07   TC ATCGCCTGGTTGAA GCGCTC
1_218987_rev    3      1.50e-06   GA TCAACCTGGTCGAA ATAGGT
1_4497083_fwd   3      1.50e-06   TC TTAACCAGGTTGAT ACCTTC

Thank you very much!

touala commented 1 year ago

Hi @xia1234567,

No, the number of sites corresponds to the number of motif occurrences found in the top 2000 genomic regions at this step of the analysis. As the motif detection continues, new regions (with weaker signal) are queried. Nanodisco was not developed for single-site classification but for motif-level analysis.
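
For readers who want to tabulate the rows quoted in the question, here is a minimal Python sketch that parses them. It assumes, based only on the pattern of names such as `1_4955936_fwd`, that a sequence name is built from a contig, a coordinate, and a strand. The thread does not confirm this reading. `Site`, `ROW`, and `parse_site_row` are hypothetical helper names and are not part of nanodisco or MEME.

```python
import re
from typing import NamedTuple, Optional


class Site(NamedTuple):
    contig: str
    coordinate: int  # assumed meaning of the middle field of the name
    strand: str
    start: int
    p_value: float
    site: str


# Assumed row layout: name, start, p-value, then flanks and the matched site.
ROW = re.compile(r"^(\S+)_(\d+)_(fwd|rev)\s+(\d+)\s+(\S+)\s+(.+)$")


def parse_site_row(line: str) -> Optional[Site]:
    """Parse one site row; return None for headers or blank lines."""
    match = ROW.match(line.strip())
    if match is None:
        return None
    contig, coordinate, strand, start, p_value, rest = match.groups()
    # Assumption: the matched site is the longest token; the others are flanks.
    site = max(rest.split(), key=len)
    return Site(contig, int(coordinate), strand, int(start), float(p_value), site)


rows = """
Sequence name Start P-value Site
1_4955936_fwd 4 8.03e-08 GAA ATAACCTGGTTAAA CCGCG
1_4201568_rev 3 2.53e-07 TC ATCGCCTGGTTGAA GCGCTC
1_218987_rev 3 1.50e-06 GA TCAACCTGGTCGAA ATAGGT
1_4497083_fwd 3 1.50e-06 TC TTAACCAGGTTGAT ACCTTC
"""

sites = [s for s in map(parse_site_row, rows.splitlines()) if s is not None]
for s in sites:
    print(s.contig, s.coordinate, s.strand, s.p_value, s.site)
```

Counting the parsed rows gives the number of motif occurrences listed for that motif, which per the reply above is not a count of methylated bases.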

fanglab / nanodisco

Some questions about methylation detection #22
