Closed yzyupup closed 1 year ago
Hi @yzyupup,
Thank you very much for your interest. Nanodisco is software designed for reliable and comprehensive methylation motif discovery from bacteria (and archaea) and microbiomes, in which methylation events are diverse and highly motif-driven.
From your post, I understand that you're interested in estimating the methylation level at individual genomic positions. If this is the case, we do not recommend using the current version (v1.0.2) of Nanodisco to attempt it. Our approach relies on aggregating the signal across mapped reads and across motif occurrences, and is therefore not focused on read-level methylation signal, as described in the Discussion of our paper.
If this is not the case and you want to perform motif discovery, a typical analysis is achieved by running the following commands with the appropriate parameters: nanodisco preprocess, nanodisco difference, and nanodisco motif.
Please let us know if this is helpful or if you have more questions.
Regards,
Alan
Hi, dear professor. I want to know whether "Sites" in the "meme.html" file obtained after running the "nanodisco motif" step refers to the number of base pairs modified by methylation.
Also, does "Sequence name" in the "meme.txt" file refer to the location where the methylation modification occurs?
Sequence name   Start  P-value   Site
1_4955936_fwd   4      8.03e-08  GAA ATAACCTGGTTAAA CCGCG
1_4201568_rev   3      2.53e-07   TC ATCGCCTGGTTGAA GCGCTC
1_218987_rev    3      1.50e-06   GA TCAACCTGGTCGAA ATAGGT
1_4497083_fwd   3      1.50e-06   TC TTAACCAGGTTGAT ACCTTC
Thank you very much!
Hi @xia1234567,
No, the number of sites corresponds to the number of motif occurrences found in the top 2000 genomic regions at this step of the analysis. As the motif detection continues, new regions (with weaker signal) are queried. Nanodisco was not developed for single-site classification but for motif-level analysis.
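For readers who want to inspect the rows of the site table quoted above, here is a minimal Python sketch that splits each row into its fields. It rests on assumptions that are not confirmed in this thread: that sequence names follow a `<contig>_<position>_<fwd|rev>` pattern, and that every row carries both a left and a right flank around the site. The names `MotifSite` and `parse_site_row` are hypothetical helpers, not part of Nanodisco or MEME.

```python
import re
from typing import NamedTuple


class MotifSite(NamedTuple):
    contig: str
    position: int
    strand: str
    start: int
    p_value: float
    left_flank: str
    site: str
    right_flank: str


# ASSUMPTION: names look like "<contig>_<position>_<fwd|rev>".
NAME_PATTERN = re.compile(r"^(?P<contig>.+)_(?P<position>\d+)_(?P<strand>fwd|rev)$")


def parse_site_row(row: str) -> MotifSite:
    """Split one whitespace-separated row of the site table into typed fields."""
    fields = row.split()
    # ASSUMPTION: both flanks are present, giving exactly six fields.
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {row!r}")
    name, start, p_value, left_flank, site, right_flank = fields
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unexpected sequence name: {name!r}")
    return MotifSite(
        contig=match["contig"],
        position=int(match["position"]),
        strand=match["strand"],
        start=int(start),
        p_value=float(p_value),
        left_flank=left_flank,
        site=site,
        right_flank=right_flank,
    )


# Rows copied from the table quoted in this thread.
rows = [
    "1_4955936_fwd 4 8.03e-08 GAA ATAACCTGGTTAAA CCGCG",
    "1_4201568_rev 3 2.53e-07 TC ATCGCCTGGTTGAA GCGCTC",
    "1_218987_rev 3 1.50e-06 GA TCAACCTGGTCGAA ATAGGT",
    "1_4497083_fwd 3 1.50e-06 TC TTAACCAGGTTGAT ACCTTC",
]
sites = [parse_site_row(row) for row in rows]
for s in sites:
    print(s.contig, s.position, s.strand, s.site, s.p_value)
```

Each parsed row describes one occurrence of the candidate motif in a queried region. As noted above, it should not be read as a per-site methylation call.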
Hi, dear professor. I used Nanodisco to get the combined current differences file. Now I want to find out which sites on the reference genome are methylated based on this file. What can I do next? For example, I want to know the probability of methylation occurring at 1000 positions on the reference genome. In addition, for the two parameters "t_test_pval" and "u_test_pval" given in this file, does a smaller value mean that modification is more likely? I ask because I found that they are not directly related to "mean_diff". Thank you very much!
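As general statistical background on why a test p-value need not track a mean difference, the following toy Python sketch computes Welch's t statistic for two pairs of made-up samples. This is not a description of how Nanodisco computes its columns. It assumes only that "t_test_pval" comes from some form of two-sample t-test, which the thread does not confirm. The sample values and the helper `welch_t` are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (mean difference, Welch's t statistic) for two samples."""
    diff = mean(a) - mean(b)
    # The standard error combines each sample's variance and size.
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return diff, diff / standard_error


# Made-up values: both pairs differ by the same mean, but the spread differs.
tight_native = [1.9, 2.0, 2.1, 2.0, 1.9, 2.1]
tight_control = [0.9, 1.0, 1.1, 1.0, 0.9, 1.1]
noisy_native = [-1.0, 5.0, 0.5, 3.5, -2.0, 6.0]
noisy_control = [-2.0, 4.0, -0.5, 2.5, -3.0, 5.0]

tight_diff, tight_t = welch_t(tight_native, tight_control)
noisy_diff, noisy_t = welch_t(noisy_native, noisy_control)

print(f"tight: mean difference = {tight_diff:.2f}, t = {tight_t:.2f}")
print(f"noisy: mean difference = {noisy_diff:.2f}, t = {noisy_t:.2f}")
```

Both pairs have the same mean difference, yet the low-variance pair gives a much larger t statistic and therefore, at comparable sample sizes, a much smaller p-value. A t-test p-value reflects the spread and number of observations as well as the size of the difference.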