Open powerhorse1986 opened 2 years ago
Hi Li,
Thanks for your interest in DXM!
(1) These files relate to the transition probabilities and distance bins used in the modified HMM that models the short-range distance correlations in DNA methylation data. See the sections "Modified hidden markov model" and "HMM transition probabilities" in our paper for a more detailed description of the HMM and how we trained the model. In practice, we have found that the provided models work well across a range of samples as long as they show regional correlations like the samples we plotted in the paper, which is true for most human and mammalian samples. I suppose that if you look at something extremely different, like methylation in E. coli or a cell line with severely depleted methylation, these models might not work and would need to be re-trained.
(2) I believe GSE66329 is the one with 31 samples from Pan et al. The other 4 in GSE130556 are from our group. We used both data sets in the paper, but for different analyses. It should be pretty clearly labelled when we use each in the text and figure captions, but if you have questions about particular analyses, let me know.
(3) For GSE66329 (the one with 31 samples) you have to massage the input a bit. You should be able to use a simple script to extract the chromosome, coordinates, methylation, and coverage into a bed-like file format and then convert it using the instructions here. We use position1 as the C in the CpG, and then set position2 = position1 + 1. The other trick is to consider collapsing the methylation values across strands, since I think they report values for each strand.
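As a rough illustration of the kind of script meant in (3), here is a minimal sketch. It assumes each input record has already been parsed into (chromosome, position, strand, methylated read count, coverage), and that a minus-strand call sits one base after the C of the same CpG. Both are assumptions about the GEO files, so check them against the actual column layout. The function names and output column order are hypothetical, and the final conversion to DXM input is not shown.

```python
from collections import defaultdict


def collapse_strands(records):
    """Merge per-strand CpG calls into one row per CpG.

    records: iterable of (chrom, pos, strand, methylated_reads, coverage).
    Returns rows of (chrom, position1, position2, methylation_fraction, coverage).
    """
    totals = defaultdict(lambda: [0, 0])  # (chrom, position1) -> [meth, cov]
    for chrom, pos, strand, meth, cov in records:
        # Assumption: the minus-strand call is reported on the G of the CpG,
        # one base after the C, so shift it back onto the C position.
        position1 = pos - 1 if strand == "-" else pos
        totals[(chrom, position1)][0] += meth
        totals[(chrom, position1)][1] += cov

    rows = []
    for (chrom, position1), (meth, cov) in sorted(totals.items()):
        if cov == 0:
            continue  # no reads at this CpG, nothing to report
        # position1 is the C in the CpG; position2 = position1 + 1
        rows.append((chrom, position1, position1 + 1, meth / cov, cov))
    return rows


def to_bed_lines(rows):
    """Format rows as tab-separated, bed-like lines (column order is an assumption)."""
    return [f"{c}\t{p1}\t{p2}\t{frac:.4f}\t{cov}" for c, p1, p2, frac, cov in rows]


# Made-up example records, not real data from GSE66329.
example = [
    ("chr1", 100, "+", 8, 10),
    ("chr1", 101, "-", 6, 10),
    ("chr1", 250, "+", 0, 5),
]
bed_lines = to_bed_lines(collapse_strands(example))
```

Summing methylated reads and coverage before dividing weights each strand by its read depth, which is usually preferable to averaging the two per-strand fractions.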
John
Hi,
Our group is trying to use DXM to analyze some BS datasets. However, a few points need to be clarified before we can run the software successfully. Here are our questions:
Thanks. Li