Jerry-0591 closed this issue 3 years ago
@zcWang0591 You need to download the bisulfite sequencing data for the evaluation. We have been working on an evaluation script with no hard-coded paths and hope to upload it soon. The process is as follows: first, get the completely methylated CpG sites (methylation percentage >=90% in both bisulfite sequencing replicates) and the completely unmethylated CpG sites (methylation percentage 0% in both replicates); then compare them with the predicted methylation percentages. AP and AUC can be computed and plotted with Python tools.
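A minimal sketch of that process follows. It is not the evaluation script mentioned above, which has not been released. The dictionaries, site keys, and function names are hypothetical, and the toy numbers are made up for illustration. AUC and AP are computed in plain Python to keep the example self-contained; for real data, scikit-learn's `roc_auc_score` and `average_precision_score` would be the more practical choice, since the AUC loop here is quadratic.

```python
def label_sites(rep1, rep2, high=90.0, low=0.0):
    """Label CpG sites that agree in both bisulfite replicates.

    rep1/rep2 map a site key to a methylation percentage (0-100).
    Returns {site: 1} for >= high in both, {site: 0} for <= low in both;
    all other sites are dropped as ambiguous.
    """
    labels = {}
    for site in rep1.keys() & rep2.keys():
        a, b = rep1[site], rep2[site]
        if a >= high and b >= high:
            labels[site] = 1
        elif a <= low and b <= low:
            labels[site] = 0
    return labels


def roc_auc(y_true, scores):
    """AUC as the chance a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative site")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """AP = sum over distinct score thresholds of (recall gain) * precision."""
    pairs = sorted(zip(scores, y_true), key=lambda p: -p[0])
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("need at least one positive site")
    tp, ap, i = 0, 0.0, 0
    while i < len(pairs):
        j, tp_before = i, tp
        # Consume all sites sharing the same predicted score as one threshold.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            tp += pairs[j][1]
            j += 1
        ap += (tp - tp_before) / n_pos * (tp / j)
        i = j
    return ap


# Toy data (made up): bisulfite methylation percentages per replicate.
rep1 = {"chr1:100": 95, "chr1:200": 100, "chr1:300": 0, "chr1:400": 0, "chr1:500": 50, "chr1:600": 92}
rep2 = {"chr1:100": 91, "chr1:200": 98, "chr1:300": 0, "chr1:400": 0, "chr1:500": 60, "chr1:600": 80}
# Toy predicted methylation percentages from the nanopore caller.
pred = {"chr1:100": 80, "chr1:200": 30, "chr1:300": 10, "chr1:400": 40, "chr1:500": 55}

labels = label_sites(rep1, rep2)
sites = sorted(labels.keys() & pred.keys())  # evaluate only labeled sites with a prediction
y_true = [labels[s] for s in sites]
scores = [pred[s] for s in sites]

auc = roc_auc(y_true, scores)
ap = average_precision(y_true, scores)
print(f"sites={len(sites)} AUC={auc:.3f} AP={ap:.3f}")
```

Sites that are intermediate in either replicate, or that disagree between replicates, are excluded rather than forced into a class, which matches the "completely methylated" versus "completely unmethylated" selection described above.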
Thank you for replying. Another question is about the E. coli dataset: do you have a download URL for it? I emailed the original authors as you suggested but have not received a reply so far, which makes it hard for me to reproduce the results. Could you provide a download URL? Many thanks.
For the UMR and SSS E. coli data, you can find where to download the data from https://www.nature.com/articles/nmeth.4184. For the other E. coli data, you need to contact the developer of nanoraw. I wish I could share the URL for the data released by nanoraw, but only the developer of nanoraw has the right to share them.
Hi, the problem is that nanoraw does not seem to have a public dataset (https://github.com/nanoporetech/tombo/issues/248). Am I missing something? Could you give brief guidance on downloading this part of the data?
@zcWang0591 I see the issue. I wish I could provide some download guidance, but I have none to give; you can only get the download information from the developer of nanoraw.
Hi, if I only have the E. coli data from https://www.nature.com/articles/nmeth.4184, does the pipeline for analyzing the E. coli data stay the same? I have read the pipeline examples (https://github.com/WGLab/DeepMod/blob/master/docs/Reproducibility.md), but they seem to be written for the nanoraw data. Could you write brief guidance on the pipeline for data not produced by nanoraw? It is hard for me to get that part of the data. Many thanks.
The pipeline is the same if you use the E. coli data from https://www.nature.com/articles/nmeth.4184: you can treat UMR as the negative control and SSS as the positive control. But please note that some models are trained on UMR and SSS.
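With control samples, the labels come from the sample itself: every CpG in UMR is expected to be unmethylated and every CpG in SSS methylated. A rough sketch of scoring predictions this way is below. The function name, the 50% call threshold, and the toy numbers are assumptions for illustration and are not taken from DeepMod.

```python
def control_metrics(umr_pred, sss_pred, threshold=50.0):
    """Score per-site predicted methylation percentages against control labels.

    umr_pred: predictions on the negative control (true label: unmethylated).
    sss_pred: predictions on the positive control (true label: methylated).
    threshold: hypothetical cutoff; a site is called methylated at >= threshold.
    """
    if not umr_pred or not sss_pred:
        raise ValueError("both control samples need at least one site")
    tn = sum(p < threshold for p in umr_pred)   # UMR sites correctly called unmethylated
    fp = len(umr_pred) - tn                     # UMR sites wrongly called methylated
    tp = sum(p >= threshold for p in sss_pred)  # SSS sites correctly called methylated
    fn = len(sss_pred) - tp                     # SSS sites wrongly called unmethylated
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp) if (tp + fp) else float("nan"),
    }


# Toy predicted methylation percentages (made up).
umr = [0, 5, 10, 60]
sss = [90, 80, 40, 100]
metrics = control_metrics(umr, sss)
print(metrics)
```

If the model under evaluation was itself trained on UMR and SSS, numbers obtained this way would be measured on training data and should be read with that in mind.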
Thank you for replying. There is one thing I want to confirm: does SSS mean the methylation introduced by M.SssI? I read the paper but could not find SSS.
Yes, SSS is short for the methylation introduced by M.SssI. Thanks for pointing this out.
Thank you, I will try your suggestions.
Closed due to no recent response. Feel free to reopen it if you need more help.
Hi Liu: I am working on 5mC/6mA base modification in the human brain. I have tried the pipeline that you recommend for the NA12878 dataset (https://github.com/WGLab/DeepMod/blob/master/docs/Reproducibility.md), but for the last step I only see guidance for downloading the data. Are there any scripts for evaluating the NA12878 dataset? Could you write brief guidance for it? Many thanks.