
WGLab / DeepMod

DeepMod: a deep-learning tool for genomic-scale, strand-sensitive and single-nucleotide based detection of DNA modifications



Evaluation of Na12878 dataset #36

Closed Jerry-0591 closed 3 years ago

Jerry-0591 commented 3 years ago

Hi Liu, I am working on 5mC/6mA base modification detection in the human brain. I have tried the pipeline you recommend on the NA12878 dataset (https://github.com/WGLab/DeepMod/blob/master/docs/Reproducibility.md), but for the last step I only see guidance for downloading the data. Are there any scripts for evaluating the results on the NA12878 dataset? Could you write brief guidance for it? Many thanks.

liuqianhn commented 3 years ago

@zcWang0591 You need to download the bisulfite sequencing data for the evaluation. We have been working on an evaluation script without hard-coded paths, and we hope to upload it soon. The process is: first, get the completely methylated CpG sites (methylation percentage >=90% in both bisulfite sequencing replicates) and the completely unmethylated CpG sites (methylation percentage 0% in both replicates); then compare them against the predicted methylation percentage. AP and AUC can be computed and plotted with Python tools.
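For anyone who wants to try the evaluation before the official script is released, here is a minimal Python sketch of the procedure described above. It is not the DeepMod evaluation script: the dictionary inputs keyed by hypothetical (chromosome, position, strand) tuples, the function names, and the pure-Python AUC/AP implementations are all assumptions. In practice, scikit-learn's `roc_auc_score` and `average_precision_score` compute the same two metrics.

```python
from typing import Dict, List, Tuple

# Hypothetical site key: (chromosome, position, strand).
# This does NOT reflect the actual DeepMod or bisulfite file formats.
Site = Tuple[str, int, str]


def label_sites(rep1: Dict[Site, float], rep2: Dict[Site, float],
                high: float = 90.0) -> Dict[Site, int]:
    """Build ground-truth labels from two bisulfite replicates (percent, 0-100).

    1 = completely methylated (>= `high` in both replicates)
    0 = completely unmethylated (0% in both replicates)
    Sites in only one replicate, or with intermediate levels, are dropped.
    """
    labels: Dict[Site, int] = {}
    for site in rep1.keys() & rep2.keys():
        a, b = rep1[site], rep2[site]
        if a >= high and b >= high:
            labels[site] = 1
        elif a == 0 and b == 0:
            labels[site] = 0
    return labels


def roc_auc(y: List[int], s: List[float]) -> float:
    """ROC AUC as P(positive score > negative score); ties count 0.5.

    Quadratic in the number of sites: fine for a sketch, too slow genome-wide.
    """
    pos = [x for x, t in zip(s, y) if t == 1]
    neg = [x for x, t in zip(s, y) if t == 0]
    if not pos or not neg:
        raise ValueError("need both methylated and unmethylated sites")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y: List[int], s: List[float]) -> float:
    """AP = sum_k (R_k - R_{k-1}) * P_k over distinct score thresholds."""
    total_pos = sum(y)
    if total_pos == 0:
        raise ValueError("need at least one methylated site")
    order = sorted(range(len(s)), key=lambda i: s[i], reverse=True)
    tp = fp = 0
    ap = prev_recall = 0.0
    i = 0
    while i < len(order):
        threshold = s[order[i]]
        # Sites tied at the same score enter together, as one threshold.
        while i < len(order) and s[order[i]] == threshold:
            if y[order[i]] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / total_pos
        ap += (recall - prev_recall) * (tp / (tp + fp))
        prev_recall = recall
    return ap


def evaluate(rep1: Dict[Site, float], rep2: Dict[Site, float],
             predicted: Dict[Site, float]) -> Tuple[float, float]:
    """Return (AUC, AP) of predicted methylation percentages on labelled sites."""
    labels = label_sites(rep1, rep2)
    common = sorted(labels.keys() & predicted.keys())
    y = [labels[k] for k in common]
    scores = [predicted[k] for k in common]
    return roc_auc(y, scores), average_precision(y, scores)
```

Sites with intermediate bisulfite levels are excluded rather than thresholded, which mirrors the "completely methylated / completely unmethylated" selection described in the reply.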

Jerry-0591 commented 3 years ago

Thank you for the reply. Another question is about the E. coli dataset: do you have a download URL for it? Following your guidance, I emailed the original authors, but I have not received a reply yet, which makes it hard for me to reproduce the results. Could you provide a download URL? Many thanks.

liuqianhn commented 3 years ago

For the UMR and SSS E. coli data, you can find where to download the data in https://www.nature.com/articles/nmeth.4184. For the other E. coli data, you need to contact the developer of nanoraw. I wish I could give you the URL for the data released by nanoraw, but only the developer of nanoraw has the right to share them.

Jerry-0591 commented 3 years ago

Hi, the problem is that nanoraw does not seem to have a public dataset (https://github.com/nanoporetech/tombo/issues/248). Am I missing something? Could you give brief guidance for downloading this part of the data?

liuqianhn commented 3 years ago

@zcWang0591 I see the issue. I wish I could give you some download guidance, but there is none; you can only get the download information from the developer of nanoraw.

Jerry-0591 commented 3 years ago

Hi, if I only have the E. coli data from https://www.nature.com/articles/nmeth.4184, does the E. coli analysis pipeline stay the same? I have read the pipeline examples (https://github.com/WGLab/DeepMod/blob/master/docs/Reproducibility.md), but they seem to be guidance for the nanoraw data. Could you write brief guidance for the pipeline on data not produced by nanoraw? It is hard for me to get that part of the data. Many thanks.

liuqianhn commented 3 years ago

It is the same if you use the E. coli data from https://www.nature.com/articles/nmeth.4184: you can treat UMR as the negative control and SSS as the positive control. But please note that some models are trained on UMR and SSS.
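A minimal Python sketch of how the two controls could serve as ground truth, assuming per-CpG methylation calls (True = predicted methylated) have already been extracted from the prediction output for each dataset. The function name and input format are hypothetical, not part of DeepMod. Because some models are trained on UMR and SSS, metrics from this check may be optimistic for those models.

```python
from typing import Dict, Iterable


def control_metrics(umr_calls: Iterable[bool],
                    sss_calls: Iterable[bool]) -> Dict[str, float]:
    """Score methylation calls against the two E. coli controls.

    umr_calls: calls on UMR (negative control) CpG sites, expected unmethylated.
    sss_calls: calls on SSS CpG sites (M.SssI methylates CpG), expected methylated.
    True means the site was predicted methylated. Input format is hypothetical.
    """
    umr = list(umr_calls)
    sss = list(sss_calls)
    if not umr or not sss:
        raise ValueError("need calls from both controls")
    tp = sum(sss)        # SSS sites correctly called methylated
    fn = len(sss) - tp   # SSS sites missed
    fp = sum(umr)        # UMR sites wrongly called methylated
    tn = len(umr) - fp   # UMR sites correctly called unmethylated
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }
```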

Jerry-0591 commented 3 years ago

Thank you for the reply. There is one thing I want to confirm: does SSS mean the methylation introduced by M.SssI? I read the paper but did not find the term SSS.

liuqianhn commented 3 years ago

Yes, SSS is short for the methylation introduced by M.SssI. Thanks for pointing this out.

Jerry-0591 commented 3 years ago

Thank you, I will try your suggestions.

liuqianhn commented 3 years ago

Closed due to no recent response. Feel free to reopen it if you need more help.