WGLab / DeepMod

DeepMod: a deep-learning tool for genomic-scale, strand-sensitive and single-nucleotide based detection of DNA modifications

Why does the DeepMod program filter out sites with `methylated coverage==0` predicted by DeepMod? #49

Closed liuyangzzu closed 2 years ago

liuyangzzu commented 2 years ago

Dear Prof. Liu, Thank you for the great software for DNA methylation.

I found that your program filters out sites that have no methylated calls in the final results. See your code here (amod_dict[pk][1] is the number of methylated reads predicted by DeepMod): https://github.com/WGLab/DeepMod/blob/master/DeepMod_tools/sum_chr_mod.py#L57

As a result, the final DeepMod methylation output loses many CpG sites that DeepMod predicts as unmethylated in all reads. In NA12878, CpG sites for which DeepMod reports a 0% methylation level are therefore missing from the performance evaluation and the comparison with nanopolish. How do you ensure that this kind of performance comparison in your paper is an objective, genome-wide evaluation? Thank you!

Below is the report you provided for NA12878 on chr18; none of the output lines has a methylated coverage (second-to-last column) of 0:

head cpredecoli_org_clusterCpG.chr18.C.bed

chr18 10689 10690 C 8 +  10689 10690 0,0,0 8 25 2 68
chr18 10703 10704 C 8 +  10703 10704 0,0,0 8 100 8 91
chr18 10718 10719 C 8 +  10718 10719 0,0,0 8 75 6 89
chr18 10730 10731 C 8 +  10730 10731 0,0,0 8 25 2 95
chr18 10753 10754 C 8 +  10753 10754 0,0,0 8 100 8 94
chr18 10767 10768 C 2 +  10767 10768 0,0,0 2 100 2 94
chr18 10787 10788 C 8 +  10787 10788 0,0,0 8 25 2 87
chr18 10791 10792 C 8 +  10791 10792 0,0,0 8 75 6 92
chr18 10795 10796 C 8 +  10795 10796 0,0,0 8 75 6 94
chr18 10815 10816 C 8 +  10815 10816 0,0,0 8 100 8 98
=================================================^===

liuyangzzu commented 2 years ago

Below is the evidence that none of your NA12878 output results contains a 0% methylation level from DeepMod; too many CpGs were filtered out before evaluation:

awk '$12 ==0' cpredecoli_org_clusterCpG.chr13.C.bed
awk '$12 ==0' cpredecoli_org_clusterCpG.chr14.C.bed
awk '$12 ==0' cpredecoli_org_clusterCpG.chr15.C.bed
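
The same check can be sketched in Python for readers who prefer it over awk. This is only an illustrative sketch: the function name `count_zero_methylated` is hypothetical, and it assumes whitespace-separated fields with the methylated coverage in column 12 (1-based), as in the awk commands above. The sample rows are copied from the chr18 report quoted earlier in this thread.

```python
def count_zero_methylated(lines, column=12):
    """Return (rows with 0 in `column`, rows examined); `column` is 1-based."""
    zero = 0
    total = 0
    for line in lines:
        # Split on any run of whitespace, like awk's default field splitting.
        fields = line.split()
        if len(fields) < column:
            continue  # skip blank or truncated lines
        total += 1
        if int(fields[column - 1]) == 0:
            zero += 1
    return zero, total


# Two rows taken from the chr18 report quoted in this thread.
sample = [
    "chr18 10689 10690 C 8 +  10689 10690 0,0,0 8 25 2 68",
    "chr18 10703 10704 C 8 +  10703 10704 0,0,0 8 100 8 91",
]
zero, total = count_zero_methylated(sample)
print(f"{zero} of {total} sites have methylated coverage == 0")
```

To run it on a whole file, pass an open file handle instead of the list, for example `count_zero_methylated(open(path))`, where `path` is one of the per-chromosome BED files.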

umahsn commented 2 years ago

Dear Yang,

The performance evaluation in our paper was carried out on E. coli and NA12878 reads basecalled with Metrichor and Albacore v1.*. These basecallers produced event tables, which were used to generate features for DeepMod. If you download the NA12878 rel3 data from the Nanopore WGS Consortium and use the basecalls present in the original FAST5 files, you will be able to reproduce the same performance as in our paper.

We are working on a new DNA methylation tool, DeepMod2 (https://github.com/WGLab/DeepMod2/), which works with Guppy FAST5 files. I will share our new tool's performance with you shortly.

liuyangzzu commented 2 years ago

Thank you umahsn! Looking forward to DeepMod2.

Here, I just mean a potential bug in DeepMod version 1. When combining batches, Qian's DeepMod program filters out sites with methylated coverage==0 (not total coverage==0) before reporting the final output. With this program, if we supply only 5C reads, it will not generate any output even if the predictions are correct (unmethylated).

I also checked the Albacore-based DeepMod outputs on NA12878 provided on GitHub by Qian, which show that your final outputs indeed contain no fully unmethylated predictions (these are the results reported in the DeepMod paper).

Would you please find the right person who knows the DeepMod code to confirm whether this is a bug? Thank you! https://github.com/WGLab/DeepMod/blob/master/DeepMod_tools/sum_chr_mod.py#L57

kaichop commented 2 years ago

@liuqianhn can you please take a look at the comment about using a methylated coverage of zero when filtering results? It sounds like it should filter positions with a total coverage of zero instead.

liuqianhn commented 2 years ago

@liuyangzzu This is a bug in the coverage filter: "if amod_dict[pk][1]==0: del amod_dict[pk]" should be "if amod_dict[pk][0]==0: del amod_dict[pk]". Meanwhile, besides using DeepMod2, please test NA12878 with rel3.
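
The effect of this one-index change can be illustrated with a minimal sketch. It is not the actual code in sum_chr_mod.py: it assumes, as the thread implies, that each amod_dict value is a list whose element 0 is the total coverage and element 1 is the methylated coverage. The function name `drop_uncovered_sites` and the tuple keys are hypothetical.

```python
def drop_uncovered_sites(amod_dict):
    """Delete sites with zero total coverage; keep fully unmethylated sites."""
    # Iterate over a snapshot of the keys so entries can be deleted safely.
    for pk in list(amod_dict.keys()):
        # Corrected filter: test total coverage (index 0),
        # not methylated coverage (index 1).
        if amod_dict[pk][0] == 0:
            del amod_dict[pk]
    return amod_dict


# Hypothetical sites: value = [total coverage, methylated coverage]
sites = {
    ("chr18", "+", 10689): [8, 2],  # partially methylated
    ("chr18", "+", 10703): [8, 0],  # covered, fully unmethylated
    ("chr18", "+", 10718): [0, 0],  # no coverage at all
}
kept = drop_uncovered_sites(sites)
print(sorted(kept))
```

With the original index-1 test, the second site (8 reads, 0 methylated) would also have been deleted, which is the loss of 0% methylation sites reported in this issue.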