WGLab / DeepMod

DeepMod: a deep-learning tool for genomic-scale, strand-sensitive and single-nucleotide based detection of DNA modifications

The result of NA12878 and HX1 #35

Open Smart-zhi opened 4 years ago

Smart-zhi commented 4 years ago

Hello Liu: I've been working on analyzing CpG methylation in the human genome. I ran DeepMod on the NA12878 and HX1 nanopore data myself, but I can't get the expected results. Would it be possible for you to send me your DeepMod results for NA12878 and HX1?

liuqianhn commented 4 years ago

@Smart-zhi Thanks for your interest in DeepMod. I have received your message and am working on it. In the meantime, it would be great if you could share the commands you ran and the steps you took, so that I can try to reproduce the results and see whether there is an issue.

Smart-zhi commented 4 years ago

Thank you. Since the NA12878 data set is relatively large, I divided it into several groups and, for each group, ran Albacore and then DeepMod.py detect.

read_fast5_basecaller.py -i raw/ -r -t 20 -s na12878.${i}.albacore/ -f FLO-MIN106 -k SQK-LSK108 -o fast5 

python ${DeepMod}/bin/DeepMod.py detect --wrkBase na12878.${i}.albacore/workspace/pass --Ref ${ref} --FileID Notts_group1 --modfile ${DeepMod}/train_mod/rnn_conmodC_P100wd21_f7ne1u0_4/mod_train_conmodC_P100wd21_f3ne1u0 --threads 20 --outFolder ${out_folder}

This gave me the following in ${out_folder}:

Bham_group0               Norwich_group1_1         UBC_group1_2
Bham_group0.done          Norwich_group1_1.done    UBC_group1_2.done
Bham_group1_1             Norwich_group2           UBC_group1_3
Bham_group1_1.done        Norwich_group2.done      UBC_group1_3.done
Bham_group1_2             Notts_group0             UBC_group1_4
Bham_group1_2.done        Notts_group0.done        UBC_group1_4.done
Bham_group1_3             Notts_group1             UBC_group2
Bham_group1_3.done        Notts_group1_1           UBC_group2.done
Bham_group2               Notts_group1_1.done      UCSC_group0
Bham_group2.done          Notts_group1.done        UCSC_group0.done
Norwich_group0            UBC_group1_1             
Norwich_group0.done       UBC_group1_1.done        

Then,

python ${DeepMod}/tools/sum_chr_mod.py ${out_folder}/ C na12878_C

I got

na12878_C.chr10.C.bed  na12878_C.chr14.C.bed  na12878_C.chr18.C.bed  na12878_C.chr21.C.bed  na12878_C.chr4.C.bed  na12878_C.chr8.C.bed  na12878_C.chrY.C.bed
na12878_C.chr11.C.bed  na12878_C.chr15.C.bed  na12878_C.chr19.C.bed  na12878_C.chr22.C.bed  na12878_C.chr5.C.bed  na12878_C.chr9.C.bed
na12878_C.chr12.C.bed  na12878_C.chr16.C.bed  na12878_C.chr1.C.bed   na12878_C.chr2.C.bed   na12878_C.chr6.C.bed  na12878_C.chrM.C.bed
na12878_C.chr13.C.bed  na12878_C.chr17.C.bed  na12878_C.chr20.C.bed  na12878_C.chr3.C.bed   na12878_C.chr7.C.bed  na12878_C.chrX.C.bed

I ran

python ${DeepMod}/tools/generate_motif_pos.py ${ref} ${genome_motif}/C C CG 0
python ${DeepMod}/tools/hm_cluster_predict.py ${out_folder}/na12878_C ${genome_motif}/C

and got:

na12878_C_clusterCpG.chr10.C.bed  na12878_C_clusterCpG.chr15.C.bed  na12878_C_clusterCpG.chr1.C.bed   na12878_C_clusterCpG.chr3.C.bed  na12878_C_clusterCpG.chr8.C.bed
na12878_C_clusterCpG.chr11.C.bed  na12878_C_clusterCpG.chr16.C.bed  na12878_C_clusterCpG.chr20.C.bed  na12878_C_clusterCpG.chr4.C.bed  na12878_C_clusterCpG.chr9.C.bed
na12878_C_clusterCpG.chr12.C.bed  na12878_C_clusterCpG.chr17.C.bed  na12878_C_clusterCpG.chr21.C.bed  na12878_C_clusterCpG.chr5.C.bed  na12878_C_clusterCpG.chrM.C.bed
na12878_C_clusterCpG.chr13.C.bed  na12878_C_clusterCpG.chr18.C.bed  na12878_C_clusterCpG.chr22.C.bed  na12878_C_clusterCpG.chr6.C.bed  na12878_C_clusterCpG.chrX.C.bed
na12878_C_clusterCpG.chr14.C.bed  na12878_C_clusterCpG.chr19.C.bed  na12878_C_clusterCpG.chr2.C.bed   na12878_C_clusterCpG.chr7.C.bed  na12878_C_clusterCpG.chrY.C.bed

During the analysis, I merged the na12878_C_clusterCpG.chr* files into one file named total.bed (`cat na12878_C_clusterCpG.chr* > total.bed`). I then used a simple shell script to merge the CpG calls on the positive and negative strands into a single site. In the end I got a file like the following (positions are 1-based):

chr_pos1    coverage    met rmet
chr17_19342304  10  2   0.2000
chr11_64368472  15  8   0.5333
chr9_70171213   17  1   0.0588
chr2_101126946  17  6   0.3529
chr7_92826868   40  8   0.2000
chr5_137781115  22  4   0.1818
chr17_4922691   15  1   0.0667
chr14_39170567  29  3   0.1034
chrX_32494303   24  2   0.0833
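
The strand-merging step described above could be sketched in Python roughly as follows. This is not the shell script used in this thread, only a minimal illustration. It assumes the BED columns are chrom, 0-based start, end, base, capped coverage, strand, thickStart, thickEnd, colour, real coverage, methylation percentage and methylated read count, and that a minus-strand call sits one base after the plus-strand C of the same CpG. The function names `parse_bed_line` and `merge_cpg_strands` are hypothetical.

```python
from collections import defaultdict


def parse_bed_line(line):
    """Parse one whitespace-separated BED line.

    ASSUMPTION: columns are chrom, start (0-based), end, base, capped
    coverage, strand, thickStart, thickEnd, colour, real coverage,
    methylation percentage, methylated read count.
    """
    fields = line.split()
    chrom = fields[0]
    start0 = int(fields[1])
    strand = fields[5]
    coverage = int(fields[9])
    methylated = int(fields[11])
    return chrom, start0, strand, coverage, methylated


def merge_cpg_strands(records):
    """Sum coverage and methylated counts of both strands per CpG site.

    records: iterable of (chrom, start0, strand, coverage, methylated).
    Returns a dict keyed by "chrom_pos1" (1-based position of the
    plus-strand C) with values (coverage, methylated, fraction).
    """
    totals = defaultdict(lambda: [0, 0])
    for chrom, start0, strand, coverage, methylated in records:
        # ASSUMPTION: the minus-strand C pairs with the plus-strand C
        # located one base upstream, so shift it back by one.
        cpg_start0 = start0 if strand == "+" else start0 - 1
        totals[(chrom, cpg_start0)][0] += coverage
        totals[(chrom, cpg_start0)][1] += methylated

    merged = {}
    for (chrom, cpg_start0), (coverage, methylated) in totals.items():
        if coverage == 0:
            continue  # avoid dividing by zero for uncovered sites
        key = f"{chrom}_{cpg_start0 + 1}"  # report 1-based positions
        merged[key] = (coverage, methylated, round(methylated / coverage, 4))
    return merged
```

Feeding every line of total.bed through `parse_bed_line` and passing the resulting tuples to `merge_cpg_strands` would give one entry per CpG, in the same chr_pos1 / coverage / met / rmet layout as the table above.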

liuqianhn commented 3 years ago

@Smart-zhi Sorry for the late reply. I had hoped to give a fuller update, because one of our lab members is now working through the whole evaluation process. However, I do not have more results yet; I may have more updates later.

I checked all the positions you listed above and found that your coverage at each position differs from mine. This might be due to different versions of the basecaller being used on the Nanopore data. As a result, my methylation percentages are significantly different from yours. I would like to share my DeepMod results with you, but they are several GB in size. Let me figure out a way to share them with you later. Thanks.

Smart-zhi commented 3 years ago

Thank you for your reply, and thank you very much for sharing the data with me. I can receive data from any service, such as Google Drive, OneDrive, or Baidu Drive; any way is OK. My email address is zhang_zhi@csu.edu.cn.

Smart-zhi commented 3 years ago

@liuqianhn I recently used DeepMod to analyze CpG methylation on HX1. I calculated the Pearson correlation coefficient between DeepMod / nanopolish and the bisulfite results (Bismark). My results are as follows:

NA12878      Number of intersections (CpG) with Bismark   Pearson correlation coefficient
nanopolish   26,733,082                                   0.9023
DeepMod      19,936,625                                   0.4325

HX1          Number of intersections (CpG) with Bismark   Pearson correlation coefficient
nanopolish   27,303,077                                   0.9092
DeepMod      26,303,675                                   0.7708

I can't explain DeepMod's performance on the NA12878 dataset. Could you please share your results for NA12878 and HX1 with me? I would like to compare them with mine to track down the problem. Thank you.
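
For readers who want to repeat this kind of comparison, the correlation over shared CpG sites might be computed along the following lines. This is a minimal sketch and not the code used in this thread. It assumes each tool's output has already been reduced to a dict that maps a site key (for example "chr17_19342304") to a methylation fraction, and the function name `pearson_on_shared_sites` is hypothetical.

```python
import math


def pearson_on_shared_sites(a, b):
    """Pearson correlation over the sites present in both inputs.

    a, b: dicts mapping a site key to a methylation fraction (0..1).
    Returns (number_of_shared_sites, correlation_coefficient).
    """
    shared = sorted(set(a) & set(b))  # the intersection of the two call sets
    n = len(shared)
    if n < 2:
        raise ValueError("need at least two shared sites")

    xs = [a[site] for site in shared]
    ys = [b[site] for site in shared]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")

    return n, cov / math.sqrt(var_x * var_y)
```

Reporting the number of shared sites next to the coefficient, as in the tables above, makes it easier to see when two tools are being compared on different subsets of the genome.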

liuqianhn commented 3 years ago

@Smart-zhi, thanks for sharing your results. I will put the files together and share them with you. Based on what you shared earlier for NA12878, there are significant differences caused by the basecaller. I am not sure about the HX1 data yet.

Smart-zhi commented 3 years ago

Hello @liuqianhn, I followed the instructions provided in Supplementary Table 5 to reproduce the chrY results on HX1. The results are as follows:

                        Un-meth   Meth     Prec    Rec
Supplementary Table 5   1,338     30,825   0.989   0.967
my test                 1,320     34,729   0.994   0.967

Coverage >= 3, threshold = 0.5
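
A binary evaluation with these two settings could look roughly like the sketch below. It is an illustration only and not the evaluation code behind the supplementary tables. It assumes the predictions are a dict of site key to (coverage, methylated fraction), that the truth labels (True for methylated, False for unmethylated) have already been derived from bisulfite data, and that a site counts as methylated when its fraction is at or above the threshold. The function name `precision_recall` is hypothetical.

```python
def precision_recall(predictions, truth, min_coverage=3, threshold=0.5):
    """Precision and recall of methylation calls against truth labels.

    predictions: dict of site key -> (coverage, methylated_fraction).
    truth: dict of site key -> bool (True means methylated).
    """
    tp = fp = fn = 0
    for site, is_methylated in truth.items():
        if site not in predictions:
            continue  # ASSUMPTION: sites without a call are not scored
        coverage, fraction = predictions[site]
        if coverage < min_coverage:
            continue  # ASSUMPTION: low-coverage sites are not scored
        called_methylated = fraction >= threshold
        if called_methylated and is_methylated:
            tp += 1
        elif called_methylated and not is_methylated:
            fp += 1
        elif not called_methylated and is_methylated:
            fn += 1

    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    return precision, recall
```

Whether uncalled or low-coverage sites are skipped, as here, or counted as missed changes the recall, so that choice should be stated whenever numbers from different runs are compared.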


However, when I followed the instructions in Supplementary Table 4 for chrX on NA12878, I could not reach the same conclusion. In my results the precision is close to that in Supplementary Table 4, but the recall is very low. My guess is that some sites are lost because of the reduced coverage.

I urgently need the NA12878 results so that my work can continue. If you have this data and can share it with me, I would be very grateful. Thank you.

liuqianhn commented 3 years ago

@Smart-zhi You are right, there might be a coverage issue with the newly basecalled NA12878 data. I am sorry that I do not have the data ready for you, because one of the lab members who partially worked on this has left. I will try to finish the work I have in hand and prepare the data for you soon. May I ask when your deadline is?

Smart-zhi commented 3 years ago

@liuqianhn Thank you. I hope to get the NA12878 results before August 31st. In the meantime, I plan to run DeepMod again, but I need to redo the basecalling first. I am also quite worried that the results will still not be satisfactory. I am very fortunate and honored to have your help.

liuqianhn commented 3 years ago

@Smart-zhi Could you please check whether you can access the NA12878 data from the link? Note that I tested the performance for binary classification rather than correlation.

Smart-zhi commented 3 years ago

@liuqianhn, thank you very much. I have received the data. Chromosome 22 seems to be missing from it.

As a next step, I will test the classification performance. Again, I would like to express my warm thanks to you!

liuqianhn commented 3 years ago

Thanks for sharing, @Smart-zhi. It seems that I need to look into how to improve DeepMod for correlation testing. Thanks.
