Open Smart-zhi opened 4 years ago
@Smart-zhi Thanks for being interested in DeepMod. Your message is received. I am working on it. At the same time, it would be great if you can share your running commands and what you have done so that I might reproduce the results if there is any issue.
Thank you.
Since the NA12878 data set is relatively large, I divided it into several groups and ran Albacore and DeepMod.py detect in sequence:
read_fast5_basecaller.py -i raw/ -r -t 20 -s na12878.${i}.albacore/ -f FLO-MIN106 -k SQK-LSK108 -o fast5
python ${DeepMod}/bin/DeepMod.py detect --wrkBase na12878.${i}.albacore/workspace/pass --Ref ${ref} --FileID Notts_group1 --modfile ${DeepMod}/train_mod/rnn_conmodC_P100wd21_f7ne1u0_4/mod_train_conmodC_P100wd21_f3ne1u0 --threads 20 --outFolder ${out_folder}
I got the following in ${out_folder}:
Bham_group0 Norwich_group1_1 UBC_group1_2
Bham_group0.done Norwich_group1_1.done UBC_group1_2.done
Bham_group1_1 Norwich_group2 UBC_group1_3
Bham_group1_1.done Norwich_group2.done UBC_group1_3.done
Bham_group1_2 Notts_group0 UBC_group1_4
Bham_group1_2.done Notts_group0.done UBC_group1_4.done
Bham_group1_3 Notts_group1 UBC_group2
Bham_group1_3.done Notts_group1_1 UBC_group2.done
Bham_group2 Notts_group1_1.done UCSC_group0
Bham_group2.done Notts_group1.done UCSC_group0.done
Norwich_group0 UBC_group1_1
Norwich_group0.done UBC_group1_1.done
Then,
python ${DeepMod}/tools/sum_chr_mod.py ${out_folder}/ C na12878_C
I got
na12878_C.chr10.C.bed na12878_C.chr14.C.bed na12878_C.chr18.C.bed na12878_C.chr21.C.bed na12878_C.chr4.C.bed na12878_C.chr8.C.bed na12878_C.chrY.C.bed
na12878_C.chr11.C.bed na12878_C.chr15.C.bed na12878_C.chr19.C.bed na12878_C.chr22.C.bed na12878_C.chr5.C.bed na12878_C.chr9.C.bed
na12878_C.chr12.C.bed na12878_C.chr16.C.bed na12878_C.chr1.C.bed na12878_C.chr2.C.bed na12878_C.chr6.C.bed na12878_C.chrM.C.bed
na12878_C.chr13.C.bed na12878_C.chr17.C.bed na12878_C.chr20.C.bed na12878_C.chr3.C.bed na12878_C.chr7.C.bed na12878_C.chrX.C.bed
I ran
python ${DeepMod}/tools/generate_motif_pos.py ${ref} ${genome_motif}/C C CG 0
python ${DeepMod}/tools/hm_cluster_predict.py ${out_folder}/na12878_C ${genome_motif}/C
and got:
na12878_C_clusterCpG.chr10.C.bed na12878_C_clusterCpG.chr15.C.bed na12878_C_clusterCpG.chr1.C.bed na12878_C_clusterCpG.chr3.C.bed na12878_C_clusterCpG.chr8.C.bed
na12878_C_clusterCpG.chr11.C.bed na12878_C_clusterCpG.chr16.C.bed na12878_C_clusterCpG.chr20.C.bed na12878_C_clusterCpG.chr4.C.bed na12878_C_clusterCpG.chr9.C.bed
na12878_C_clusterCpG.chr12.C.bed na12878_C_clusterCpG.chr17.C.bed na12878_C_clusterCpG.chr21.C.bed na12878_C_clusterCpG.chr5.C.bed na12878_C_clusterCpG.chrM.C.bed
na12878_C_clusterCpG.chr13.C.bed na12878_C_clusterCpG.chr18.C.bed na12878_C_clusterCpG.chr22.C.bed na12878_C_clusterCpG.chr6.C.bed na12878_C_clusterCpG.chrX.C.bed
na12878_C_clusterCpG.chr14.C.bed na12878_C_clusterCpG.chr19.C.bed na12878_C_clusterCpG.chr2.C.bed na12878_C_clusterCpG.chr7.C.bed na12878_C_clusterCpG.chrY.C.bed
During the analysis, I merged the na12878_C_clusterCpG.chr* files into one file named total.bed (`cat na12878_C_clusterCpG.chr* > total.bed`). Then I used a simple shell script to merge the CpG calls on the positive and negative strands into a single site. In the end I got a file similar to the following (positions are 1-based):
chr_pos1 coverage met rmet
chr17_19342304 10 2 0.2000
chr11_64368472 15 8 0.5333
chr9_70171213 17 1 0.0588
chr2_101126946 17 6 0.3529
chr7_92826868 40 8 0.2000
chr5_137781115 22 4 0.1818
chr17_4922691 15 1 0.0667
chr14_39170567 29 3 0.1034
chrX_32494303 24 2 0.0833
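The strand-merging step described above can be sketched in Python. This is only an illustration, not the shell script that was actually used: the function name `merge_cpg_strands` and the input tuple layout are hypothetical, and it assumes that the minus-strand cytosine of a CpG sits one base to the right of the plus-strand cytosine, so it is shifted back by 1 before the counts are summed.

```python
from collections import defaultdict


def merge_cpg_strands(records):
    """Merge per-strand CpG calls into one row per CpG site.

    records: iterable of (chrom, pos, strand, coverage, methylated), where
    pos is the 1-based position of the cytosine on the given strand.
    This input layout is an assumption made for the sketch.
    """
    merged = defaultdict(lambda: [0, 0])  # (chrom, site) -> [coverage, met]
    for chrom, pos, strand, coverage, methylated in records:
        # Assumption: a minus-strand C pairs with the plus-strand C at pos - 1.
        site = pos if strand == "+" else pos - 1
        merged[(chrom, site)][0] += coverage
        merged[(chrom, site)][1] += methylated

    rows = []
    for (chrom, site), (cov, met) in sorted(merged.items()):
        rmet = met / cov if cov else 0.0
        rows.append((f"{chrom}_{site}", cov, met, round(rmet, 4)))
    return rows


# Toy input (made-up numbers, not taken from the NA12878 run).
toy = [
    ("chr1", 100, "+", 6, 1),
    ("chr1", 101, "-", 4, 1),
    ("chr1", 200, "+", 5, 5),
]
rows = merge_cpg_strands(toy)
for row in rows:
    print(*row)
```

Each output row has the same four fields as the file above: site key, summed coverage, summed methylated count, and the methylation ratio.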
@Smart-zhi Sorry for the late reply. I was hoping to give a fuller update, because one of our lab members has been working on the whole evaluation process. However, I do not have more results right now; I might have more updates later.
Right now, I checked all the positions you listed above and found that your coverage at these positions differs from mine. This might be due to different versions of the basecaller used for the Nanopore data. My methylation percentages are therefore significantly different from yours. I would like to share my DeepMod results with you, but they are several GB. Let me figure out a way to share the results with you later. Thanks.
Thank you for your reply.
Thank you very much for sharing the data with me. I can receive data through any service, such as Google Drive, OneDrive, Baidu Drive, etc. Any way is OK. My email address is zhang_zhi@csu.edu.cn.
@liuqianhn I recently used DeepMod to analyze CpG methylation on HX1. I calculated the Pearson correlation coefficient between DeepMod / nanopolish and the bisulfite results (Bismark). My results are as follows:
NA12878 | Number of CpG sites intersecting with Bismark | Pearson correlation coefficient
---|---|---
nanopolish | 26,733,082 | 0.9023
DeepMod | 19,936,625 | 0.4325

HX1 | Number of CpG sites intersecting with Bismark | Pearson correlation coefficient
---|---|---
nanopolish | 27,303,077 | 0.9092
DeepMod | 26,303,675 | 0.7708
I can't explain the performance of DeepMod on the NA12878 dataset. Could you please share your results for NA12878 and HX1 with me? I want to compare them to investigate these problems. Thank you.
@Smart-zhi, thanks for sharing your results. I will summarize the files and share them with you. According to what you shared previously on NA12878, there are significant differences caused by the basecaller. Not sure about the HX1 data yet.
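For reference, the comparison in the tables above (intersect the CpG sites of two call sets, then correlate the methylation ratios) can be sketched as follows. This is a minimal illustration and not the script that produced those numbers; the function name `pearson_on_shared_sites` and the dictionary layout (site key mapped to methylation ratio) are assumptions.

```python
import math


def pearson_on_shared_sites(calls_a, calls_b):
    """Return (number of shared sites, Pearson r) for two call sets.

    calls_a, calls_b: dicts mapping a site key such as "chr1_100" to a
    methylation ratio in [0, 1]. Only sites present in both are compared.
    """
    shared = sorted(set(calls_a) & set(calls_b))
    n = len(shared)
    if n < 2:
        raise ValueError("need at least two shared sites")

    xs = [calls_a[k] for k in shared]
    ys = [calls_b[k] for k in shared]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return n, cov / math.sqrt(var_x * var_y)


# Toy data (made-up values): site "s4" is missing from the second set,
# so it drops out of the intersection.
nanopore = {"s1": 0.0, "s2": 0.5, "s3": 1.0, "s4": 0.3}
bisulfite = {"s1": 0.1, "s2": 0.6, "s3": 1.1}
n_shared, r = pearson_on_shared_sites(nanopore, bisulfite)
print(n_shared, round(r, 4))
```

Because the intersection differs between tools, the two correlation values in each table are computed over different site sets, which is worth keeping in mind when comparing them.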
Hello @liuqianhn, I followed the instructions provided in Supplementary Table 5 to reproduce the chrY results on HX1. The results are as follows:
Source | Un-meth | Meth | Prec | Rec
---|---|---|---|---
Supplementary Table 5 | 1,338 | 30,825 | 0.989 | 0.967
my test | 1,320 | 34,729 | 0.994 | 0.967
Coverage>=3, threshold = 0.5
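A binary evaluation with these settings (coverage >= 3, threshold 0.5) could be sketched as below. This is a hedged illustration, not the procedure from the supplementary tables: the function name `precision_recall`, the input layout, the use of `>=` for the threshold, and the choice to exclude sites below the coverage cutoff (rather than count them as missed) are all assumptions.

```python
def precision_recall(predictions, truth, min_coverage=3, threshold=0.5):
    """Score binary methylation calls against a reference labelling.

    predictions: dict site -> (coverage, methylation ratio)
    truth:       dict site -> True if the site is methylated in the reference
    Sites that are missing or below min_coverage are skipped (assumption).
    """
    tp = fp = fn = tn = skipped = 0
    for site, is_methylated in truth.items():
        call = predictions.get(site)
        if call is None or call[0] < min_coverage:
            skipped += 1
            continue
        predicted = call[1] >= threshold
        if predicted and is_methylated:
            tp += 1
        elif predicted and not is_methylated:
            fp += 1
        elif not predicted and is_methylated:
            fn += 1
        else:
            tn += 1

    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    return {"precision": precision, "recall": recall,
            "evaluated": tp + fp + fn + tn, "skipped": skipped}


# Toy data (made-up values, not from HX1 or NA12878).
preds = {"a": (5, 0.9), "b": (4, 0.2), "c": (2, 0.8),
         "d": (10, 0.7), "e": (3, 0.1)}
labels = {"a": True, "b": False, "c": True, "d": False, "e": True}
result = precision_recall(preds, labels)
print(result)
```

Reporting the number of skipped sites alongside precision and recall makes it easier to see how much a drop in coverage shrinks the evaluated set.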
However, when I followed the instructions in Supplementary Table 4 to run the experiment on chrX of NA12878, I could not reproduce the reported results. In my results, precision is close to Supplementary Table 4, but recall is very low. I guess that some of the sites are lost due to the reduced coverage.
I urgently need the results on NA12878 so that my work can continue. If you have this part of the data and can share with me, I would be very grateful. Thank you.
@Smart-zhi You are right, there might be a coverage issue for the newly basecalled NA12878. I am sorry that I do not have the data ready for you, because one of the lab members who partially worked on this has left. I will try to finish the work I have in hand and prepare the data for you soon. May I know when your deadline is?
@liuqianhn Thank you, I hope to get the NA12878 result before August 31st. During this time, I plan to run DeepMod again, but I need to basecall before that. At the same time, I am very worried that the results are not satisfactory. I am very fortunate and honored that I can get your help.
@Smart-zhi could you please try to see whether you can access the na12878 data from the link? I tested the performance for binary classification rather than correlation.
@liuqianhn , thank you very much. I have received the data. I find that chromosome 22 seems to be missing from the data.
In the next steps, I will test the classification effect. Again, I would like to express my warm thanks to you!
Thanks for sharing, @Smart-zhi. It seems that I need to see how to improve DeepMod for correlation testing. Thanks.
Hello Liu: I've been working on analyzing CpG methylation in human genome. I tried to run DeepMod on NA12878 and HX1 nanopore data by myself, but I can't get the expected results. So is it convenient for you to send me the DeepMod results of NA12878 and HX1?