SarahChen0401 closed this issue 3 years ago.
Exit code -9 suggests that the OS killed the job, most likely because it ran out of memory. I would try splitting the job into smaller chunks, or running it on a machine with more memory. I notice you are using "--identify m5C_TET". Was the sample actually TET-converted to 5-hydroxymethyl-C (5hmC), or is it being used to detect 5mC? For the latter, it makes more sense to turn off identification and work with simple modified base calls, assuming you have enough coverage.
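To follow the chunking suggestion, one option is to run ipdSummary once per smaller --referenceWindows region instead of once over chr3:1-10000000. The Python sketch below only builds the argument lists. The helper names (split_window, ipdsummary_args, describe_returncode), the 2 Mb chunk size and the output-file naming are illustrative assumptions, and it reuses only the flags that appear in the command in this thread. It also shows how Python's subprocess module reports a signal kill: a return code of -N means the child was terminated by signal N, and signal 9 is SIGKILL, which is what the Linux out-of-memory killer sends.

```python
import signal


def split_window(chrom, start, end, chunk_size):
    """Split a 1-based, inclusive region into consecutive chunks of at most chunk_size bases."""
    windows = []
    pos = start
    while pos <= end:
        stop = min(pos + chunk_size - 1, end)
        windows.append(f"{chrom}:{pos}-{stop}")
        pos = stop + 1
    return windows


def ipdsummary_args(bam, reference, window, prefix):
    """Build one ipdSummary argument list for a single window (hypothetical naming scheme)."""
    tag = window.replace(":", "_").replace("-", "_")
    return [
        "ipdSummary", bam,
        "--reference", reference,
        "--referenceWindows", window,
        "--numWorkers", "1",
        "--gff", f"{prefix}_{tag}.gff",
        "--csv", f"{prefix}_{tag}.csv",
    ]


def describe_returncode(code):
    """subprocess reports death-by-signal as a negative return code, e.g. -9 for SIGKILL."""
    if code < 0:
        return f"killed by {signal.Signals(-code).name}"
    return f"exited with status {code}"


windows = split_window("chr3", 1, 10_000_000, 2_000_000)
for window in windows:
    print(" ".join(ipdsummary_args("HG01109_subreads_sorted_aligned.bam", "hg38.fa", window, "basemods")))
print(describe_returncode(-9))
```

Each list could be passed to subprocess.run in turn. Smaller windows trade more invocations for a smaller amount of data per run, which should lower peak memory, though whether that is enough here depends on the data.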
To answer your question of whether "--identify m5C_TET" is being used to detect 5mC: I want to identify 5mC, not 5hmC, but I don't know how to do that.
Following your advice ("turn off identification and work with simple modified base calls"), I dropped "--identify m5C_TET" and used the default --identify setting, which I found is "m6A,m4C". The coverage of chr3 is about 130X.
This comes from "ipdSummary --help":
--identify IDENTIFY Specific modifications to identify (comma-separated
list). Currrent options are m6A, m4C, m5C_TET. Using
--control overrides this option. (default: m6A,m4C)
Here is my command:
ipdSummary HG01109_subreads_sorted_aligned.bam \
--reference hg38.fa \
--referenceWindows chr3:1-10000000 \
--numWorkers 1 \
--gff basemods_wholeGen_modified_base_chr3.gff \
--csv kinetics_wholeGen_modified_base_chr3.csv
Here is my basemods_wholeGen_modified_base_chr3.gff:
##gff-version 3
##source ipdSummary v2.0
##source-commandline ipdSummary HG01109_subreads_sorted_aligned.bam --reference hg38.fa --referenceWindows chr3:1-10000000 --numWorkers 1 --gff basemods_wholeGen_modified_base_chr3.gff --csv kinetics_wholeGen_modified_base_chr3.csv
##sequence-region chr3 1 198295559
chr3 kinModCall modified_base 10003 10003 39 - . coverage=433;context=GTTAGGGTTAGGGTTAGGGTTAGNNNNNNNNNNNNNNNNNN;IPDRatio=1.47
chr3 kinModCall modified_base 10004 10004 33 - . coverage=489;context=GGTTAGGGTTAGGGTTAGGGTTAGNNNNNNNNNNNNNNNNN;IPDRatio=1.40
chr3 kinModCall modified_base 10044 10044 22 + . coverage=163;context=CCTAACCCTAACCCTAACCCTAACCCTCACCCTACCCTAAC;IPDRatio=1.33
...
chr3 kinModCall m4C 10263 10263 30 + . coverage=209;context=ACCCTAACCCTAACCCTAACCCTCTAACCCTAACCCTCTAA;IPDRatio=1.46;identificationQv=5
chr3 kinModCall m4C 10334 10334 63 + . coverage=170;context=CCCTAACCCTAACCCTAACCCTCTGACCCTGACCCTGACCC;IPDRatio=1.91;identificationQv=18
chr3 kinModCall m4C 10336 10336 45 + . coverage=182;context=CTAACCCTAACCCTAACCCTCTGACCCTGACCCTGACCCTG;IPDRatio=1.61;identificationQv=7
...
chr3 kinModCall m6A 12038 12038 57 - . coverage=61;context=GATTCAGGAGAGGGGCAGCGAAGTGCTGAGTAGAGAAGGGC;IPDRatio=2.86;identificationQv=29
chr3 kinModCall m6A 13128 13128 35 - . coverage=58;context=CCACCCGCCTCGGCCTCCCAAGGTGCTGGGATTACAGGCTT;IPDRatio=1.87;identificationQv=10
chr3 kinModCall m6A 18558 18558 36 - . coverage=56;context=TATTGAAAGAAAGCCTTGCAAGAGAGGAGGCTCAAGAGCTT;IPDRatio=1.84;identificationQv=12
...
chr3 kinModCall modified_base 9999905 9999905 22 - . coverage=40;context=AACCAAGCAAGATTCAACTGTGTTTGGTGTTCATTTGCCTC;IPDRatio=1.83
chr3 kinModCall modified_base 9999922 9999922 32 + . coverage=43;context=AACACAGTTGAATCTTGCTTGGTTCTAAGACAGTGAGGAAA;IPDRatio=1.79
chr3 kinModCall modified_base 9999947 9999947 21 - . coverage=41;context=TGAATATATTTAAATACTGGGGAAATTTCCTCACTGTCTTA;IPDRatio=1.69
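In this GFF the type column separates identified m4C/m6A calls, which carry an identificationQv attribute, from unidentified modified_base calls, which do not. A minimal Python sketch for parsing such records and tallying them by type, assuming the attribute layout shown above; parse_gff_record is a hypothetical helper, not part of ipdSummary or any PacBio library.

```python
from collections import Counter


def parse_gff_record(line):
    """Parse one ipdSummary GFF record into a dict, or return None for header/blank lines.

    GFF3 columns are tab-separated; splitting on any whitespace also copes with
    copies where the tabs became spaces, since no column in these records
    contains a space.
    """
    if not line.strip() or line.startswith("#"):
        return None
    seqid, source, ftype, start, end, score, strand, phase, attributes = line.split(None, 8)
    record = {
        "seqid": seqid,
        "type": ftype,
        "start": int(start),
        "end": int(end),
        "score": int(score),
        "strand": strand,
    }
    for field in attributes.strip().split(";"):
        key, _, value = field.partition("=")
        record[key] = value  # attribute values stay strings, e.g. IPDRatio="2.86"
    return record


# A few records copied from the GFF above.
GFF_LINES = [
    "##gff-version 3",
    "chr3 kinModCall modified_base 10003 10003 39 - . coverage=433;context=GTTAGGGTTAGGGTTAGGGTTAGNNNNNNNNNNNNNNNNNN;IPDRatio=1.47",
    "chr3 kinModCall m4C 10334 10334 63 + . coverage=170;context=CCCTAACCCTAACCCTAACCCTCTGACCCTGACCCTGACCC;IPDRatio=1.91;identificationQv=18",
    "chr3 kinModCall m6A 12038 12038 57 - . coverage=61;context=GATTCAGGAGAGGGGCAGCGAAGTGCTGAGTAGAGAAGGGC;IPDRatio=2.86;identificationQv=29",
]

records = [r for r in (parse_gff_record(line) for line in GFF_LINES) if r is not None]
print(Counter(r["type"] for r in records))
```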
Here is my kinetics_wholeGen_modified_base_chr3.csv:
refName,tpl,strand,base,score,tMean,tErr,modelPrediction,ipdRatio,coverage
"chr3",10001,0,C,10,1.710,0.167,1.400,1.222,67
"chr3",10001,1,G,10,0.700,0.041,0.617,1.135,328
"chr3",10002,0,T,1,0.869,0.116,0.966,0.900,72
"chr3",10002,1,A,0,0.463,0.030,0.781,0.593,340
"chr3",10003,0,A,2,0.800,0.087,0.825,0.970,99
"chr3",10003,1,T,39,0.968,0.059,0.659,1.467,433
"chr3",10004,0,A,7,0.935,0.079,0.835,1.121,110
"chr3",10004,1,T,33,1.521,0.072,1.085,1.401,489
"chr3",10005,0,C,1,0.825,0.073,0.915,0.901,116
...
"chr3",10263,0,C,30,1.126,0.089,0.770,1.461,209
"chr3",10263,1,G,0,1.059,0.040,1.334,0.794,917
"chr3",10334,0,C,63,1.389,0.112,0.726,1.914,170
"chr3",10334,1,G,8,0.983,0.041,0.892,1.102,780
"chr3",10336,0,C,45,1.533,0.105,0.954,1.608,182
"chr3",10336,1,G,0,0.900,0.042,1.078,0.835,830
...
"chr3",12038,0,T,0,0.537,0.112,0.770,0.698,27
"chr3",12038,1,A,57,6.437,0.762,2.252,2.858,61
"chr3",13128,0,T,1,0.747,0.146,0.902,0.829,25
"chr3",13128,1,A,35,3.066,0.339,1.638,1.872,58
"chr3",18558,0,T,2,1.193,0.197,1.329,0.898,29
"chr3",18558,1,A,36,2.732,0.291,1.488,1.835,56
...
"chr3",9999999,1,G,7,1.064,0.182,0.888,1.198,39
"chr3",10000000,0,C,0,0.748,0.110,0.947,0.790,44
"chr3",10000000,1,G,6,1.063,0.209,0.916,1.160,40
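The CSV has one row per position and strand. Judging from the rows above (a C on strand 0 paired with a G on strand 1 at the same tpl), the base column is the base on that row's strand, so filtering on base == "C" covers cytosines on both strands. A minimal Python sketch using the standard csv module; cytosine_rows and the cutoff of 20 are illustrative assumptions, not ipdSummary defaults.

```python
import csv
import io


def cytosine_rows(lines, min_score=20):
    """Yield kinetics rows at cytosines whose score is at least min_score.

    `lines` is any iterable of CSV lines, such as an open file. The default
    cutoff is an illustrative assumption, not a recommended threshold.
    """
    for row in csv.DictReader(lines):
        # `base` is the base on the row's strand (0 or 1), so both strands are covered.
        if row["base"] == "C" and int(row["score"]) >= min_score:
            yield {
                "refName": row["refName"],
                "tpl": int(row["tpl"]),
                "strand": int(row["strand"]),
                "score": int(row["score"]),
                "ipdRatio": float(row["ipdRatio"]),
                "coverage": int(row["coverage"]),
            }


# A few rows copied from the kinetics CSV above.
SAMPLE = """refName,tpl,strand,base,score,tMean,tErr,modelPrediction,ipdRatio,coverage
"chr3",10001,0,C,10,1.710,0.167,1.400,1.222,67
"chr3",10263,0,C,30,1.126,0.089,0.770,1.461,209
"chr3",10334,0,C,63,1.389,0.112,0.726,1.914,170
"chr3",12038,1,A,57,6.437,0.762,2.252,2.858,61
"""

hits = list(cytosine_rows(io.StringIO(SAMPLE)))
for hit in hits:
    print(hit["tpl"], hit["strand"], hit["score"], hit["ipdRatio"])
```

For the full file, open kinetics_wholeGen_modified_base_chr3.csv with newline="" and pass the file object instead. Note that 10263 and 10334 are the same positions the GFF labels m4C, so a high score at a C does not by itself indicate m5C.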
Here is my question:
The pipeline will not identify m5C at base resolution. The modified base calls in the data indicate bases whose kinetics do not match the negative in silico control and are therefore likely modified in some way. m5C-modified bases will most likely be labeled "modified_base", but so will other modifications, and the sensitivity is not known. For recent work that uses a different pipeline to detect m5C from data like this, see https://www.pnas.org/content/118/5/e2019768118.
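Since m5C will most likely appear as "modified_base", one way to get a first candidate list is to keep only modified_base calls whose central context base is C, optionally restricted to CpG. This Python sketch assumes, based on the records in this thread, that the context attribute is 41 bp, centred on the call and written on the call's strand. candidate_c_calls and the cpg_only option are illustrative, and the output is a list of candidates, not m5C calls, since sensitivity and specificity are unknown.

```python
def candidate_c_calls(gff_lines, cpg_only=False):
    """Return (position, strand, IPDRatio) for 'modified_base' calls centred on a C.

    Assumes each context attribute is 41 bp, centred on the call and written on
    the call's strand. Results are candidates only: ipdSummary does not identify
    m5C, and other modifications can also be reported at a C.
    """
    hits = []
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and ## header lines
        cols = line.split(None, 8)  # tab- or space-separated columns
        if cols[2] != "modified_base":
            continue  # m4C / m6A calls were already identified as something else
        attrs = {}
        for field in cols[8].strip().split(";"):
            key, _, value = field.partition("=")
            attrs[key] = value
        context = attrs.get("context", "")
        if len(context) != 41:
            continue  # unexpected layout: skip rather than guess the centre
        if context[20] != "C":
            continue
        if cpg_only and context[21] != "G":
            continue  # keep only a C followed by a G on the same strand
        hits.append((int(cols[3]), cols[6], float(attrs["IPDRatio"])))
    return hits


# modified_base records copied from the GFF excerpt in this thread.
excerpt = [
    "chr3 kinModCall modified_base 9999905 9999905 22 - . coverage=40;context=AACCAAGCAAGATTCAACTGTGTTTGGTGTTCATTTGCCTC;IPDRatio=1.83",
    "chr3 kinModCall modified_base 9999922 9999922 32 + . coverage=43;context=AACACAGTTGAATCTTGCTTGGTTCTAAGACAGTGAGGAAA;IPDRatio=1.79",
    "chr3 kinModCall modified_base 9999947 9999947 21 - . coverage=41;context=TGAATATATTTAAATACTGGGGAAATTTCCTCACTGTCTTA;IPDRatio=1.69",
]
print(candidate_c_calls(excerpt))
```

On these three records it returns an empty list, because they are centred on a T, a G and a G respectively; the full GFF would need to be scanned to find any C-centred calls.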
I was trying to call m5C on chr3 from hg38.fa, and I ran this command:
I got this error: