PacificBiosciences / kineticsTools

Tools for detecting DNA modifications from single molecule, real-time sequencing data

[ERROR] [root monitorChildProcesses 593] Child process exited with exitcode=-9. Aborting. #82

Closed SarahChen0401 closed 3 years ago

SarahChen0401 commented 3 years ago

I was trying to call m5C on chr3 from hg38.fa, and I ran this command:

nohup ../ipdSummary HG01109_subreads_sorted_aligned.bam --reference hg38.fa \
    --identify m5C_TET \
    --referenceWindows chr3:1-198295559 \
    --numWorkers 5 \
    --gff basemods_wholeGen_m5C_chr3.gff \
    --csv kinetics_wholeGen_m5C_chr3.csv \
    > nohup_wholeGen_m5C_chr3 2>&1 &

I got this error:

[INFO] 2021-01-20 09:22:46,331Z [root _summarizeReferenceRegion 226] Making summary: 1816985 to 1818015
[INFO] 2021-01-20 09:22:48,105Z [root _run 126] Got chunk: (1818, ReferenceWindow(refId=2, refName='chr3', start=1818000, end=1819000)) -- Process: <KineticWorkerProcess(KineticWorkerProcess-5, started daemon)>
[INFO] 2021-01-20 09:22:48,110Z [root _summarizeReferenceRegion 226] Making summary: 1817985 to 1819015
[INFO] 2021-01-20 09:22:49,485Z [root _run 126] Got chunk: (1819, ReferenceWindow(refId=2, refName='chr3', start=1819000, end=1820000)) -- Process: <KineticWorkerProcess(KineticWorkerProcess-3, started daemon)>
[INFO] 2021-01-20 09:22:49,494Z [root _summarizeReferenceRegion 226] Making summary: 1818985 to 1820015
[INFO] 2021-01-20 09:22:49,694Z [root _run 126] Got chunk: (1820, ReferenceWindow(refId=2, refName='chr3', start=1820000, end=1821000)) -- Process: <KineticWorkerProcess(KineticWorkerProcess-4, started daemon)>
[INFO] 2021-01-20 09:22:49,699Z [root _summarizeReferenceRegion 226] Making summary: 1819985 to 1821015
[INFO] 2021-01-20 09:22:49,843Z [root _fetchChunks 426] Retrieved 111 hits
[INFO] 2021-01-20 09:22:49,999Z [root _fetchChunks 426] Retrieved 106 hits
[INFO] 2021-01-20 09:22:51,373Z [root _fetchChunks 426] Retrieved 105 hits
[INFO] 2021-01-20 09:22:51,858Z [root _fetchChunks 426] Retrieved 108 hits
[INFO] 2021-01-20 09:22:56,379Z [root _run 126] Got chunk: (1821, ReferenceWindow(refId=2, refName='chr3', start=1821000, end=1822000)) -- Process: <KineticWorkerProcess(KineticWorkerProcess-3, started daemon)>
[INFO] 2021-01-20 09:22:56,384Z [root _summarizeReferenceRegion 226] Making summary: 1820985 to 1822015
[INFO] 2021-01-20 09:22:57,010Z [root _fetchChunks 426] Retrieved 104 hits
Child process exited with exitcode=-9.  Aborting.
[ERROR] 2021-01-20 09:22:57,439Z [root monitorChildProcesses 593] Child process exited with exitcode=-9.  Aborting.
[INFO] 2021-01-20 09:22:58,015Z [root _run 126] Got chunk: (1822, ReferenceWindow(refId=2, refName='chr3', start=1822000, end=1823000)) -- Process: <KineticWorkerProcess(KineticWorkerProcess-2, started daemon)>
[INFO] 2021-01-20 09:22:58,020Z [root _summarizeReferenceRegion 226] Making summary: 1821985 to 1823015
rhallPB commented 3 years ago

Exit code -9 suggests that the OS killed the job (signal 9, SIGKILL), most likely because it ran out of memory. I would try splitting the job into smaller chunks, or running on a machine with more memory available. I notice you are using "--identify m5C_TET". Was the sample actually TET-converted to 5-hydroxymethyl-C (5hmC), or is the option being used to detect 5mC? For the latter, it makes more sense to turn off identification and work with simple modified base calls, assuming you have enough coverage.
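As a rough illustration of the "smaller chunks" suggestion, here is a minimal Python sketch that decodes a negative child exit code and generates window strings in the same `chr3:start-end` form used in the command above. The helper names `describe_exit_code` and `split_windows` and the 10 Mb window size are hypothetical, and the assumption that windows are 1-based and inclusive is taken from the commands in this thread rather than from the tool's documentation, so it is worth checking the positions at window boundaries for gaps or duplicates.

```python
import signal


def describe_exit_code(exitcode):
    """Explain a multiprocessing-style exit code; a negative value means killed by a signal."""
    if exitcode < 0:
        try:
            name = signal.Signals(-exitcode).name
        except ValueError:
            # Signal number not defined on this platform.
            name = "unknown signal"
        return f"terminated by signal {-exitcode} ({name})"
    return f"exited normally with status {exitcode}"


def split_windows(ref_name, ref_length, window_size):
    """Return window strings such as 'chr3:1-10000000' covering 1..ref_length.

    Assumption: coordinates are 1-based and inclusive, as in the thread's commands.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    windows = []
    start = 1
    while start <= ref_length:
        end = min(start + window_size - 1, ref_length)
        windows.append(f"{ref_name}:{start}-{end}")
        start = end + 1
    return windows


if __name__ == "__main__":
    print(describe_exit_code(-9))
    # chr3 length taken from the original command (chr3:1-198295559).
    for window in split_windows("chr3", 198295559, 10_000_000):
        print(window)
```

Each printed window could then be passed to a separate ipdSummary run via --referenceWindows, with its own --gff and --csv output names, so that no single run has to hold the whole chromosome.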

SarahChen0401 commented 3 years ago

To answer your question about whether "--identify m5C_TET" is being used to detect 5mC: yes, I want to identify 5mC bases, not 5hmC, but I don't know how to do that.

Following your advice ("turn off identification and work with simple modified base calls"), I reran with the default --identify setting, which according to the help text is "m6A,m4C". The coverage of chr3 is about 130X.

This comes from "ipdSummary --help":

--identify IDENTIFY   Specific modifications to identify (comma-separated
                        list). Currrent options are m6A, m4C, m5C_TET. Using
                        --control overrides this option. (default: m6A,m4C)

Here is my command:

ipdSummary HG01109_subreads_sorted_aligned.bam \
    --reference hg38.fa \
    --referenceWindows chr3:1-10000000 \
    --numWorkers 1 \
    --gff basemods_wholeGen_modified_base_chr3.gff \
    --csv kinetics_wholeGen_modified_base_chr3.csv

Here is my basemods_wholeGen_modified_base_chr3.gff :

##gff-version 3
##source ipdSummary v2.0
##source-commandline ipdSummary HG01109_subreads_sorted_aligned.bam --reference hg38.fa --referenceWindows chr3:1-10000000 --numWorkers 1 --gff basemods_wholeGen_modified_base_chr3.gff --csv kinetics_wholeGen_modified_base_chr3.csv
##sequence-region chr3 1 198295559
chr3    kinModCall      modified_base   10003   10003   39      -       .       coverage=433;context=GTTAGGGTTAGGGTTAGGGTTAGNNNNNNNNNNNNNNNNNN;IPDRatio=1.47
chr3    kinModCall      modified_base   10004   10004   33      -       .       coverage=489;context=GGTTAGGGTTAGGGTTAGGGTTAGNNNNNNNNNNNNNNNNN;IPDRatio=1.40
chr3    kinModCall      modified_base   10044   10044   22      +       .       coverage=163;context=CCTAACCCTAACCCTAACCCTAACCCTCACCCTACCCTAAC;IPDRatio=1.33
...
chr3    kinModCall      m4C     10263   10263   30      +       .       coverage=209;context=ACCCTAACCCTAACCCTAACCCTCTAACCCTAACCCTCTAA;IPDRatio=1.46;identificationQv=5
chr3    kinModCall      m4C     10334   10334   63      +       .       coverage=170;context=CCCTAACCCTAACCCTAACCCTCTGACCCTGACCCTGACCC;IPDRatio=1.91;identificationQv=18
chr3    kinModCall      m4C     10336   10336   45      +       .       coverage=182;context=CTAACCCTAACCCTAACCCTCTGACCCTGACCCTGACCCTG;IPDRatio=1.61;identificationQv=7
...
chr3    kinModCall      m6A     12038   12038   57      -       .       coverage=61;context=GATTCAGGAGAGGGGCAGCGAAGTGCTGAGTAGAGAAGGGC;IPDRatio=2.86;identificationQv=29
chr3    kinModCall      m6A     13128   13128   35      -       .       coverage=58;context=CCACCCGCCTCGGCCTCCCAAGGTGCTGGGATTACAGGCTT;IPDRatio=1.87;identificationQv=10
chr3    kinModCall      m6A     18558   18558   36      -       .       coverage=56;context=TATTGAAAGAAAGCCTTGCAAGAGAGGAGGCTCAAGAGCTT;IPDRatio=1.84;identificationQv=12
...
chr3    kinModCall      modified_base   9999905 9999905 22      -       .       coverage=40;context=AACCAAGCAAGATTCAACTGTGTTTGGTGTTCATTTGCCTC;IPDRatio=1.83
chr3    kinModCall      modified_base   9999922 9999922 32      +       .       coverage=43;context=AACACAGTTGAATCTTGCTTGGTTCTAAGACAGTGAGGAAA;IPDRatio=1.79
chr3    kinModCall      modified_base   9999947 9999947 21      -       .       coverage=41;context=TGAATATATTTAAATACTGGGGAAATTTCCTCACTGTCTTA;IPDRatio=1.69

Here is my kinetics_wholeGen_modified_base_chr3.csv:

refName,tpl,strand,base,score,tMean,tErr,modelPrediction,ipdRatio,coverage
"chr3",10001,0,C,10,1.710,0.167,1.400,1.222,67
"chr3",10001,1,G,10,0.700,0.041,0.617,1.135,328
"chr3",10002,0,T,1,0.869,0.116,0.966,0.900,72
"chr3",10002,1,A,0,0.463,0.030,0.781,0.593,340
"chr3",10003,0,A,2,0.800,0.087,0.825,0.970,99
"chr3",10003,1,T,39,0.968,0.059,0.659,1.467,433
"chr3",10004,0,A,7,0.935,0.079,0.835,1.121,110
"chr3",10004,1,T,33,1.521,0.072,1.085,1.401,489
"chr3",10005,0,C,1,0.825,0.073,0.915,0.901,116
...
"chr3",10263,0,C,30,1.126,0.089,0.770,1.461,209
"chr3",10263,1,G,0,1.059,0.040,1.334,0.794,917
"chr3",10334,0,C,63,1.389,0.112,0.726,1.914,170
"chr3",10334,1,G,8,0.983,0.041,0.892,1.102,780
"chr3",10336,0,C,45,1.533,0.105,0.954,1.608,182
"chr3",10336,1,G,0,0.900,0.042,1.078,0.835,830
...
"chr3",12038,0,T,0,0.537,0.112,0.770,0.698,27
"chr3",12038,1,A,57,6.437,0.762,2.252,2.858,61
"chr3",13128,0,T,1,0.747,0.146,0.902,0.829,25
"chr3",13128,1,A,35,3.066,0.339,1.638,1.872,58
"chr3",18558,0,T,2,1.193,0.197,1.329,0.898,29
"chr3",18558,1,A,36,2.732,0.291,1.488,1.835,56
...
"chr3",9999999,1,G,7,1.064,0.182,0.888,1.198,39
"chr3",10000000,0,C,0,0.748,0.110,0.947,0.790,44
"chr3",10000000,1,G,6,1.063,0.209,0.916,1.160,40

Here are my questions:

  1. What does "modified_base" mean? Does it mean m5C?
  2. If "modified_base" doesn't mean m5C, how can I get m5C calls?
rhallPB commented 3 years ago

The pipeline will not identify m5C at base resolution. The modified base calls in the data indicate bases whose kinetics do not match the negative in silico control and are therefore likely modified in some way. m5C-modified bases will most likely be assigned as "modified_base", but so will other modifications, and the sensitivity is not known. For some recent work using a different pipeline to detect m5C from data such as this, see https://www.pnas.org/content/118/5/e2019768118.
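To make the point above concrete, here is a small Python sketch that pulls cytosine rows with elevated scores out of the kinetics CSV. It only flags cytosines whose kinetics look unusual; it does not identify 5mC. The function name `candidate_cytosines` and both thresholds are hypothetical choices for illustration, and the sample rows are copied from the CSV posted earlier in this thread.

```python
import csv
import io

# A few rows copied from the kinetics CSV posted above.
SAMPLE = '''refName,tpl,strand,base,score,tMean,tErr,modelPrediction,ipdRatio,coverage
"chr3",10001,0,C,10,1.710,0.167,1.400,1.222,67
"chr3",10003,1,T,39,0.968,0.059,0.659,1.467,433
"chr3",10263,0,C,30,1.126,0.089,0.770,1.461,209
"chr3",10334,0,C,63,1.389,0.112,0.726,1.914,170
"chr3",10336,0,C,45,1.533,0.105,0.954,1.608,182
'''


def candidate_cytosines(handle, min_score=20, min_coverage=25):
    """Return (refName, tpl, strand, score, ipdRatio) for C rows passing both thresholds.

    The thresholds are illustrative assumptions, not recommended values.
    """
    hits = []
    for row in csv.DictReader(handle):
        if row["base"] != "C":
            continue
        score = int(row["score"])
        coverage = int(row["coverage"])
        if score >= min_score and coverage >= min_coverage:
            hits.append(
                (row["refName"], int(row["tpl"]), int(row["strand"]),
                 score, float(row["ipdRatio"]))
            )
    return hits


if __name__ == "__main__":
    for hit in candidate_cytosines(io.StringIO(SAMPLE)):
        print(hit)
```

For a real run, the `io.StringIO(SAMPLE)` handle would be replaced by an open file handle on the full CSV. Note that the three cytosines kept from the sample rows are the same positions the default run labelled m4C in the GFF, which illustrates why a high score on a C should be read as "kinetically unusual" and not as a 5mC call.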