broadinstitute / pilon

Pilon is an automated genome assembly improvement and variant detection tool
GNU General Public License v2.0

Crash due to GC overhead limit exceed #32

Closed gucascau closed 7 years ago

gucascau commented 7 years ago

I am trying to polish an 800 Mb assembled genome with 30X paired-end reads and 5X mate-pair reads on a server with 515881 MB of RAM and 70 CPUs. After several hours of running, Pilon crashed with the following error:

Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
    at org.broadinstitute.pilon.BaseSum.(BaseSum.scala:23)
    at org.broadinstitute.pilon.PileUp.(PileUp.scala:25)

My command line is as follows:

java -Xmx90g -jar /home/xinw/software/pilon/pilon-1.21.jar --genome Brain_assemble.fasta --frags Braincoral_paired_400.sort.bam --jumps Braincoral_MP_6000.sort.bam --output Brain_polish.fasta --changes --vcf --diploid

Any suggestions? Thanks!

RxLoutre commented 7 years ago

I actually had the same issue... Did you manage to find a solution?

gucascau commented 7 years ago

In the end, I did not use Pilon because of this problem. Instead, I corrected the variants based on the mpileup file.

nathanhaigh commented 7 years ago

This essentially means that the 90G of memory you allocated was insufficient. As Java approaches the memory limit specified with -Xmx, it performs increasingly aggressive garbage collection (GC) to free up memory. To avoid spending all its time on GC instead of computation, Java will abort with this error once the time spent in GC exceeds a threshold percentage of the total run time (98%, I think, from memory). In reality, I think this means that if Pilon runs for 2 min before GC starts, GC would then have to run for 98 min before the process is killed.

The only general Java solution is to increase the amount of memory you make available to Java by increasing -Xmx.
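As a rough illustration of the threshold described above, here is a simplified Python model of the check. It assumes the HotSpot defaults as I understand them from the parallel collector documentation: -XX:GCTimeLimit=98 and -XX:GCHeapFreeLimit=2, meaning the error is raised when more than 98% of the time goes to GC and less than 2% of the heap is recovered. The `GcSample` type and the function name are made up for this sketch, and the real JVM logic is more involved than a single comparison.

```python
from dataclasses import dataclass

# Assumed HotSpot defaults for -XX:GCTimeLimit and -XX:GCHeapFreeLimit (percent).
GC_TIME_LIMIT_PCT = 98
GC_HEAP_FREE_LIMIT_PCT = 2


@dataclass
class GcSample:
    """Hypothetical summary of one observation window (not a JVM API)."""
    gc_seconds: float      # time spent in garbage collection
    total_seconds: float   # total length of the window
    heap_freed_pct: float  # percent of the heap recovered by those collections


def overhead_limit_exceeded(sample: GcSample) -> bool:
    """Simplified check: nearly all time in GC and nearly nothing reclaimed."""
    gc_pct = 100.0 * sample.gc_seconds / sample.total_seconds
    return (gc_pct > GC_TIME_LIMIT_PCT
            and sample.heap_freed_pct < GC_HEAP_FREE_LIMIT_PCT)


if __name__ == "__main__":
    healthy = GcSample(gc_seconds=5, total_seconds=100, heap_freed_pct=40)
    thrashing = GcSample(gc_seconds=99, total_seconds=100, heap_freed_pct=1)
    print(overhead_limit_exceeded(healthy))    # False
    print(overhead_limit_exceeded(thrashing))  # True
```

In this model, a heap that is nearly full of live data trips the limit because each collection frees almost nothing, which is why a larger -Xmx (or a smaller working set) is the usual remedy.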

w1bw commented 7 years ago

In the future, you should be able to do better by splitting your input genome, either by splitting the fasta or by using --targets to only process portions at a time. Also, if you are only interested in improving base accuracy and small indels, then "--fix bases" will use a much smaller memory footprint, since it doesn't need to do local reassemblies.
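A minimal sketch of the splitting approach suggested above: group the sequence names in the FASTA into batches of bounded total length, then build one Pilon command per batch using --targets and --fix bases. The file names are the ones from the original post; the helper names, the batch size and the per-batch output prefix are hypothetical. It also assumes that --targets accepts a comma-separated list of sequence names and that the FASTA is well formed (every header line has a name).

```python
from typing import Iterable, Iterator, List, Sequence, Tuple


def fasta_lengths(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (sequence name, length) for each record in FASTA-formatted lines."""
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, length
            # The name is the first whitespace-delimited token of the header.
            name, length = line[1:].split()[0], 0
        elif name is not None:
            length += len(line)
    if name is not None:
        yield name, length


def batch_targets(records: Iterable[Tuple[str, int]],
                  max_bases: int) -> List[List[str]]:
    """Greedily group names so each batch holds at most max_bases bases
    (a single sequence longer than max_bases gets a batch of its own)."""
    batches, current, size = [], [], 0
    for name, length in records:
        if current and size + length > max_bases:
            batches.append(current)
            current, size = [], 0
        current.append(name)
        size += length
    if current:
        batches.append(current)
    return batches


def pilon_command(targets: Sequence[str], index: int,
                  heap: str = "90g", jar: str = "pilon-1.21.jar") -> List[str]:
    """Build (but do not run) a Pilon command for one batch of targets."""
    return [
        "java", f"-Xmx{heap}", "-jar", jar,
        "--genome", "Brain_assemble.fasta",
        "--frags", "Braincoral_paired_400.sort.bam",
        "--jumps", "Braincoral_MP_6000.sort.bam",
        "--targets", ",".join(targets),
        "--fix", "bases",                          # skip local reassembly
        "--output", f"Brain_polish.part{index}",   # hypothetical prefix
        "--changes",
    ]


if __name__ == "__main__":
    with open("Brain_assemble.fasta") as handle:
        batches = batch_targets(fasta_lengths(handle), max_bases=100_000_000)
    for i, batch in enumerate(batches):
        print(" ".join(pilon_command(batch, i)))
```

Each printed command can then be run separately, and the per-batch outputs concatenated afterwards. For assemblies with very many contigs, writing the names to a file and passing that file to --targets may be more practical than a long comma-separated list.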

nick-youngblut commented 6 years ago

It would be really helpful to expose -Xmx as an option in the pilon script. It appears that -Xmx is hard-coded in the pilon script:

default_jvm_mem_opts = ['-Xms512m', '-Xmx1g']

This makes it hard to change -Xmx when Pilon is installed from bioconda, since only the pilon script is added to PATH, not the jar file. A user who wants to change -Xmx has to find the jar file in the conda install directory and call it directly. This could be avoided simply by adding an option to the pilon script that adjusts -Xmx.

I'm using Pilon version 1.22 (Wed Mar 15 16:38:30 2017 -0400), installed from bioconda.
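One way such an option could work, sketched in Python since the wrapper is a Python script: treat any -Xms/-Xmx argument on the command line as a JVM flag, pass everything else through to Pilon, and fall back to the hard-coded defaults only when no memory flag is given. The function names and the jar path in the usage line are hypothetical; this is not the actual bioconda wrapper code.

```python
from typing import List, Sequence, Tuple

# Defaults quoted from the wrapper script in this thread.
DEFAULT_JVM_MEM_OPTS = ["-Xms512m", "-Xmx1g"]


def split_jvm_opts(argv: Sequence[str]) -> Tuple[List[str], List[str]]:
    """Separate JVM memory flags (-Xms..., -Xmx...) from Pilon's own arguments."""
    mem_opts: List[str] = []
    pilon_args: List[str] = []
    for arg in argv:
        if arg.startswith(("-Xms", "-Xmx")):
            mem_opts.append(arg)
        else:
            pilon_args.append(arg)
    # User-supplied memory flags replace the defaults entirely.
    if not mem_opts:
        mem_opts = list(DEFAULT_JVM_MEM_OPTS)
    return mem_opts, pilon_args


def java_command(jar_path: str, argv: Sequence[str]) -> List[str]:
    """Build the java invocation the wrapper would execute."""
    mem_opts, pilon_args = split_jvm_opts(argv)
    return ["java", *mem_opts, "-jar", jar_path, *pilon_args]


if __name__ == "__main__":
    # Hypothetical jar path; the real location depends on the conda install.
    print(java_command("pilon.jar", ["-Xmx90g", "--genome", "Brain_assemble.fasta"]))
```

With a wrapper like this, `pilon -Xmx90g --genome ...` would raise the heap limit without the user having to locate the jar. Pilon's own options start with `--`, so they should not collide with the `-Xms`/`-Xmx` prefixes.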

jwasmuth commented 6 years ago

@nick-youngblut,

I agree that it would be nicer to have a command line option. However, for conda users, like you and me, you can use which pilon to find the executable.
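For anyone who prefers to script this, here is a small Python sketch that does the equivalent of `which pilon`, follows any symlink, and lists jar files sitting next to the resolved script. It assumes the jar lives in the same directory as the real wrapper script, which may not hold for every install; the function name is made up for this example.

```python
import glob
import os
import shutil
from typing import List, Optional


def find_pilon_jars(executable: str = "pilon",
                    path: Optional[str] = None) -> List[str]:
    """Return jar files next to the resolved wrapper script, or [] if none."""
    script = shutil.which(executable, path=path)  # same lookup as `which`
    if script is None:
        return []
    # Resolve symlinks so we look beside the real script, not the PATH entry.
    real_dir = os.path.dirname(os.path.realpath(script))
    return sorted(glob.glob(os.path.join(real_dir, "*.jar")))


if __name__ == "__main__":
    for jar in find_pilon_jars():
        print(jar)
```

Once the jar is located, it can be called directly with `java -Xmx... -jar`, as in the original post.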

sunnycqcn commented 6 years ago

I ran into the same issue. I changed -Xmx to 800G, but it still does not work. My server has 1T of RAM and 20 CPUs. My genome is 1.6G, with 70x PE reads and 100x MP reads. Do you have any suggestions? Thanks, Fuyou

Nifaste commented 6 years ago

@sunnycqcn did you fix the problem? I ran into the same issue running Pilon within Unicycler from bioconda.

sunnycqcn commented 6 years ago

I did not.

sunnycqcn commented 6 years ago

@Nifaste I did not. I am looking for other methods.