phyluce_workflow phasing issue: java.lang.OutOfMemoryError: Java heap space

lpnunez commented 3 years ago

Hello,

I'm having an issue using phyluce_workflow for phasing. The workflow runs fine until it is about three-quarters complete, then fails with a java.lang.OutOfMemoryError like the one below:

Finished job 12. 12 of 17 steps (71%) done

[Wed Mar 17 20:09:27 2021]
rule pilon_allele_0:
    input: /home/lnunez/nas5/UCE/spades_assemblies/contigs/Adelophis_foxi_LSUMZ_H8272.contigs.fasta, bams/Adelophis_foxi_LSUMZ_H8272.0.bam, bams/Adelophis_foxi_LSUMZ_H8272.0.bam.bai
    output: fastas/Adelophis_foxi_LSUMZ_H8272.0.fasta
    jobid: 14
    wildcards: sample=Adelophis_foxi_LSUMZ_H8272

Select jobs to execute...

[Wed Mar 17 20:09:27 2021]
rule pilon_allele_1:
    input: /home/lnunez/nas5/UCE/spades_assemblies/contigs/Adelophis_foxi_LSUMZ_H8272.contigs.fasta, bams/Adelophis_foxi_LSUMZ_H8272.1.bam, bams/Adelophis_foxi_LSUMZ_H8272.1.bam.bai
    output: fastas/Adelophis_foxi_LSUMZ_H8272.1.fasta
    jobid: 16
    wildcards: sample=Adelophis_foxi_LSUMZ_H8272

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.base/java.util.Arrays.copyOf(Arrays.java:3745)
    at java.base/java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:172)
    at java.base/java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:538)
    at java.base/java.lang.StringBuilder.append(StringBuilder.java:174)
    at htsjdk.samtools.SAMTextHeaderCodec.advanceLine(SAMTextHeaderCodec.java:139)
    at htsjdk.samtools.SAMTextHeaderCodec.decode(SAMTextHeaderCodec.java:94)
    at htsjdk.samtools.BAMFileReader.readHeader(BAMFileReader.java:667)
    at htsjdk.samtools.BAMFileReader.<init>(BAMFileReader.java:298)
    at htsjdk.samtools.BAMFileReader.<init>(BAMFileReader.java:176)
    at htsjdk.samtools.SamReaderFactory$SamReaderFactoryImpl.open(SamReaderFactory.java:396)
    at htsjdk.samtools.SamReaderFactory$SamReaderFactoryImpl.open(SamReaderFactory.java:208)
    at org.broadinstitute.pilon.BamFile.reader(BamFile.scala:51)
    at org.broadinstitute.pilon.BamFile.process(BamFile.scala:116)
    at org.broadinstitute.pilon.GenomeRegion.processBam(GenomeRegion.scala:292)
    at org.broadinstitute.pilon.GenomeFile.$anonfun$processRegions$5(GenomeFile.scala:112)
    at org.broadinstitute.pilon.GenomeFile.$anonfun$processRegions$5$adapted(GenomeFile.scala:112)
    at org.broadinstitute.pilon.GenomeFile$$Lambda$48/0x00000001001a5840.apply(Unknown Source)
    at scala.collection.immutable.List.foreach(List.scala:388)
    at org.broadinstitute.pilon.GenomeFile.$anonfun$processRegions$4(GenomeFile.scala:112)
    at org.broadinstitute.pilon.GenomeFile.$anonfun$processRegions$4$adapted(GenomeFile.scala:109)
    at org.broadinstitute.pilon.GenomeFile$$Lambda$44/0x00000001001a0040.apply(Unknown Source)
    at scala.collection.Iterator.foreach(Iterator.scala:937)
    at scala.collection.Iterator.foreach$(Iterator.scala:937)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1425)
    at scala.collection.parallel.ParIterableLike$Foreach.leaf(ParIterableLike.scala:970)
    at scala.collection.parallel.Task.$anonfun$tryLeaf$1(Tasks.scala:49)
    at scala.collection.parallel.Task$$Lambda$45/0x00000001001a7840.apply$mcV$sp(Unknown Source)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
    at scala.util.control.Breaks$$anon$1.catchBreak(Breaks.scala:63)
    at scala.collection.parallel.Task.tryLeaf(Tasks.scala:52)
    at scala.collection.parallel.Task.tryLeaf$(Tasks.scala:46)
    at scala.collection.parallel.ParIterableLike$Foreach.tryLeaf(ParIterableLike.scala:967)

[Wed Mar 17 20:18:19 2021]
Error in rule pilon_allele_0:
    jobid: 14
    output: fastas/Adelophis_foxi_LSUMZ_H8272.0.fasta
    shell:
        pilon --threads 1 --vcf --changes --fix snps,indels --minqual 10 --mindepth 5 --genome /home/lnunez/nas5/UCE/spades_assemblies/contigs/Adelophis_foxi_LSUMZ_H8272.contigs.fasta --bam bams/Adelophis_foxi_LSUMZ_H8272.0.bam --outdir fastas --output Adelophis_foxi_LSUMZ_H8272.0
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

After this error, the job will exit. It will still produce the bam files in the bam output directory, but not the fasta output.

Initially, I tried to resolve the issue by allocating more memory to the job and whittling down the number of samples, but I still get the same error consistently, so I feel like I've hit a wall. I'm running this job on an HPC cluster and the job is submitted through a PBS script.

Here is the log file in case it helps: phase_e.log

brantfaircloth commented 3 years ago

Technically, I am not sure this is a phyluce error. Pilon, which is doing the phasing, uses lots of RAM and seems to be getting too little of it to work. You might try to run this on a node with lots of RAM, but tell phyluce to use only one core (versus 30). Then, all of the RAM on the node should be dedicated to pilon when the code reaches that step. I’d try that with a single sample to see what happens. That said, you could still have a problem with too little RAM depending on your HPC setup (e.g. there still may be too little RAM per node).

lpnunez commented 3 years ago

Thank you for the quick response!

I did as you suggested, but unfortunately, the error persists. It occurs right when the Pilon jobs start, as you said. For the record, the HPC cluster that I'm using has 256 GB per node. Here is my setup when I run a job script:

#!/bin/bash
#PBS -V
#PBS -q batch
#PBS -S /bin/bash
#PBS -N Phase_Test
#PBS -e /home/lnunez/nas5/UCE/Temp/phase_e
#PBS -o /home/lnunez/nas5/UCE/Temp/phase_o
#PBS -l nodes=1
#PBS -l ncpus=56
#PBS -l walltime=99:00:00
#PBS -l mem=100GB

As you mentioned, this is probably a logistical issue on my end. Here is the log file for the test run: phase_test.log

brantfaircloth commented 3 years ago

You could try to run the pilon command on its own to see what happens (it will likely die, but can help diagnose). You may need the help of a sysadmin to further diagnose the issue.

It might also be reasonable to try to install pilon outside of phyluce, and see if this step will work with a different installation. I do know that the process works, because phyluce runs software tests against this and other programs, and those work ok. What I don't know is exactly what is causing the RAM allocation error on your system (and diagnosing that is almost impossible for me to do).

UPDATE: oops - sorry - you should be able to run

pilon --threads 1 --vcf --changes --fix snps,indels --minqual 10 --mindepth 5 --genome /home/lnunez/nas5/UCE/spades_assemblies/contigs/Adelophis_foxi_LSUMZ_H8263.contigs.fasta --bam bams/Adelophis_foxi_LSUMZ_H8263.0.bam --outdir fastas --output Adelophis_foxi_LSUMZ_H8263.0

outside of phyluce.

brantfaircloth commented 3 years ago

I actually see another thing that could cause a problem. See if you can try:

pilon -Xmx100g --threads 1 --vcf --changes --fix snps,indels --minqual 10 --mindepth 5 --genome /home/lnunez/nas5/UCE/spades_assemblies/contigs/Adelophis_foxi_LSUMZ_H8263.contigs.fasta --bam bams/Adelophis_foxi_LSUMZ_H8263.0.bam --outdir fastas --output Adelophis_foxi_LSUMZ_H8263.0

Alternatively, activate the phyluce environment, then type which pilon, then open that path with a text editor and edit the line:

default_jvm_mem_opts = ['-Xms512m', '-Xmx1g']

to read

default_jvm_mem_opts = ['-Xms512m', '-Xmx100g']
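For anyone who would rather script this edit than open the wrapper by hand, here is a minimal sketch. It assumes the wrapper contains a line beginning with default_jvm_mem_opts exactly as quoted above; the function name set_pilon_max_heap is hypothetical and not part of pilon or phyluce. It leaves a .bak copy of the original file next to it.

```shell
# Hypothetical helper (not part of pilon or phyluce): rewrite the -Xmx entry
# on the default_jvm_mem_opts line of a pilon wrapper script.
#   $1 = path to the wrapper, e.g. "$(which pilon)"
#   $2 = new maximum heap, e.g. 100g
set_pilon_max_heap() {
    # -i.bak edits in place and keeps the original as <wrapper>.bak;
    # the address restricts the substitution to the default_jvm_mem_opts line.
    sed -i.bak -E "/^default_jvm_mem_opts/ s/-Xmx[0-9]+[kKmMgG]?/-Xmx$2/" "$1"
}

# Example (run with the phyluce environment activated):
#   set_pilon_max_heap "$(which pilon)" 100g
#   grep -n default_jvm_mem_opts "$(which pilon)"
```

Checking the result with grep afterwards is worthwhile, since the wrapper's layout may differ between pilon versions and the substitution silently does nothing if the line is not found.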

lpnunez commented 3 years ago

Ok, I changed the default_jvm_mem_opts in pilon and that seems to have fixed the issue. Thank you very much for the help!

brantfaircloth commented 3 years ago

Super. I'll see if I can add an easier way to configure this to phyluce.

luke-campillo commented 2 years ago

For what it's worth, I ran into the same issue and increasing the RAM allocated to pilon resolved the issue for me too. Thanks as always for clear solutions to our problems, Brant!

mkweskin commented 2 years ago

Instead of altering the pilon wrapper, we've been setting the Java memory options through the _JAVA_OPTIONS environment variable before we call phyluce_workflow. It seems to work 😸

export _JAVA_OPTIONS="-Xms1024m -Xmx55g"
phyluce_workflow --config config_file_phasing.conf \
    --output phasing_all \
    --workflow phasing
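
If the heap size should track the memory requested from the scheduler, the value can be derived rather than hard-coded. The sketch below is only an illustration: the function name java_opts_for_mem_gb is hypothetical, and the 90% figure is an assumed margin (the JVM also uses memory outside the heap), not a value recommended anywhere in this thread.

```shell
# Hypothetical helper: build a _JAVA_OPTIONS string from a memory budget in GB.
#   $1 = memory available to the job in GB (e.g. the PBS mem= request)
# Assumption: give the heap 90% of the budget, leaving the rest for
# non-heap JVM memory and other processes.
java_opts_for_mem_gb() {
    heap_gb=$(( $1 * 90 / 100 ))
    printf '%s\n' "-Xms1024m -Xmx${heap_gb}g"
}

# Example, for a job that requested mem=100GB:
#   export _JAVA_OPTIONS="$(java_opts_for_mem_gb 100)"
```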

faircloth-lab / phyluce

phyluce_workflow phasing issue: java.lang.OutOfMemoryError: Java heap space #222
