Closed RxLoutre closed 7 years ago
Try feeding Java more memory, e.g. 20G. Note that the -Xmx flag must come before -jar, otherwise it is passed to Pilon instead of the JVM: java -Xmx20G -jar xxx.jar
It gives the same error even with up to 100G of memory... Any other suggestions?
That means you need more memory... I have a 2.8G genome and a 40x Illumina BAM. I had the same problem and solved it with -Xmx160G.
I met the same error. I solved it by increasing the memory to a higher level, -Xmx120G. I think Pilon consumes too much memory when it runs. Is there any improvement planned to address this?
The primary use case for Pilon when it was written was for smaller genomes. I'm happy people have had as much success as they have with larger genomes. The time and space efficiency could be improved, but some things would need to be completely re-written and would run more slowly to minimize memory footprints. I'll keep this in mind.
How do I assign more memory when using a conda environment and submitting to slurm? If Pilon is not designed for large genomes, what other tools would you suggest?
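In case it helps: the JVM reads the JAVA_TOOL_OPTIONS environment variable regardless of how Pilon is wrapped, so with a conda install you can set the heap there and request matching memory from SLURM. A minimal sketch of a job script; the env name, file names, and memory sizes are placeholders you would adjust for your cluster:

```shell
# Hypothetical SLURM submission script for a conda-installed Pilon.
# Env name, input files, and memory sizes are placeholders.
cat > pilon_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=pilon
#SBATCH --mem=210G            # ask SLURM for a bit more than the Java heap
#SBATCH --cpus-per-task=16
#SBATCH --time=48:00:00

source activate pilon_env      # conda env containing pilon

# The JVM picks up JAVA_TOOL_OPTIONS even when pilon is invoked
# through a wrapper script and you cannot pass -Xmx directly.
export JAVA_TOOL_OPTIONS="-Xmx200G"

pilon --genome genome.fasta --frags aln.sorted.bam \
      --output polished --threads 16
EOF
# submit with: sbatch pilon_job.sh
```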
Pushing this to follow up on @Lamm-a's question.
Processing a large genome without crashing a server
Hi, as far as I can tell, pilon doesn't write intermediate results (per-chromosome results) to the output as it goes, but stores everything in memory instead. Thus, memory usage grows as you progress through the genome. In my case (2.7 Gb genome, ~100x coverage) it eventually takes more than 300 GB of RAM. As suggested above, one approach is to split the genome into individual chromosomes and then run pilon on each one of them separately:
GENOMEFA=<path to the genome>
OUTDIR=pilon_out
PILONJAR=<path to pilon .jar>
export JAVA_TOOL_OPTIONS="-Xmx200G -Xss2560k" # set maximum heap size to something reasonable
####### Split the genome
# split the genome into single chromosomes
bioawk -c fastx '{print $name}' $GENOMEFA > nms
mkdir -p split
samtools faidx $GENOMEFA
for CHR in $(cat nms);do
samtools faidx $GENOMEFA $CHR > split/$CHR.fa;
done
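Before launching Pilon, you can optionally confirm the split step worked; a quick check, assuming the split/ directory and nms file from above:

```shell
# Optional sanity check: one split FASTA per chromosome name in nms
if [ -d split ] && [ -f nms ]; then
  [ "$(ls split/*.fa 2>/dev/null | wc -l)" -eq "$(wc -l < nms)" ] && echo "split OK"
fi
```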
# this produced split/ directory with individual fastas for each chromosome
####### Run pilon
mkdir -p $OUTDIR
# increase the maximum Java heap size (JAVA_TOOL_OPTIONS above), as the application crashes otherwise
ls split/*fa > toprocess
for CHRFILE in $(cat toprocess); do
echo $CHRFILE
CHRNAME=$(basename $CHRFILE | cut -f 1 -d '.')
CHROUTDIR=$OUTDIR/$CHRNAME
java -jar $PILONJAR --nostrays --vcf --tracks --changes --genome $CHRFILE --output $CHRNAME --outdir $CHROUTDIR --fix snps --frags dnaseq/SRR7898210.sorted.bam # replace the BAM with your own sorted dnaseq alignment
done
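Since the per-chromosome runs are independent, they could in principle run concurrently, memory permitting; each job still needs its own heap, so scale the job count to available RAM. A hypothetical sketch using xargs -P, with echo standing in for the java command from the loop:

```shell
# Hypothetical alternative to the serial loop: run chromosomes
# concurrently with xargs -P (here 2 at a time).
# 'echo' stands in for the full java/pilon command above.
if [ -f toprocess ]; then
  xargs -P 2 -I{} echo "would polish {}" < toprocess
fi
```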
This will produce the pilon_out directory with the files for each chromosome. You can then easily concatenate the output files as follows:
cat pilon_out/*/*fasta > whole_genome_pilon.fasta
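To confirm nothing was dropped during the split/merge, you can compare sequence counts between the input genome and the merged output; a small helper (file names follow the example above):

```shell
# Count FASTA records (lines starting with '>') in a file,
# to verify the merged polished assembly kept every sequence.
count_seqs() {
  grep -c '^>' "$1"
}
# e.g. compare:  count_seqs whole_genome_pilon.fasta  vs  count_seqs $GENOMEFA
```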
Thanks for your suggestion!
@zolotarovgl Thank you! Just saved me a lot of work on our cluster when polishing 3gbp genomes with ~50X bams.
Hi, I have tried to use pilon with the following command:
java -jar pilon-1.21.jar --genome '/media/loutre/SUZUKII/assembly/merged/3-suzukii-polished-80-merged-renamed.fasta' --frags '/media/loutre/SUZUKII/annotation/evidences/rna/hisat/80x-illumina-suzukii-sorted.bam' --diploid --outdir '/media/loutre/SUZUKII/polishing' --output pilon80x-polishing-illumina --threads 32 --debug
And I got the following output:
The Illumina reads were aligned using Hisat2, then sorted and indexed using samtools.
Did I do something wrong?
Thanks for your help,
Roxane