Nextomics / NextPolish

Fast and accurate polishing of genomes generated from long reads.
GNU General Public License v3.0

Memory issue at index_genome #80

Closed · rotoke closed this issue 3 years ago

rotoke commented 3 years ago

Hi,

I'm trying to polish a 2.1 Gbp assembly (38735 contigs) with Illumina reads (task 1212). The cluster I'm using has nodes with 32 CPUs and 12G RAM per CPU, i.e. 384G RAM per node in total. Unfortunately, I keep getting out-of-memory errors at the index_genome step, even when I set parallel_jobs = 1 in the run.cfg file.
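For reference, here is a trimmed-down sketch of my run.cfg (paths are placeholders; the [General]/[sgs_option] section layout follows the NextPolish documentation, and the full resolved option set is in the options dump below):

[General]
job_type = slurm
job_prefix = nextPolish
task = 1212
rewrite = 1
parallel_jobs = 1
multithread_jobs = 32
genome = [path to genome]
workdir = ./nextpolish_1212
polish_options = -p 32 -ploidy 2
cluster_options = -A partition --cpus-per-task={cpu} --mem-per-cpu={vf} --time=36:00:00

[sgs_option]
sgs_fofn = [path to sgs.fofn]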

Do you have any suggestions on how to further reduce the RAM consumption? Or is there another issue I have overlooked? Would it be possible to run NextPolish across multiple nodes to increase the amount of available RAM?

Thank you and best regards, Roman

Error message

pidXXXXX.log.info

[INFO] 2021-09-28 22:19:31,143 logfile: pid67699.log.info
[WARNING] 2021-09-28 22:19:31,146 Re-write workdir
[INFO] 2021-09-28 22:19:34,732 scheduled tasks:
[1, 2, 1, 2]
[INFO] 2021-09-28 22:19:34,732 options:
[INFO] 2021-09-28 22:19:34,733 {'polish_options': '-p 32 -ploidy 2', 'rewrite': 1, 'job_prefix': 'nextPolish', 'job_type': 'slurm', 'hifi_minimap2_options': '-x map-pb', 'cluster_options': '-A partition --cpus-per-task={cpu} --mem-per-cpu={vf} --time=36:00:00', 'sgs_fofn': '[path to reads]', 'snp_valid': [path to ./nextpolish_1212/%02d.snp_valid], 'sgs_rm_nread': 1, 'kmer_count': [path to ./nextpolish_1212/%02d.kmer_count], 'sgs_max_depth': '100', 'align_threads': '32', 'sgs_block_size': 500000000, 'lgs_max_read_len': '0', 'parallel_jobs': '1', 'multithread_jobs': '32', 'snp_phase': [path to ./nextpolish_1212/%02d.snp_phase], 'genome': [path to genome], 'lgs_read_type': '', 'genome_size': 2198407557L, 'workdir': [path to ./nextpolish_1212], 'cleantmp': 0, 'hifi_max_read_len': '0', 'hifi_block_size': '500M', 'hifi_min_read_len': '1k', 'sgs_align_options': 'bwa mem -p  -t 32', 'sgs_unpaired': '0', 'hifi_max_depth': '100', 'lgs_polish': [path to ./nextpolish_1212/%02d.lgs_polish], 'sgs_use_duplicate_reads': 0, 'score_chain': [path to ./nextpolish_1212/%02d.score_chain], 'task': [1, 2, 1, 2], 'lgs_max_depth': '100', 'lgs_block_size': '500M', 'lgs_minimap2_options': '-x map-ont', 'rerun': 3, 'hifi_polish': [path to ./nextpolish_1212/%02d.hifi_polish], 'lgs_min_read_len': '1k'}
[INFO] 2021-09-28 22:19:34,733 step 0 and task 1 start:
[INFO] 2021-09-28 22:19:34,742 skip step: db_split
[INFO] 2021-09-28 22:19:34,755 skip step: align_genome
[INFO] 2021-09-28 22:19:34,779 skip step: merge_bam
[INFO] 2021-09-28 22:19:34,982 skip step: polish_genome
[INFO] 2021-09-28 22:19:34,983 step 1 and task 2 start:
[INFO] 2021-09-28 22:19:34,988 skip step: merge_ref
[INFO] 2021-09-28 22:19:40,009 Total jobs: 1
[INFO] 2021-09-28 22:19:40,157 Submit jobID:[46974241] jobCmd:[[...]nextpolish_1212/01.kmer_count/02.index.ref.sh.work/index_genome0/nextPolish.sh] in the slurm_cycle.
[ERROR] 2021-09-29 00:59:47,500 index_genome failed: please check the following logs:
[ERROR] 2021-09-29 00:59:47,547 [[...]/nextpolish_1212/01.kmer_count/02.index.ref.sh.work/index_genome0/nextPolish.sh.e

log file

+ hostname
cd [...]/nextpolish_1212/01.kmer_count/02.index.ref.sh.work/index_genome0
+ cd [...]/nextpolish_1212/01.kmer_count/02.index.ref.sh.work/index_genome0
time [...]/NextPolish/bin/bwa index -p /[...]./nextpolish_1212/01.kmer_count/input.genome.fasta.sgs [...]./nextpolish_1212/01.kmer_count/input.genome.fasta
+ [...]/NextPolish/bin/bwa index -p [...]./nextpolish_1212/01.kmer_count/input.genome.fasta.sgs [...]./nextpolish_1212/01.kmer_count/input.genome.fasta
[bwa_index] Pack FASTA... 13.89 sec
[bwa_index] Construct BWT for the packed sequence...
[BWTIncCreate] textLength=4364617092, availableWord=319110384
[BWTIncConstructFromPacked] 10 iterations done. 99999988 characters processed.
[...]
[BWTIncConstructFromPacked] 500 iterations done. 4356044436 characters processed.
[bwt_gen] Finished constructing BWT in 505 iterations.
[bwa_index] 1362.12 seconds elapse.
[bwa_index] Update BWT... [...]/nextpolish_1212/01.kmer_count/02.index.ref.sh.work/index_genome0/nextPolish.sh: line 5: 256575 Killed   [...]/NextPolish/bin/bwa index -p [...]./nextpolish_1212/01.kmer_count/input.genome.fasta.sgs [...]./nextpolish_1212/01.kmer_count/input.genome.fasta
slurmstepd: error: Detected 1 oom-kill event(s) in step 46974241.batch cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.

Operating system

Distributor ID: Scientific
Description:    Scientific Linux release 7.9 (Nitrogen)
Release:    7.9
Codename:   Nitrogen

GCC

gcc version 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC)

Python

python 2.7

NextPolish

nextPolish v1.3.1

moold commented 3 years ago

Try using your own alignment pipeline, and then use NextPolish only to polish the genome; see here.
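In case it helps, here is a minimal sketch of that approach for one short-read polishing pass, assuming the standalone polishing script ships as NextPolish/lib/nextpolish1.py with -g/-s/-t/-p options (as in the short-read tutorial; check your installation, and treat read files, paths and thread counts as placeholders):

# align the short reads to the assembly, then sort and index the result
threads=32
genome=input.genome.fasta
bwa index ${genome}
bwa mem -t ${threads} ${genome} reads_R1.fastq.gz reads_R2.fastq.gz \
    | samtools view -b -F 0x4 - \
    | samtools sort -@ ${threads} -o sgs.sort.bam -
samtools index sgs.sort.bam
samtools faidx ${genome}
# run polishing task 1 on the sorted alignments
python NextPolish/lib/nextpolish1.py -g ${genome} -t 1 -p ${threads} -s sgs.sort.bam > genome.task1.fasta
# for task 2 (and the repeated 1212 pattern), re-align the reads to the newest
# output and rerun the script with -t 2 against that file

The point is that the indexing and alignment steps then run under whatever resources you request yourself, rather than inside the memory-limited job that NextPolish submits to Slurm.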

rotoke commented 3 years ago

Thank you very much for the quick reply. I tried using my own pipeline with bwa-mem2 as the aligner, and everything completed successfully.
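For completeness, a sketch of that kind of bwa-mem2 alignment step (read files and thread count are placeholders); the resulting BAM is then fed to the standalone polishing script as sketched above:

bwa-mem2 index ${genome}
bwa-mem2 mem -t ${threads} ${genome} reads_R1.fastq.gz reads_R2.fastq.gz \
    | samtools sort -@ ${threads} -o sgs.sort.bam -
samtools index sgs.sort.bam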